Deep Reformulated Laplacian Tone Mapping

Jie Yang, Ziyi Liu, Mengchen Lin, Svetlana Yanushkevich, and Orly Yadid-Pecht

I2Sense Lab, University of Calgary, Calgary T2N 1N4, Canada; Westlake University, Hangzhou 310024, China; Biometric Technologies Laboratory, Schulich School of Engineering, University of Calgary, Calgary T2N 1N4, Canada
Abstract—Wide dynamic range (WDR) images contain more scene details and contrast than common images. However, they require tone mapping to process the pixel values in order to be displayed properly. The details of WDR images can diminish during the tone mapping process. In this work, we address the problem by combining a novel reformulated Laplacian pyramid and deep learning. The reformulated Laplacian pyramid always decomposes a WDR image into two frequency bands, where the low-frequency band is global feature-oriented and the high-frequency band is local feature-oriented. The reformulation preserves the local features at their original resolution and condenses the global features into a low-resolution image. The generated frequency bands are reconstructed and fine-tuned to output the final tone mapped image, which can be displayed on the screen with minimum detail and contrast loss. The experimental results demonstrate that the proposed method outperforms state-of-the-art WDR image tone mapping methods. The code is publicly available at https://github.com/linmc86/Deep-Reformulated-Laplacian-Tone-Mapping.
Index Terms—Tone mapping, wide dynamic range image, image processing, machine learning.
I. INTRODUCTION
Wide dynamic range (WDR) imaging plays an important role in many imaging-related applications, including photography, machine vision, medical imaging, and self-driving cars. Unlike traditional images that may suffer from under- and over-exposure, WDR images are obtained with WDR sensors that have a huge dynamic range [1], [2], or with radiance recovery algorithms such as [3] that take multiple exposure images to compensate for the under- and over-exposed regions. WDR images largely avoid the detail and contrast loss of conventional low dynamic range (LDR) images that often degrades the human visual experience. However, unlike most conventional LDR images, whose pixel values range from 0 to 255, the pixel values of WDR images are distributed over a much wider range that depends on the way the images are acquired. Although WDR display devices do exist in the commercial market for direct WDR display, they are still far from representing all available luminance levels. In fact, the absolute majority of displays are, and in the foreseeable future will most likely remain, of a very limited dynamic range. Therefore, to show WDR images on commonly used displays, additional tone mapping is still needed to convert the WDR images to a standard displayable level. To avoid any misunderstanding, we call the displayable image that is tone mapped from WDR a WDR-LDR image, to distinguish it from conventional LDR images.

Previous tone mapping methods employ various gradient reduction techniques to compress the dynamic range [4]–[8]. Unfortunately, their WDR-LDR output often inevitably loses some of the details and contrast that are preferred by the human visual system (HVS). This is because tone mapping is not only a gradient reduction problem, but rather an in-depth topic involving human perception. A good WDR tone mapping algorithm should not only compress the large gradients but also enhance the local details of WDR images.

Corresponding author e-mail: [email protected]
In this paper, we propose to directly learn the global compression and local detail manipulation functionalities between WDR images and WDR-LDR images. Our method takes a WDR image as input and tone maps it to a WDR-LDR image automatically, compressing the global dynamic range while enhancing local details. This work has the following key contributions. First, we present the reformulated Laplacian method to decompose the global and local features of the original WDR image. The reformulated Laplacian method condenses the global features into a low-resolution image, which facilitates global feature extraction during convolution operations. Secondly, we present a two-stage network architecture and a full end-to-end learning approach which can directly tone map WDR images to WDR-LDR images. The entire network is a joint of three sub-networks that focus on global compression, local manipulation, and fine tuning, respectively. The three sub-networks work cooperatively to produce the final WDR-LDR image. Code and model are available on our project page.

II. RELATED WORK
In this section, we discuss works relevant to our research. These include image-to-image transformation, conventional approaches that tone map a WDR image to a WDR-LDR image, and reverse tone mapping, which reconstructs a WDR image from an LDR image.
Image-to-image transformation
Generally speaking, WDR tone mapping is an image-to-image transformation task. In recent years, many image-to-image transformation tasks have been tackled by training deep convolutional neural networks. For example, deep neural networks for denoising, colorization, semantic segmentation, and super-resolution applications have been massively proposed and show great performance improvement when compared with traditional methods [9]–[12]. Style transfer methods that adopt perceptual loss can produce an artistically rendered counterpart of an input image while preserving the original image content [13], [14]. Perceptual loss measures the high-level image feature representations extracted from pre-trained convolutional neural networks. These features are more sensitive to the HVS than simple pixel values. In-network encoder-decoder architectures are also widely used in image transformation works, where the original image is encoded into a low-dimensional latent representation and then decoded to reconstruct the required image [11], [15]–[17].

LDR to WDR

The most well-known approach to generate a WDR image is to merge multiple LDR photographs that were taken with different exposures [3]. It is still widely used in many applications. To remove ghost artifacts caused by misalignment between images of different exposures, many effective techniques were proposed [18]–[20], including CNN-based solutions [21]. Unlike WDR radiance recovery, which fuses all available information of a bracket of images, reverse tone mapping generates the missing information from a single LDR image. In recent years, with the growing popularity of machine learning and abundant WDR sources, traditional reverse tone mapping methods [22]–[25] were outperformed by machine learning based approaches. Endo et al. [26] proposed a convolutional neural network architecture that is able to create a series of bracketed images of different exposures from a single LDR image. A WDR image is then generated from these bracketed images. Eilertsen et al. [27] proposed an encoder-decoder architecture that is able to reconstruct a WDR image from an arbitrary single-exposure LDR image with unknown camera response functions and post-processing. Marnerides et al. [28] proposed ExpandNet, which consists of three different branches to reconstruct the missing information of an LDR image. A generative adversarial network has also been proposed to carry out reverse tone mapping, which could generate images with a wider dynamic range [29].
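As a toy illustration of the multi-exposure merging idea behind [3], the sketch below averages differently exposed frames with a hat weighting that favors well-exposed pixels. This is a simplification on our part, not the actual algorithm of [3], which also recovers the camera response function; here the frames are assumed to be already linear in radiance:

```python
import numpy as np

def merge_exposures(frames, times):
    # frames: list of LDR images with values in [0, 1]; times: exposure times.
    # Each frame contributes its radiance estimate img / t, weighted by a
    # hat function that favours mid-range (well-exposed) pixels.
    num = np.zeros_like(frames[0])
    den = np.zeros_like(frames[0])
    for img, t in zip(frames, times):
        w = 1.0 - np.abs(img - 0.5) * 2.0   # hat weight: 1 at 0.5, 0 at 0 and 1
        num += w * img / t
        den += w
    return num / np.maximum(den, 1e-8)      # estimated relative radiance
```

For unclipped pixels every frame contributes the same ratio img / t, so the weighted average recovers the relative radiance exactly; the weighting only matters near the clipped extremes.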
WDR to WDR-LDR
The research on tone mapping a WDR image to a WDR-LDR image has been going on for decades. The simplest approach to tone mapping a WDR image is to use a global tone mapping operator (TMO). A global TMO applies a single global function to all pixels in the image, where identical pixels are given an identical output value within the range of the display device. Tumblin and Rushmeier [30] and Ward [31] were among the early researchers who developed global operators for tone mapping. Recently, Khan et al. [32] proposed a global TMO that uses a sensitivity model of the human visual system. In general, global TMOs are easy to implement and mostly artifact-free, and they have unique advantages in hardware implementations. However, the tone mapped images mostly suffer from low brightness, low contrast and loss of details due to the global compression of the dynamic range. Different from global TMOs, local TMOs are able to tone map a pixel based on local statistics and reveal more detail and contrast. Some early local TMOs were inspired by certain features of the human visual system [6], [33], [34]. Some local TMOs solve WDR image compression as a constrained optimization problem [35], [36]. In recent years, various edge-preserving-filter-based TMOs were developed [5], [7], [8], [37], and showed unprecedented results when compared with the aforementioned methods. A comprehensive review and classification of tone mapping algorithms can be found in [38]. Recently, a machine learning method that can effectively calculate the coefficients of a locally-affine model in bilateral space was reported [39]. It shows the great potential and performance that machine learning can provide for WDR tone mapping.

III. APPROACH
To train a learning-based TMO to learn the mapping from a WDR image to a WDR-LDR image, we originally designed our CNN model as a 10-layer flat architecture with skip connections, shown in Figure 1. We used a combination of the well-known ℓ2-norm loss, the structural dissimilarity (DSSIM) [40] loss, and the feature loss [13] to train our network. The ℓ2-norm loss can be formulated as:

ℓ2 = ||f(x; θ) − y||₂   (1)

where f(·) represents the CNN that takes the input image x and has weights θ, and y is the ground truth.

The DSSIM loss is a variation of the structural similarity index (SSIM) [41] that reflects the distance between two images. The DSSIM can be formulated as:

DSSIM = 1 − SSIM(f(x; θ), y)   (2)

The feature loss, ℓ_feat(x, y), a part of the perceptual loss, was proposed by [13]. It uses a 16-layer VGG network pre-trained on ImageNet to measure the semantic differences between two images. Unlike the ℓ2-norm, which pushes the output image to exactly match the label at each pixel, ℓ_feat(x, y) encourages them to increase their similarity at different feature levels. Suppose φ_i(x) is the output of the feature loss network at the i-th activation layer, and the activation map has shape W_i × H_i × C_i. We adopted 5 convolutional layers of VGG-16. The feature loss function is formulated as:

ℓ_feat(x, y) = Σ_{i=1}^{5} (1 / (W_i H_i C_i)) ||φ_i(x) − φ_i(y)||₂   (3)

After many experimental attempts, the results directly generated by such a CNN with skip connection architecture were unpleasing. The tone mapped images exhibit severe contrast loss and color distortion. Figure 2 shows two examples of the tone mapped results compared with those of our novel TMO that will be introduced in the next section. These cases demonstrate that a CNN, in general, can be used to compress the gradient of a WDR image. However, a CNN with this architecture lacks the ability to generate a WDR-LDR image with a smooth texture and high contrast.
It also fails to preserve the details in the overexposed regions, such as the scenery outside the window in (a). In addition, many halo artifacts can be visually observed in high-gradient areas. This is likely because a CNN with an ordinary architecture has difficulty extracting the high-frequency features of a WDR image. Frequency here means the rate of change of intensity per pixel. If an area of the image changes from black to white over many pixels, that intensity variation is called low-frequency, and vice versa. For that reason, we came up with the idea of redesigning our CNN to operate on different image frequencies. One network can focus on gradient compression in the high-frequency layer, while the other network focuses on the compression of the naturalness. In the end, the results of the two image frequency bands are reconstructed to generate the tone mapped WDR-LDR image. We combine our CNN with the reformulated Laplacian pyramid to complete this task.

Fig. 1: An overview of the 10-layer CNN architecture.

Fig. 2: The tone mapped results of the CNN with flat architecture (a and b) and with the reformulated Laplacian pyramid architecture (c and d). Images (a) and (c), and images (b) and (d), exhibit the same scenes, respectively.

Figure 3 presents an overview of the novel architecture. The objective of our work is to find the weights θ that tone map the input image x to an output image x̂, i.e., x̂ = f(x; θ). The input WDR image x is first decomposed into n different frequency bands x_1, x_2, ..., x_{n−1}, x_g with Laplacian pyramid decomposition, where x_1 is the highest frequency band and x_g is the lowest frequency band. The high frequency bands from x_1 to x_{n−1} are further Laplacian-reconstructed into a single image x_l which has the original resolution of the WDR image. The entire network f is composed of three sub-networks: the global compression network f_g, the local manipulation network f_l, and the fine tune network f_t. f_g is used to generate the low frequency Laplacian decomposition x̂_g, i.e., x̂_g = f_g(x_g; θ_g). Network f_l is used to generate the high frequency component x̂_l, i.e., x̂_l = f_l(x_l; θ_l). Network f_g handles global features while network f_l deals with the high frequency local features. The generated images x̂_l and x̂_g are reconstructed and fine-tuned through network f_t to output the final WDR-LDR image x̂.

A. Laplacian Pyramid Reformulation
The Laplacian pyramid condenses the global luminance information of an image into a lower resolution without sacrificing detail, since the traditional Laplacian pyramid reconstruction operation can nevertheless restore the image. On the other hand, applying convolutional operations over an image with lower resolution can effectively decrease the computational complexity, thus reducing the requirements on the computing device.

Fig. 3: An overview of the proposed deep multi-band tone mapping architecture. It decomposes an input WDR image into multiple frequency bands with a Laplacian pyramid; every band is mapped to the WDR-LDR domain with a specific deep neural network.

A WDR image can be segmented into n different frequency bands with a Laplacian pyramid. The lowest frequency band contains the global luminance terrain of the original image, and the higher frequency bands contain local detail and textural information which varies fast in space. The advantages of using Laplacian decomposition are apparent:
1) Taking the lowest frequency layer x_g as an example, its resolution is reduced by a factor of 2^{n−1} in both width and height when compared to the original image; moreover, the global luminance terrain is well preserved in x_g.
2) In subsequent processing, even a small kernel in the neural network can process a large receptive field of the original image.
3) Additionally, the low-resolution input x_g can significantly reduce the computation required for training the parameters θ_g. Figure 4 shows the visual comparison for different choices of the number of frequency bands n.

However, the generated Laplacian pyramid also has certain drawbacks. Firstly, there are n layers of images, each containing different frequency components of the original WDR image. It would be difficult to process all the different layers with a single neural network, because the low frequency layer needs to be compressed greatly while the high frequency layers only need to be manipulated locally.
Furthermore, if the n layers were processed with n different neural networks, the complexity of the tone mapping model would also grow, making it infeasible to fit in hardware devices.

To overcome the mentioned drawbacks of the Laplacian pyramid, we reconstruct an image from the entire Laplacian pyramid without the lowest frequency layer. The generated x_l is a single image that has the same resolution as the original WDR image. Now the Laplacian pyramid is reformulated into two layers, with x_l representing all high frequency components and x_g representing the low frequency global luminance terrain. Figure 5 intuitively shows the relationship and the differences between the reformulated Laplacian pyramid and the original one. Using two layers in the Laplacian pyramid structure has the following advantages. First, it reduces the original Laplacian pyramid model from n layers to 2 layers, hence the computational complexity of subsequent processing is significantly reduced. Secondly, the segmentation of the high and low frequency components of the WDR image leads networks f_g and f_l to focus on simple tasks, namely global compression and local detail manipulation, respectively.

B. Global Compression Network
The global compression network f_g focuses on the compression of the global dynamic range of the WDR image, namely x_g. After the Laplacian pyramid decomposition, x_g is a low resolution image and only contains the global luminance information of the original WDR image. Unlike many image transformation works [29], [42], [43] that employ an encoder-decoder architecture to avoid the loss of global features during convolution, our architecture is able to achieve the same effect with the help of the low resolution representation x_g. A small k × k kernel is able to cover 2^{n−1} · k pixels of the original WDR image if the WDR image is decomposed into n layers. Therefore, we adopt a simple CNN architecture to do the compression. The details of the proposed global compression network are summarized in Table I, where W and H are the width and height of the input image x_g, respectively. Given an input image x and the ground truth WDR-LDR image y, we use the ℓ2-norm, the feature loss ℓ_feat, and ℓ2 regularization as the loss function:

ℓ_global = α ℓ2 + β ℓ_feat(x_g, y_g) + γ R(θ_g)   (4)

where α = 0.5, β = 0.5 and γ = 0.2. The ℓ2-norm can be formulated as:

ℓ2 = ||f_g(x_g; θ_g) − y_g||₂   (5)

where y_g is the low frequency part of the corresponding reformulated Laplacian pyramid of the ground truth y. As our dataset is not ample compared to all WDR image representations, we apply the ℓ2 regularization loss R(θ_g) to all our neural networks to prevent over-fitting.

Fig. 4: Visual comparison of the resulting images with different numbers of frequency bands. (a) is the ground truth image. (b), (c), (d) and (e) are the images obtained with increasing values of the frequency band number n, starting from n = 2.

Fig. 5: Illustration of the relationship and the differences between the reformulated Laplacian pyramid and the original Laplacian pyramid.
Unlike the original pyramid-shaped image decomposition (top-left, n = 5), the reformulated structure (top-right, n = 5) always contains two layers. Layers at the same frequency are connected with red dots. The high frequency layer of the reformulated pyramid for each n can be reconstructed by adding the high frequency layer and its upsampled previous high frequency layer in the original Laplacian pyramid.

C. Local Manipulation Network
The purpose of the local manipulation network f_l is to manipulate the high frequency part of the WDR image, namely x_l. Unlike x_g, the high frequency features contained in x_l are mostly local. For simplicity, we adopt the same architecture as in Table I to do the local manipulation. Because x_l has the same resolution as the WDR image, the kernels in Table I will only cover a local image patch instead of a global area. The same network can thus serve two different goals when operating on x_g and x_l, respectively. The learning objectives of the local manipulation network and the global compression network are the same. We use the same set of parameters and loss function:

ℓ_local = α ℓ2 + β ℓ_feat(x_l, y_l) + γ R(θ_l)   (6)

where y_l is the high frequency part of the corresponding reformulated Laplacian pyramid of the ground truth y.

TABLE I: Details of the Global Compression Network and the Local Manipulation Network.

Layers       | Input Size | Kernel Size | Stride | Kernel Num.
Input        | W × H      | -           | -      | -
Conv 1       | W × H      | 3 × 3       | 1      | 32
Batch norm 1 | W × H × 32 | -           | -      | -
Conv 2       | W × H × 32 | 3 × 3       | 1      | 32
Batch norm 2 | W × H × 32 | -           | -      | -
Conv 3       | W × H × 32 | 3 × 3       | 1      | 32
Batch norm 3 | W × H × 32 | -           | -      | -
Conv 4       | W × H × 32 | 3 × 3       | 1      | 32
Batch norm 4 | W × H × 32 | -           | -      | -
Conv 1 × 1   | W × H × 32 | 1 × 1       | 1      | 1
Output       | W × H      | Input + Conv 1 × 1

D. Fine Tune Network
The global compression network and the local manipulation network are able to generate the corresponding reformulated Laplacian layers x̂_g and x̂_l, respectively. The Laplacian pyramid requires additional operations to add all frequency layers, that is, x̂_t = upsampling(x̂_g) + x̂_l. However, the image x̂_t cannot guarantee overall visual quality, since x̂_g and x̂_l are produced by separate neural networks. Moreover, color shifts, regional blur and other artifacts may also occur in x̂_t. To overcome these possible issues, we utilize a fine tune network f_t to further refine the reconstructed image x̂_t towards the desired ground truth image. f_t is a ResNet architecture with large feature maps and small depth, since the main features of the image have already been learned by the previous two neural networks.

The ResNet contains 4 residual blocks. Each residual block consists of 2 convolutional layers. We use 3 × 3 kernels for every layer with a stride of 1, and we use a batch normalization layer after each convolutional layer. At the end of this ResNet, a 1 × 1 convolution layer is applied to condense all extracted features from the 32-channel receptive field to 1 channel. The loss function of the fine tune network is slightly different from the previous networks. We adopt the feature loss ℓ_feat and the ℓ2-norm:

ℓ_t = α_t ℓ2 + β_t ℓ_feat(x̂_t, y)   (7)

where α_t = 0.6 and β_t = 0.4.

IV. EXPERIMENTS
In this section, we first present the experimental setup and then analyze the effects of the proposed Laplacian pyramid reformulation. We then compare the proposed model with state-of-the-art methods on two databases.
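Before the experiments, the reformulated decomposition of Section III-A can be made concrete with a short sketch. This is a minimal NumPy illustration, not the paper's TensorFlow implementation; the 2× average-pool downsampling and nearest-neighbour upsampling are our own simplifying assumptions (the exact resampling filters may differ). The properties it demonstrates are the ones the paper relies on: x_g shrinks by 2^(n−1) per side, and because the upsampling operator is linear, x_l plus the repeatedly upsampled x_g reconstructs the input exactly.

```python
import numpy as np

def down(img):
    # 2x average pooling: a stand-in for the blur-and-subsample pyramid step
    h, w = img.shape
    return img.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def up(img):
    # nearest-neighbour 2x upsampling; it is linear, so reconstruction is exact
    return np.repeat(np.repeat(img, 2, axis=0), 2, axis=1)

def reformulated_laplacian(x, n):
    """Decompose x into the two reformulated bands (x_l, x_g)."""
    gauss = [x]
    for _ in range(n - 1):
        gauss.append(down(gauss[-1]))
    # Laplacian bands: difference between adjacent Gaussian levels
    lap = [g - up(g_next) for g, g_next in zip(gauss[:-1], gauss[1:])]
    x_g = gauss[-1]      # low-frequency global luminance terrain
    x_l = lap[-1]        # collapse all high bands back to full resolution
    for band in reversed(lap[:-1]):
        x_l = band + up(x_l)
    return x_l, x_g

def reconstruct(x_l, x_g, n):
    # x = x_l + upsample^(n-1)(x_g)
    rec = x_g
    for _ in range(n - 1):
        rec = up(rec)
    return rec + x_l
```

With n = 6 (the value chosen in Section IV-D), a 64 × 64 input yields a 2 × 2 x_g, and a 3 × 3 kernel applied to x_g spans 2⁵ · 3 = 96 pixels of the original image per side.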
A. Training Data Generation
We trained the proposed network for WDR tone mapping on the Laval indoor dataset [44]. This dataset contains 2,233 high-resolution, high dynamic range indoor panorama WDR images captured with a Canon 5D Mark III camera. In the Laval indoor dataset, some images contain watermarks of different scales in the bottom region. We discard the bottom 15% of the panoramas to remove the watermarks from the original images. After this cropping, the total number of images used in the experiment was 2,125. These images are further down-sampled to one quarter of their original resolution and transferred to luminance images. The luminance image is generated and recovered using the methods described in [8]. We generate 20 sub-images from each training sample. The sizes of the sub-images are drawn uniformly from the range [20%, 60%] of the size of an input WDR image and re-sampled to a fixed training resolution. The ground truth WDR-LDR images are generated using various tools, including Luminance HDR, the HDR toolbox provided by [45], and Photoshop with human tuning and supervision.

https://github.com/LuminanceHDR/LuminanceHDR
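The sub-image sampling described above can be sketched as follows. This is a hedged illustration, not the paper's pipeline: the nearest-neighbour resize and the output size of 256 are placeholders we chose (the paper's exact target resolution and resampling method are not reproduced here).

```python
import numpy as np

rng = np.random.default_rng(0)

def nn_resize(img, size):
    # nearest-neighbour resize to size x size (a stand-in resampling method)
    h, w = img.shape
    rows = np.arange(size) * h // size
    cols = np.arange(size) * w // size
    return img[rows[:, None], cols]

def sample_subimages(img, count=20, lo=0.2, hi=0.6, out=256):
    # draw crop sizes uniformly from [20%, 60%] of the input size,
    # crop at a random position, then re-sample to a fixed resolution
    h, w = img.shape
    crops = []
    for _ in range(count):
        frac = rng.uniform(lo, hi)
        ch, cw = max(1, int(h * frac)), max(1, int(w * frac))
        top = rng.integers(0, h - ch + 1)
        left = rng.integers(0, w - cw + 1)
        crops.append(nn_resize(img[top:top + ch, left:left + cw], out))
    return crops
```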
Similar to Cai's [49] reference image generation method, we generated high-quality ground truth images using several TMOs and human tuning. We used 6 TMOs in this process, including Fattal [4], Ferradans [46], Mantiuk [35], Drago [50], Durand [5], and Reinhard [51] from Luminance HDR and the HDR toolbox. We then employed 4 volunteers and 2 photographers in this process. The two photographers first picked out the images they found unsatisfactory (such as too dark, too bright, or exhibiting distortion), and used Photoshop to fix them according to their own preferences. The volunteers performed random pairwise comparisons independently on the 7 sets of tone mapped images, given the instructions:
• Select the one image of the two that best suits your visual preferences.
• Spend no more than 5 seconds on each pair.
Images with tied votes, or that could not be selected within 5 seconds, were circulated back to the photographers for modification, and then sent to the volunteers in the next round, until all images had been selected.
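The bookkeeping of one such voting round might be sketched as below. This is only an illustration of the tie-handling logic described above, with hypothetical image labels; the actual selection was performed by human volunteers, not code.

```python
from collections import Counter

def tally_round(votes):
    # votes: list of (winner, loser) pairwise preferences from volunteers
    counts = Counter(w for w, _ in votes)
    top = max(counts.values())
    leaders = sorted(img for img, c in counts.items() if c == top)
    # a unique leader is accepted as ground truth; a tie is sent back
    # to the photographers for re-tuning (signalled here by None)
    return leaders[0] if len(leaders) == 1 else None
```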
C. Implementation Details
We randomly selected 70% of the images for training our model and used the remaining 10% for validation and 20% for testing. The network parameters are initialized using the truncated normal initializer. All training experiments are performed using the TensorFlow deep learning library. We adopt the ADAM optimizer for loss minimization with momentum β₁ = 0.9 and β₂ = 0.999. We use mini-batch gradient descent with a batch size of 8 for local manipulation, 64 for global compression, and 4 for fine tuning. The foregoing networks are trained in multiple steps. The networks f_l and f_g are trained first. Then, we use the loss function of f_t to jointly train the entire system containing f_l, f_g and f_t. The proposed model is trained in an end-to-end fashion.

D. Parameter Setting
The process of Laplacian pyramid reformulation described in Section III-A has a hidden parameter n, which indicates the number of layers used in the original Laplacian decomposition. A different n value will certainly affect the training and lead to different results. In order to evaluate the effect of this parameter on the final trained model, we trained our model with n = 2, 3, ..., 7 and evaluated the average PSNR, SSIM [41] and FSITM [52] on the test data set. The results are summarized in Table II. It is not surprising that the median n values achieve higher average metrics. In fact, a smaller n value will assign most of the information to the x_g image, while a larger n value will move more frequency bands to x_l. Suppose n is so large that x_g has only one pixel; then the final image will be determined solely by f_l. On the other hand, if n is too small, then x_l will contain limited information, which deteriorates the desired functionality of f_l. The model with n = 6 gives the highest metric values. In the rest of this paper, we set n = 6 for all remaining experiments.

Fig. 6: Visual comparison on the test set: (a) Reference, (b) Mantiuk [35], (c) Paris [7], (d) Ferradans [46], (e) Mai [47], (f) Gu [8], (g) Photomatix [48], (h) Proposed. The proposed method can effectively compress the global dynamic range while preserving local detail and contrast.

Fig. 7: Visual comparison on the test set: (a, i) Reference, (b, j) Mantiuk [35], (c, k) Paris [7], (d, l) Ferradans [46], (e, m) Mai [47], (f, n) Gu [8], (g, o) Photomatix [48], (h, p) Proposed. The proposed method is able to enhance and recover local details that cannot be seen with other algorithms.

TABLE II: Average PSNR (dB), SSIM and FSITM values computed for models with n = 2 through n = 7. Median n values are able to achieve relatively higher indices when compared with the two end values.

E. Running Time
We report the processing time of each algorithm in Table III. We evaluated all methods on a PC with an Intel(R) Core(TM) i5-8600 CPU at 3.10 GHz and 16 GB of memory. We used one HDR image as input. Note that learning-based TM solutions are by convention designed for a GPU environment, since they often run significantly slower than other TM approaches in a CPU environment. Our model runs in 24.95 seconds on the CPU and 0.61 seconds on a GPU (Nvidia Titan Xp).

TABLE III: Comparison of the average running time on a single image. [7] requires more than one hour of processing time.

Methods         | Time (s)
Gu [8]          | 2.4
Mantiuk [35]    | 1.7
Ferradans [46]  | 8.2
Mai [47]        | 0.7
Photomatix [48] | 3.7
Ours            | 0.61 (with GPU)

Fig. 8: Visual comparison on the Fairchild database: (a, i) Gamma correction, (b, j) Mantiuk [35], (c, k) Paris [7], (d, l) Ferradans [46], (e, m) Mai [47], (f, n) Gu [8], (g, o) Photomatix [48], (h, p) Proposed. The proposed model renders more detail in saturated regions with fewer artifacts when compared with other state-of-the-art approaches.
F. Comparison With State-of-the-art Methods
We compare the proposed model with 6 other state-of-the-art image tone mapping methods, namely Gu TMO [8], Mantiuk TMO [35], Paris TMO [7], Ferradans TMO [46], Mai TMO [47], and Photomatix TMO [48]. Among them, Gu TMO [8] is based on edge-preserving filter theory; images tone mapped with this TMO usually exhibit more detail. Paris TMO [7] is based on the local Laplacian operator and is good at preserving details of the WDR image. Mantiuk TMO [35] regards tone mapping as an optimization problem; though it has some difficulty in preserving detail, it can give very natural looking results. Ferradans TMO [46] and Mai TMO [47] are popular tone mapping methods and are used in open source applications. Photomatix is a commercial software package dedicated to WDR image tone mapping. The following results of the mentioned algorithms are obtained from their online websites or open source projects with default parameter settings. We evaluate the model on the test dataset first and then on a totally different database.
1) Objective Quality Assessment:
We first compared our algorithm with these methods on images from the test set of the Laval database. The results on one image are shown in Figure 6. All methods are able to produce acceptable images. However, they also have different problems. For example, images obtained with Mantiuk TMO, Paris TMO, Ferradans TMO, and Photomatix TMO are generally darker than the other images, which puts them at a disadvantage for screening and human observation. Mai TMO generates the brightest image, but it also saturates the area within the red rectangle. Both Gu TMO and the proposed method are able to generate images that are similar to the reference image. However, the image from the proposed TMO looks more natural than the image of Gu TMO because of its global luminance distribution. Moreover, it can display more local detail on the floor when compared with Gu TMO. In fact, the proposed model is able to extract local detail even under some extreme conditions. Two examples are given in Figure 7. The two images show a commonly seen WDR scenario where there is an extreme luminance difference between the areas inside and outside a window. The proposed model can still display the scenes outside the window more clearly than the other results, including the reference images.
TABLE IV: PSNR, SSIM, FSITM and HDR-VDP2 indices for different tone mapping methods. The values are obtained using the test data set.

Methods      | PSNR    | SSIM   | FSITM | HDR-VDP2
Gu [8]       | 16.5024 | 0.7755 | 0.830 | 35.165
Mantiuk [35] | 14.8641 | 0.7563 | …     | …

TABLE V: Average TMQI, BTMQI, FSITM and HDR-VDP2 indices for different tone mapping methods. The values are obtained using the Fairchild database.

Methods      | TMQI   | BTMQI  | FSITM | HDR-VDP2
Gu [8]       | 0.8300 | 3.6683 | 0.823 | 26.161
Mantiuk [35] | 0.9194 | 3.7474 | …     | …
PSNR, SSIM, FSITM and HDR-VDP2 [53] are employed to assess these algorithms quantitatively. FSITM is designed to evaluate the feature similarity index for tone-mapped images. We measured the quality of the images using HDR-VDP2, a visual metric that mimics the anatomy of the HVS to evaluate the quality of HDR images. The average indices obtained from the test set of the Laval database are summarized in Table IV. Of the four metrics, the proposed algorithm achieves the highest scores for PSNR, SSIM and HDR-VDP2. For FSITM, the algorithm achieves the second highest score.

To further demonstrate the robustness of our method, we evaluate it on images outside the test set. We chose the Fairchild database [54], which contains 105 WDR images covering various situations. It is a commonly used benchmark for measuring tone mapping methods. Two result images are shown in Figure 8. The first image has a very wide dynamic range in front of and behind the lamp. Mantiuk TMO, Ferradans TMO, Mai TMO, and the proposed model are able to generate artifact-free images, while the other algorithms show various color artifacts in the top left dim region. Among the four images without artifacts, the proposed model is able to show the color boards clearly under both dark and bright lighting conditions. In the second image, only Gu TMO and the proposed model are able to clearly show the shape of the sun. Images obtained with the other methods cannot show this detail. Since there are no reference images for the Fairchild database, we use blind quality indexes along with TMQI [55] to quantitatively measure the performance of the different methods. Blind quality assessment of tone mapped images (BTMQI) [56] is a blind metric that does not require a reference image to compute a score. The computed indices are summarized in Table V. The proposed model achieves the highest TMQI and BTMQI scores.
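Of the full-reference metrics used above, PSNR and SSIM are simple enough to sketch. The snippet below is a simplified illustration, assuming images normalized to [0, 1]; in particular, the SSIM here uses a single global window, whereas [41] averages SSIM over local windows (note that the DSSIM training loss of Eq. (2) is just 1 minus this quantity):

```python
import numpy as np

def psnr(pred, target, peak=1.0):
    # peak signal-to-noise ratio in dB for images with values in [0, peak]
    mse = np.mean((pred - target) ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(peak ** 2 / mse)

def ssim_global(x, y, c1=0.01 ** 2, c2=0.03 ** 2):
    # single-window SSIM over the whole image: luminance, contrast and
    # structure terms computed from global means, variances and covariance
    mx, my = x.mean(), y.mean()
    cov = ((x - mx) * (y - my)).mean()
    return ((2 * mx * my + c1) * (2 * cov + c2)) / (
        (mx ** 2 + my ** 2 + c1) * (x.var() + y.var() + c2))
```

An image compared against itself gives SSIM of 1 (so DSSIM of 0), and a uniform offset of 0.1 on [0, 1] data gives a PSNR of 20 dB.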
2) Subjective Preference Assessment: We used human-preferred tuning to generate our ground truth images. In this process, no metrics were employed, because no gold-standard metrics exist to evaluate the quality of a tone mapped image, nor do existing metrics truly reflect the observers' preference; such metrics can only serve as limited references. To assess the actual visual experience of our tone mapped LDR images, we therefore carried out a subjective preference experiment, beyond the set of objective index measurements, to assess the visual quality of the images generated by our deep learning solution. The subjective experiment has two sections: a Comparative Selection section and an Image Quality Rating section. In the Comparative Selection section, each of the 8 groups contains the LDR images generated by all 7 algorithms from the same scene for visual comparison, and participants were asked to select the one image they visually prefer. In the Image Quality Rating section, participants were asked to rate each LDR image from 1 to 10 based on the degree of visual comfort and the degree of detail revealed in the image. Details refer to the brightness, contrast, and the extent to which overexposure and underexposure details are revealed; a score of 1 represents "dislike" or "fuzzy details", 10 stands for "most favorite" or "clear and rich details", and intermediate scores fall in between. All images were randomly selected from the Laval HDR dataset.

We used the SurveyHero website to build our subjective experiment and sent the survey invitation randomly via email, social networking websites, and messaging applications (see the appendix for the experiment details). The survey results are shown in Table VI. In terms of the visual experience tested in the Comparative Selection section, the voting results of our algorithm and Gu's algorithm are significantly better than those of the other approaches. The good visual experience comes from the low frequency layer: our Global Compression Network can effectively learn the global luminance terrain of the human-tuned ground truth images. In the Image Quality Rating section, the Local Manipulation Network in our model extracts and enhances the local high frequency details, thereby avoiding the lack of details in the overexposed and underexposed areas of the image after tone mapping. Our result achieves the highest score in both sections.

TABLE VI: The summary of the subjective experimental results. Sum indicates the total number of times each TMO was selected in the Comparative Selection section. Ave selection represents the average number of times each TMO was selected in the Comparative Selection section. Ave rating represents the average rating score of each TMO in the Image Quality Rating section. The computational and statistical details are elaborated in the Supplementary material.

Methods          | Sum | Ave selection | Ave rating
Ferradans [46]   |  69 |  8.625        | 6.184
Gu [8]           | 111 | 13.875        | 6.403
Mai [47]         |  65 |  8.125        | 6.571
Mantiuk [35]     |  54 |  6.75         | 6.079
Paris [7]        |  52 |  6.5          | 5.586
Photomatix [48]  |  39 |  4.875        | 5.642
Ours             | 112 | 14            | 6.580

V. CONCLUSION
In this work, we have proposed a new tone mapping method that can perform high-resolution WDR image tone mapping. To preserve the global low frequency features as well as maintain local high frequency details, we have proposed a novel reformulated Laplacian method to decompose a WDR image into a low-resolution image, which contains the low frequency component of the WDR image, and a high-resolution image, which contains the remaining higher frequencies. The two images are processed by a dedicated global compression network and a local manipulation network, respectively. The global compression network learns how to compress the global scale gradient of a WDR image, and the local manipulation network learns to manipulate local features. The generated images from the two networks are then merged to produce the final output image for screen display. We visually and quantitatively compared our model with other state-of-the-art tone mapping methods on images from and outside the targeted database. The results showed that the proposed method outperforms other methods, and sometimes even shows better results than the ground truth.

REFERENCES

[1] O. Yadid-Pecht and E. R. Fossum, "Wide intrascene dynamic range CMOS APS using dual sampling," IEEE Transactions on Electron Devices, vol. 44, no. 10, pp. 1721–1723, 1997.
[2] A. Spivak, A. Belenky, A. Fish, and O. Yadid-Pecht, "Wide-dynamic-range CMOS image sensors—comparative performance analysis," IEEE Transactions on Electron Devices, vol. 56, no. 11, pp. 2446–2461, 2009.
[3] P. E. Debevec and J. Malik, "Recovering high dynamic range radiance maps from photographs," in Proceedings of the 24th Annual Conference on Computer Graphics and Interactive Techniques. ACM Press/Addison-Wesley, 1997, pp. 369–378.
[4] R. Fattal, D. Lischinski, and M. Werman, "Gradient domain high dynamic range compression," in ACM Transactions on Graphics (TOG), vol. 21, no. 3. ACM, 2002, pp. 249–256.
[5] F. Durand and J. Dorsey, "Fast bilateral filtering for the display of high-dynamic-range images," in ACM Transactions on Graphics (TOG), vol. 21, no. 3. ACM, 2002, pp. 257–266.
[6] E. Reinhard and K. Devlin, "Dynamic range reduction inspired by photoreceptor physiology," IEEE Transactions on Visualization & Computer Graphics, no. 1, pp. 13–24, 2005.
[7] S. Paris, S. W. Hasinoff, and J. Kautz, "Local Laplacian filters: edge-aware image processing with a Laplacian pyramid," Communications of the ACM, vol. 58, no. 3, pp. 81–91, 2015.
[8] B. Gu, W. Li, M. Zhu, and M. Wang, "Local edge-preserving multiscale decomposition for high dynamic range image tone mapping," IEEE Transactions on Image Processing, vol. 22, no. 1, pp. 70–79, 2013.
[9] K. Zhang, W. Zuo, Y. Chen, D. Meng, and L. Zhang, "Beyond a Gaussian denoiser: Residual learning of deep CNN for image denoising," IEEE Transactions on Image Processing, vol. 26, no. 7, pp. 3142–3155, 2017.
[10] C. Dong, C. C. Loy, K. He, and X. Tang, "Image super-resolution using deep convolutional networks," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 38, no. 2, pp. 295–307, 2016.
[11] O. Ronneberger, P. Fischer, and T. Brox, "U-Net: Convolutional networks for biomedical image segmentation," in International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, 2015, pp. 234–241.
[12] R. Zhang, P. Isola, and A. A. Efros, "Colorful image colorization," in European Conference on Computer Vision. Springer, 2016, pp. 649–666.
[13] J. Johnson, A. Alahi, and L. Fei-Fei, "Perceptual losses for real-time style transfer and super-resolution," in European Conference on Computer Vision. Springer, 2016, pp. 694–711.
[14] L. A. Gatys, A. S. Ecker, and M. Bethge, "Image style transfer using convolutional neural networks," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 2414–2423.
[15] H. Noh, S. Hong, and B. Han, "Learning deconvolution network for semantic segmentation," in Proceedings of the IEEE International Conference on Computer Vision, 2015, pp. 1520–1528.
[16] J. Long, E. Shelhamer, and T. Darrell, "Fully convolutional networks for semantic segmentation," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp. 3431–3440.
[17] Y.-H. Tsai, X. Shen, Z. Lin, K. Sunkavalli, X. Lu, and M.-H. Yang, "Deep image harmonization," in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), vol. 2, 2017.
[18] Y. S. Heo, K. M. Lee, S. U. Lee, Y. Moon, and J. Cha, "Ghost-free high dynamic range imaging," in Asian Conference on Computer Vision. Springer, 2010, pp. 486–500.
[19] C. Lee, Y. Li, and V. Monga, "Ghost-free high dynamic range imaging via rank minimization," IEEE Signal Processing Letters, vol. 21, no. 9, pp. 1045–1049, 2014.
[20] P. Sen, N. K. Kalantari, M. Yaesoubi, S. Darabi, D. B. Goldman, and E. Shechtman, "Robust patch-based HDR reconstruction of dynamic scenes," ACM Transactions on Graphics, vol. 31, no. 6, Art. 203, 2012.
[21] S. Wu, J. Xu, Y.-W. Tai, and C.-K. Tang, "Deep high dynamic range imaging with large foreground motions," in European Conference on Computer Vision. Springer, 2018, pp. 120–135.
[22] F. Banterle, P. Ledda, K. Debattista, and A. Chalmers, "Inverse tone mapping," in Proceedings of the 4th International Conference on Computer Graphics and Interactive Techniques in Australasia and Southeast Asia. ACM, 2006, pp. 349–356.
[23] A. G. Rempel, M. Trentacoste, H. Seetzen, H. D. Young, W. Heidrich, L. Whitehead, and G. Ward, "Ldr2Hdr: on-the-fly reverse tone mapping of legacy video and photographs," in ACM Transactions on Graphics (TOG), vol. 26, no. 3. ACM, 2007, p. 39.
[24] R. P. Kovaleski and M. M. Oliveira, "High-quality brightness enhancement functions for real-time reverse tone mapping," The Visual Computer, vol. 25, no. 5-7, pp. 539–547, 2009.
[25] T.-H. Wang, C.-W. Chiu, W.-C. Wu, J.-W. Wang, C.-Y. Lin, C.-T. Chiu, and J.-J. Liou, "Pseudo-multiple-exposure-based tone fusion with local region adjustment," IEEE Transactions on Multimedia, vol. 17, no. 4, pp. 470–484, 2015.
[26] Y. Endo, Y. Kanamori, and J. Mitani, "Deep reverse tone mapping," ACM Transactions on Graphics, vol. 36, no. 6, Art. 177, 2017.
[27] G. Eilertsen, J. Kronander, G. Denes, R. K. Mantiuk, and J. Unger, "HDR image reconstruction from a single exposure using deep CNNs," ACM Transactions on Graphics (TOG), vol. 36, no. 6, p. 178, 2017.
[28] D. Marnerides, T. Bashford-Rogers, J. Hatchett, and K. Debattista, "ExpandNet: A deep convolutional neural network for high dynamic range expansion from low dynamic range content," in Computer Graphics Forum, vol. 37, no. 2. Wiley Online Library, 2018, pp. 37–49.
[29] S. Lee, G. H. An, and S.-J. Kang, "Deep recursive HDRI: Inverse tone mapping using generative adversarial networks," in Proceedings of the European Conference on Computer Vision (ECCV), 2018, pp. 596–611.
[30] J. Tumblin and H. Rushmeier, "Tone reproduction for realistic images," IEEE Computer Graphics and Applications, vol. 13, no. 6, pp. 42–48, 1993.
[31] G. Ward, "A contrast-based scalefactor for luminance display," Graphics Gems IV, pp. 415–421, 1994.
[32] I. R. Khan, S. Rahardja, M. M. Khan, M. M. Movania, and F. Abed, "A tone-mapping technique based on histogram using a sensitivity model of the human visual system," IEEE Transactions on Industrial Electronics, vol. 65, no. 4, pp. 3469–3479, 2018.
[33] J. H. Van Hateren, "Encoding of high dynamic range video with a model of human cones," ACM Transactions on Graphics (TOG), vol. 25, no. 4, pp. 1380–1399, 2006.
[34] H. Spitzer, Y. Karasik, and S. Einav, "Biological gain control for high dynamic range compression," in Color and Imaging Conference, vol. 2003, no. 1. Society for Imaging Science and Technology, 2003, pp. 42–50.
[35] R. Mantiuk, S. Daly, and L. Kerofsky, "Display adaptive tone mapping," in ACM Transactions on Graphics (TOG), vol. 27, no. 3. ACM, 2008, p. 68.
[36] K. Ma, H. Yeganeh, K. Zeng, and Z. Wang, "High dynamic range image tone mapping by optimizing tone mapped image quality index," in Multimedia and Expo (ICME), 2014 IEEE International Conference on. IEEE, 2014, pp. 1–6.
[37] Z. Farbman, R. Fattal, D. Lischinski, and R. Szeliski, "Edge-preserving decompositions for multi-scale tone and detail manipulation," in ACM Transactions on Graphics (TOG), vol. 27, no. 3. ACM, 2008, p. 67.
[38] G. Eilertsen, R. K. Mantiuk, and J. Unger, "A comparative review of tone-mapping algorithms for high dynamic range video," in Computer Graphics Forum, vol. 36, no. 2. Wiley Online Library, 2017, pp. 565–592.
[39] M. Gharbi, J. Chen, J. T. Barron, S. W. Hasinoff, and F. Durand, "Deep bilateral learning for real-time image enhancement," ACM Transactions on Graphics (TOG), vol. 36, no. 4, p. 118, 2017.
[40] A. Loza, L. Mihaylova, N. Canagarajah, and D. Bull, "Structural similarity-based object tracking in video sequences." IEEE, 2006, pp. 1–6.
[41] Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli, "Image quality assessment: from error visibility to structural similarity," IEEE Transactions on Image Processing, vol. 13, no. 4, pp. 600–612, 2004.
[42] X. Yang, K. Xu, Y. Song, Q. Zhang, X. Wei, and R. W. Lau, "Image correction via deep reciprocating HDR transformation," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 1798–1807.
[43] J. Cai, S. Gu, and L. Zhang, "Learning a deep single image contrast enhancer from multi-exposure images," IEEE Transactions on Image Processing, vol. 27, no. 4, pp. 2049–2062, 2018.
[44] M.-A. Gardner, K. Sunkavalli, E. Yumer, X. Shen, E. Gambaretto, C. Gagné, and J.-F. Lalonde, "Learning to predict indoor illumination from a single image," arXiv preprint arXiv:1704.00090, 2017.
[45] F. Banterle, A. Artusi, K. Debattista, and A. Chalmers, Advanced High Dynamic Range Imaging (2nd Edition). Natick, MA, USA: AK Peters (CRC Press), July 2017.
[46] S. Ferradans, M. Bertalmio, E. Provenzi, and V. Caselles, "An analysis of visual adaptation and contrast perception for tone mapping," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 33, no. 10, pp. 2002–2012, 2011.
[47] Z. Mai, H. Mansour, R. Mantiuk, P. Nasiopoulos, R. Ward, and W. Heidrich, "Optimizing a tone curve for backward-compatible high dynamic range image and video compression," IEEE Transactions on Image Processing.
[50] F. Drago, K. Myszkowski, T. Annen, and N. Chiba, "Adaptive logarithmic mapping for displaying high contrast scenes," in Computer Graphics Forum, vol. 22, no. 3. Wiley Online Library, 2003, pp. 419–426.
[51] E. Reinhard, W. Heidrich, P. Debevec, S. Pattanaik, G. Ward, and K. Myszkowski, High Dynamic Range Imaging: Acquisition, Display, and Image-Based Lighting. Morgan Kaufmann, 2010.
[52] H. Z. Nafchi, A. Shahkolaei, R. F. Moghaddam, and M. Cheriet, "FSITM: A feature similarity index for tone-mapped images," IEEE Signal Processing Letters, vol. 22, no. 8, pp. 1026–1029, 2014.
[53] R. Mantiuk, K. J. Kim, A. G. Rempel, and W. Heidrich, "HDR-VDP-2: A calibrated visual metric for visibility and quality predictions in all luminance conditions," ACM Transactions on Graphics (TOG), vol. 30, no. 4, pp. 1–14, 2011.
[54] M. D. Fairchild, "The HDR photographic survey," in Color and Imaging Conference, vol. 2007, no. 1. Society for Imaging Science and Technology, 2007, pp. 233–238.
[55] H. Yeganeh and Z. Wang, "Objective quality assessment of tone-mapped images," IEEE Transactions on Image Processing, vol. 22, no. 2, pp. 657–667, 2013.
[56] K. Gu, S. Wang, G. Zhai, S. Ma, X. Yang, W. Lin, W. Zhang, and G. Wen, "Blind quality assessment of tone-mapped images via analysis of information, naturalness, and structure," IEEE Transactions on Multimedia, vol. 18, no. 3, pp. 432–443, 2016.

APPENDIX
A. Comparative Selection
This section contains 8 questions. Each question presents a group of 7 tone mapped LDR images in random order. Fig. 9 shows an example of a question. Participants can click on each image to view it at full size. During the survey, participants are asked to choose the image with the best visual preference.
B. Image Quality Rating
This section contains 6 questions, with 7 tone mapped LDR images in each question. Fig. 10 shows an example of a question. Participants are asked to rate each LDR image from 1 to 10 based on image brightness, image contrast, and the extent to which overexposure and underexposure details are revealed. A score of 1 represents "dislike" or "fuzzy details" and 10 stands for "most favorite" or "clear and rich details".

C. Assessment Process and Results
We sent out our survey via WeChat (a Chinese social media and multipurpose application) and email.¹ Each participant was only allowed to complete the survey once, and participants could choose to abandon the test at any time. By the submission of this paper, a total of 71 people had participated in the survey. Table VII and Table VIII summarize the results of Comparative Selection and Image Quality Rating, respectively.

We obtain the Ave selection in the Comparative Selection section by simple averaging:

\mathrm{Ave\ selection} = \frac{1}{N_s}\sum_{i=1}^{N_s} s_i \qquad (8)

where s_i denotes the number of times the LDR image of the given algorithm was selected in question i, and N_s represents the total number of questions in the Comparative Selection section.

To acquire the Ave rating in the Image Quality Rating section, we first compute the weighted average score w of each TMO in each question:

w = \frac{\sum_{i=1}^{10} r_i\, n_i}{\sum_{i=1}^{10} n_i} \qquad (9)

where r_i denotes each rating score (from 1 to 10) and n_i is the number of participants who gave the LDR image the score r_i. We then use w to calculate the Ave rating:

\mathrm{Ave\ rating} = \frac{1}{N_r}\sum_{i=1}^{N_r} w_i \qquad (10)

where N_r represents the total number of questions in the Image Quality Rating section.

¹ https://surveyhero.com/c/7040d7c6

D. Color Recovery
Our approach operates on luminance WDR images. We employ an additional color recovery step to assign colors to the pixels of the compressed dynamic range image, using the method described in [4]:

\hat{x}_c = \left(\frac{x_c}{h}\right)^{s} l \qquad (11)

where \hat{x}_c is a color channel of the final WDR-LDR output after the fine tune network, x_c is the corresponding color channel of the WDR input, and h and l denote the luminance of the WDR image and of the image after dynamic range compression, respectively. We set the color saturation controller s = 0.6, which [4] found to produce satisfactory results.

E. Additional Qualitative Comparisons
Figs. 11–16 show additional results of the qualitative comparison with the state-of-the-art algorithms [7], [8], [35], [46]–[48]. The images were randomly chosen from the test set of the Laval dataset [44]. The dataset contains panoramic indoor WDR images of various scenes with aspect ratios of roughly 1.0 to 2.0. To better demonstrate our model's performance in preserving local details, we crop the full size images to half size and keep the overexposed regions. Our model recovers more details in these regions than the other state-of-the-art methods. Figs. 17–20 show the full size output of the different methods; our model also yields visually comparable or better WDR-LDR images. Figs. 21–24 show a visual comparison for different choices of the number of frequency bands n on the Fairchild dataset [54]. Although our neural network was trained on an indoor dataset whose dynamic range is inevitably much smaller than that of outdoor scenes, our method is still able to yield pleasing WDR-LDR images.

These exhaustive predicted outputs show that the proposed method not only compresses the global dynamic range as effectively as the compared methods, but also better preserves and enhances the details of saturated regions.

Fig. 9: A screenshot of a survey question in the Comparative Selection section.

Fig. 10: A screenshot of a survey question in the Image Quality Rating section.

TABLE VII: The summary of the results in the Comparative Selection section. The numbers 1 to 8 in the first row indicate each survey question.
Sum denotes the total number of times each TMO was selected. Ave selection denotes the average number of times each TMO was selected.

Methods          |  1 |  2 |  3 |  4 |  5 |  6 |  7 |  8 | Sum | Ave selection
Ferradans [46]   |  8 |  4 |  7 | 10 |  8 | 14 |  9 |  9 |  69 |  8.625
Gu [8]           | 16 | 25 |  9 | 13 | 18 |  8 | 11 | 11 | 111 | 13.875
Mai [47]         | 15 |  7 |  8 |  8 |  5 |  7 |  9 |  6 |  65 |  8.125
Mantiuk [35]     |  3 |  6 | 10 | 11 |  7 |  9 |  4 |  4 |  54 |  6.75
Paris [7]        |  8 |  4 | 17 |  9 |  5 |  3 |  2 |  4 |  52 |  6.5
Photomatix [48]  |  2 |  2 | 10 |  3 |  4 |  4 |  8 |  6 |  39 |  4.875
Ours             | 19 | 23 |  7 | 13 | 11 | 11 | 13 | 15 | 112 | 14

TABLE VIII: The summary of the results in the Image Quality Rating section. The numbers 1 to 6 in the first row indicate each survey question.
Ave rating represents the average rating score of each TMO.

Methods          |  1   |  2   |  3   |  4   |  5   |  6   | Ave rating
Ferradans [46]   | 6.45 | 5.59 | 5.47 | 6.50 | 5.87 | 7.23 | 6.18
Gu [8]           | 6.00 | 6.39 | 6.06 | 7.00 | 6.10 | 6.87 | 6.40
Mai [47]         | 6.64 | 6.29 | 6.27 | 6.73 | 6.19 | 7.31 | 6.57
Mantiuk [35]     | 5.88 | 6.11 | 5.73 | 6.72 | 5.52 | 6.52 | 6.08
Paris [7]        | 5.47 | 5.93 | 5.76 | 5.29 | 5.16 | 5.90 | 5.59
Photomatix [48]  | 6.00 | 5.52 | 4.55 | 6.21 | 6.10 | 5.48 | 5.64
Ours             | 6.56 | 6.25 | 6.40 | 6.43 | 6.84 | 7.00 | 6.58
Fig. 11: Qualitative comparison on the Laval data test set; the proposed method is able to recover local details in the saturated region. (a) Reference, (b) Mantiuk [35], (c) Paris [7], (d) Ferradans [46], (e) Mai [47], (f) Gu [8], (g) Photomatix [48], (h) Proposed TMO.

Fig. 12: Qualitative comparison on the Laval data test set; the proposed method is able to recover local details in the saturated region. Panels as in Fig. 11.

Fig. 13: Qualitative comparison on the Laval data test set; the proposed method is able to recover local details in the saturated region. Panels as in Fig. 11.

Fig. 14: Qualitative comparison on the Laval data test set; the proposed method is able to recover local details in the saturated region. Panels as in Fig. 11.

Fig. 15: Qualitative comparison on the Laval data test set; the proposed method is able to enhance local details in the saturated region. Panels as in Fig. 11.

Fig. 16: Qualitative comparison on the Laval data test set; the proposed method is able to enhance local details in the saturated region. Panels as in Fig. 11.

Fig. 17: top: Mai TMO [47], bottom: Ferradans TMO [46].

Fig. 18: top: Photomatix TMO [48], bottom: Gu TMO [8].

Fig. 19: top: Mantiuk TMO [35], bottom: Paris TMO [7].

Fig. 20: top: Reference, bottom: Proposed TMO.

Figs. 21–24: Visual comparison of the resulting images for different numbers of frequency bands. In each figure, (a), (b), (c), (d), (e) and (f) are the images with the frequency band n = 2, 3, 4, 5, 6 and 7, respectively.
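The two-band decomposition whose band count n is varied in Figs. 21–24 can be illustrated with a minimal sketch. This is an assumption-laden toy version, not the authors' released code: it uses naive 2x2 averaging and pixel repetition in place of the Gaussian filtering and interpolation of a true Laplacian pyramid, and all function names are ours. It shows the key property of the reformulation: the n-1 detail levels of an n-level pyramid collapse into a single full-resolution high-frequency layer, so merging is exact.

```python
import numpy as np

def downsample(img):
    """Halve resolution by 2x2 averaging (stand-in for Gaussian blur + subsample)."""
    h, w = img.shape
    return img[:h - h % 2, :w - w % 2].reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def upsample(img):
    """Double resolution by pixel repetition (stand-in for interpolation)."""
    return np.repeat(np.repeat(img, 2, axis=0), 2, axis=1)

def reformulated_decompose(img, n=4):
    """Split an image into a low-resolution low-frequency band and a
    full-resolution band holding all remaining higher frequencies."""
    low = img
    for _ in range(n - 1):
        low = downsample(low)
    up = low
    for _ in range(n - 1):
        up = upsample(up)
    high = img - up  # all higher frequencies, kept at the original resolution
    return low, high

def merge(low, high, n=4):
    """Recombine the two bands into the full-resolution image."""
    up = low
    for _ in range(n - 1):
        up = upsample(up)
    return up + high

img = np.random.rand(64, 64)
low, high = reformulated_decompose(img, n=4)
print(low.shape, high.shape)                    # (8, 8) (64, 64)
print(np.allclose(merge(low, high, n=4), img))  # True
```

In the full method, the low band is fed to the global compression network and the high band to the local manipulation network before merging, as described in the conclusion.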