HDR Denoising and Deblurring by Learning Spatio-temporal Distortion Models
Uğur Çoğalan, Mojtaba Bemana, Karol Myszkowski, Hans-Peter Seidel, Tobias Ritschel
MPI Informatik / University College London
Abstract
We seek to reconstruct sharp and noise-free high-dynamic-range (HDR) video from a dual-exposure sensor that records different low-dynamic-range (LDR) information in different pixel columns: Odd columns provide low-exposure, sharp, but noisy information; even columns complement this with less noisy, high-exposure, but motion-blurred data. Previous LDR work learns to deblur and denoise (DISTORTED → CLEAN), supervised by pairs of CLEAN and DISTORTED images. Regrettably, capturing DISTORTED sensor readings is time-consuming; as well, there is a lack of CLEAN HDR videos. We suggest a method to overcome those two limitations. First, we learn a different function instead: CLEAN → DISTORTED, which generates samples containing correlated pixel noise, row and column noise, as well as motion blur, from a low number of CLEAN sensor readings. Second, as there is not enough CLEAN HDR video available, we devise a method to learn from LDR video instead. Our approach compares favorably to several strong baselines, and can boost existing methods when they are re-trained on our data. Combined with spatial and temporal super-resolution, it enables applications such as re-lighting with low noise or blur.
1. Introduction
Common cameras only capture a limited range of luminance values (LDR), while many display and editing tasks would greatly benefit from capturing a higher range of luminance values (HDR) [81]. Modern sensors, such as some CMOSIS CMV and Sony IMX sensors, allow one to configure different levels of exposure for different spatial patterns [17, 35]. This allows HDR by spatial interleaving of different exposures across the sensor. The challenge is to combine different exposures into a coherent natural image.

Let us consider, without loss of generality, a case where every even column is captured with a low exposure and every odd column with a high exposure. This leads to three specific distortions: First, pixel noise inside the image does not follow a single model anymore, but is now strongly correlated with the column. Different exposures lead to different noise, one of the reasons why different exposures are being used in the first place: the low exposures have high noise, but are not clamped, while the high exposures have less noise but suffer from clamping. Second, such cameras suffer from increased levels of row/column noise, so, orthogonal to the exposure layout, entire rows/columns of pixels change coherently, and differently for different exposures. Third, and most different from other sensors, the different exposure levels also lead to different forms of motion blur (MB). Not only does MB lead to spatially varying blur, but this blur rapidly alternates between odd and even columns. Low exposures have low MB, while high exposures suffer from strong MB. In summary, these distortions do not follow any common noise or motion blur model, and hence no method making such assumptions is applicable to HDR from dual exposure.

Figure 1. Our method maps sensor data capturing low-exposure LDR data with noise and high-exposure LDR data with blur into a clean HDR image.

Removing image distortions (deblurring and denoising) is now typically solved [109, 63, 72, 96] by learning a deep neural network (NN), such as a convolutional neural network (CNN), to implement DISTORTED → CLEAN. In our case, this is difficult, as capturing DISTORTED sensor readings is time-consuming, and there is also a lack of CLEAN HDR videos. We suggest a method to overcome both limitations. Addressing the first, we learn a different function instead: CLEAN → DISTORTED, which generates samples containing correlated pixel noise, row and column noise, as well as motion blur from CLEAN sensor readings. Previous work has made simplifying assumptions, such as Gaussian or Poisson noise, none of which apply to our problem. We suggest a non-parametric noise model that is expressive, yet can be trained on a low number of CLEAN–DISTORTED pairs.

Second, as there are not enough CLEAN samples, which would require HDR video, we devise a method to supervise from LDR video instead. Unfortunately, this LDR video does not have the same type of MB as found in the HDR sensor readings. Hence, we use high-speed LDR video to simulate column-alternating MB.

Our evaluation shows that this synthetic training data drives our network, resulting in state-of-the-art HDR images, but can also boost existing methods, including vanilla non-learned denoisers like BM3D, when re-tuned. Applications span different exposure ratios, where we show re-lighting in a VR/AR context as a typical HDR application.
2. Previous work
In this section, we discuss previous approaches to single-image (Sec. 2.1), multiple-image (Sec. 2.2), and in particular HDR (Sec. 2.3) image denoising and deblurring.
Noise modeling
Classic solutions involve fitting Gaussian and Poisson [33, 59] or more involved [77] distributions, sometimes under extreme conditions [10], to many pairs of CLEAN and DISTORTED images. While parametric noise models are routinely used as mathematically tractable priors, we use more expressive non-parametric models, as all we need is to generate distorted training data.
Denoising
Denoising has traditionally been performed directly on noisy images using state-of-the-art algorithms such as BM3D [18], non-local means [6], and nuclear norms [28]. Most deep denoisers [10, 109, 108, 7, 63, 11, 29, 54, 37] are simply trained on pairs of noisy and clean images, while some work is trained without pairs [98, 55, 48, 52, 49, 5, 80, 69, 104], using GANs [12] or self-supervision [103]. The usefulness of neural networks in denoising for real sensors has been disputed [77, 10].
Blur modeling
Video obtained with a high-speed camera [93, 72, 71] or beam splitters [112] enables motion-blur synthesis for the purpose of generating training data, using gyroscope-acquired [70] or random [67] motion.
Deblurring
Non-blind deconvolution methods [115, 87, 95, 86, 106, 14, 100] restore sharp images given the blur kernel. Blind deconvolution methods attempt to derive the kernel based on various priors on either the sharp latent image or the blur kernel [22, 57, 105, 66, 24, 94, 8]. Explicit kernel derivation can be avoided in end-to-end training, where the sharp image is derived directly [72, 96], by self-supervision [60] or adversarial training [50, 51]. Video deblurring additionally capitalizes on inter-frame relationships, while assuring temporal coherence of the result [45, 46, 113, 112, 93]. Deblurring can be combined either with spatial [110] or temporal [79, 40, 39] super-resolution, as done in our approach. The presence of noise, clamping and multiple exposures, as in our condition, adds a further challenge. Methods such as Pan et al. [76] model general distortions using CycleGAN [114], but have not been demonstrated to perform denoising.
A number of solutions have been proposed to capture multiple images of the same content to provide more information for ill-posed deblurring and denoising.
Fixed-exposure burst photography
Burst photography combines a handful of low-exposure frames into a high-quality LDR result using efficient hand-crafted solutions deployed in cellphones [61, 31, 58], or based on learning of recurrent architectures [101], unordered sets [3], or per-pixel filter kernels [67].
Low/high exposure image pairs
Short-exposure images are sharp but noisy, while long-exposure images are blurry but free of noise. Such exposure pairs have been used for non-uniform kernel deblurring [107, 100]. Along a similar line, Mustaniemi et al. [70] and Chang et al. [9], in concurrent work, jointly learn how to denoise and deblur exposure pairs supervised by synthetic training data. Different from our goal, they produce LDR output, while we aim for HDR.
HDR means covering a large range of luminance via multiple exposures, special sensors, or software expansion.
Multi-shot
A typical sensor can capture a wide range of luminances, just not within one shot. Alternatively, an exposure sequence, i.e., time-sequential capture of one scene at different exposure settings, can be merged into one image [62, 68, 19, 83, 25]. Further, exposure sequences can be fused into a high-quality LDR image [65, 78, 70]. When dealing with video [44, 43, 26] or when using neural networks [41, 42, 102], alignment becomes a challenge.
Single-shot
Capturing exposure sequences takes time and their alignment is challenging, in particular for video. This can be alleviated by single-shot solutions relying on custom optics and sensors. A logarithmic response does not require any exposure control [90], but remains prone to noise in dark regions. Spatially-varying exposure (SVE) techniques place a fixed [75, 89, 88, 91, 2] or adaptive [73, 74] mask of variable optical density in front of the sensor, but face problems with resolution and aliasing. Beam splitting preserves resolution with different exposures [97, 1, 47] but requires involved optics. Dual-ISO sensors, e.g., Gpixel GMAX and some of the Canon EOS sensors, enable varying signal gain for odd and even scanlines. Their key advantage is that variable blur between scanlines is avoided, as the exposure is fixed for the whole sensor. On the other hand, instead of collecting more photons in the long exposure and reducing noise this way, only a noisy short exposure is taken, and the long exposure is emulated by increasing ISO, which leads to further noise amplification. Therefore, denoising and deinterlacing are the key challenges for processing dual-ISO frames [30, 84, 23], including data-driven solutions such as jointly learned artifact dictionaries [15] and CNNs [116]. Dual-exposure CMOS sensors enable varying exposures for odd and even scanlines (some Aptina AR and Sony IMX sensors [35]) or columns (CMOSIS CMV12000 [17]). Gu et al. [27] perform flow-compensated interpolation for sub-image deinterlacing so that differently exposed, full-resolution images are obtained. Cho et al. [13] directly calibrate scanlines using bilateral filters followed by motion blur removal [56] and sharpening. Along similar lines, Heide et al. [34] propose an end-to-end optimization which jointly accounts for demosaicking, deinterlacing, denoising, and deconvolution. An and Lee [4] restore under- and over-exposed pixels using a CNN, but no results for real sensor data are demonstrated. Our work performs joint denoising, deinterlacing and deblurring, trained on a small set of captured data, resulting in high-quality HDR.
Dynamic range expansion
LDR can be expanded to HDR in software. Although immense progress has been made based on CNNs [64, 21, 20], results do not yet match the quality of multi-exposure techniques or dedicated sensors.
Figure 2. A Gaussian noise model (left), our low-exposure re-synthesis (middle) from a noise-free high-exposure reference (not shown), and a real low-exposure sensor reading reference (right). Note the long-range correlation across ours and the reference.
3. HDR exposure distortion and back
Our approach has two steps: learning a model to synthesize distortions to train on (Sec. 3.1; an example result is shown in Fig. 2) and learning to remove distortions (Sec. 3.2).
There are three distortion steps, which we describe in the order of the underlying physics (Fig. 3): motion blur (Sec. 3.1.1), pixel noise (Sec. 3.1.2), and row/column noise (Sec. 3.1.3). For all steps, we will look at the analysis from noisy sensor readings to devise a statistical model for inference from DISTORTED, and a synthesis step to apply it to CLEAN.

With different exposures in different columns, their MB is also different. For example, at an exposure ratio r = 4, the MB is also four times longer and the image is a mix of sharp and blurry columns. As getting reference data without MB, in particular HDR, is difficult, we turn to existing LDR high-speed video footage to simulate multi-exposure MB.

Data
We use 123 videos from the Adobe High-speed Video Dataset [93], which have no, or negligible, inherent MB, in a total of 8000 frames. Note that these are not captured with our sensor, and are LDR. They are neither input to nor output from our approach and only provide supervision.
Synthesis
Synthesis starts from a random frame of 8-bit LDR high-speed video I_LDR. It is converted to floating point, and an inverse gamma is applied at γ = 2.2. We call this the low-frame image, denoted I_L = I_LDR^γ. To simulate the high-frame exposure, we scale four low frames by the exposure ratio, clamp them, and average, as in I_H = clamp(r × E_{t∈{1,…,4}}[I_L(t)]). Finally, the low-frame pixels are inserted into the even columns and the high frames into the odd ones, resulting in the motion-blurred image I_MB.

Pixel noise, which occurs in the sensor, is applied after motion blur, which happens in the optics. Instead of employing a parametric noise model, which has its strengths as a prior and for analysis, we use non-parametric histograms to capture a noise model well-suited for generation. Prior to the noise-model derivation, we remove the fixed-pattern noise from the sensor readings [36].
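The virtual-exposure step above can be sketched as follows (a minimal NumPy sketch; the function and variable names are ours, and we assume four consecutive 8-bit frames, with the first frame serving as the low exposure):

```python
import numpy as np

def simulate_dual_exposure(frames, r=4.0, gamma=2.2):
    """Simulate a column-interleaved dual-exposure capture from
    consecutive high-speed 8-bit LDR frames (illustrative sketch)."""
    # Inverse gamma: back to (approximately) linear radiance.
    lin = [(f.astype(np.float64) / 255.0) ** gamma for f in frames[:4]]
    i_low = lin[0]                   # sharp low exposure (noise added later)
    # High exposure: scale the mean of four frames by the ratio r, then clamp.
    i_high = np.clip(r * np.mean(lin, axis=0), 0.0, 1.0)
    # Interleave: even columns take the low, odd columns the high exposure.
    i_mb = i_low.copy()
    i_mb[:, 1::2] = i_high[:, 1::2]
    return i_mb
```

On static content the two exposure sets differ only by the scaled-and-clamped intensity; blur appears as soon as the four frames contain motion.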
Data
We assume we have a limited amount of GT sensor readings available. In practice, we use no more than 30 pairs of images (not video) of everyday scenes captured with the target sensor, as well as a ground truth acquired by averaging the result of 100 captures of the scene at a very low exposure (so as to make clipping effects negligible) and using a very long exposure.
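The reasoning behind averaging many captures is that the standard error of the mean shrinks as 1/√N, so 100 captures reduce the noise level roughly tenfold (a toy NumPy illustration with synthetic Gaussian noise; all numbers here are ours, not the paper's):

```python
import numpy as np

rng = np.random.default_rng(0)
clean = 0.3                                 # hypothetical true pixel value
sigma = 0.05                                # hypothetical per-shot noise level
shots = clean + sigma * rng.standard_normal(100)  # 100 noisy captures

gt_estimate = shots.mean()                  # time-averaged "ground truth"
# Residual error is on the order of sigma / sqrt(100) = 0.005,
# ten times smaller than a single capture's noise.
```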
Analysis
Figure 3. Our proposed HDR distortion generation pipeline: We start from LDR 240 Hz video in the top left, from which frames t to t+n are extracted, integrated, and virtually exposed to produce an image with MB (first row). Next, we take pairs of noisy and time-averaged noise-free sensor readings, and produce a non-parametric noise model (histogram) for low and high exposure. This noise model is added to the virtually exposed image with MB (second row). Finally, a model of row and column noise is extracted by averaging vertically or horizontally; this can be added to the pixel-noise image, producing the final image with all distortions present (third row).

The noise is different for different exposures and also for different color channels. We build a model p_{c,e}(x|y), the probability that, when the GT value is y, the sensor will read x for channel c and exposure e. A separate model is maintained for every channel and every exposure, leading to six models for three color channels and two exposures. While we notice the noise models to be similar for different channels at the same exposure, they are, unsurprisingly, different for different exposures. Histograms H_{c,e}[x][y] are used to represent the probability distribution over x for each y in channel c at exposure e. To construct all histograms, every pair of sensor readings and its ground truth, as well as every pixel and every channel, are iterated, and bin x of histogram y is incremented when the GT pixel is y and the sensor reading is x for channel c and exposure e. The number of histogram bins depends on the bit depth, typically 12 bits, resulting in 4096 bins. After analysis, all histograms are converted into inverse cumulative histograms C_{c,e}[x][y], allowing us to sample from them in constant time.

Synthesis
Noise synthesis is applied to I_MB, the image with simulated MB. Every pixel and every channel of the MB image I_MB is iterated to obtain a GT value y. A random number ξ_{c,e} is used to look up the respective cumulative histogram C_{c,e} to produce a simulated sensor value x. Combining all pixels, channels and exposures results in a virtual synthetic image I_PN involving MB and pixel noise.

At short exposures, more structured forms of noise can become important, one of them being row/column noise. This is not to be mistaken for fixed-pattern noise, which frequently is spatially correlated, but much easier to correct. In row/column noise, pixels do not change independently; rather, all pixels in a row/column change in correlation, i.e., the entire row/column is darkened or brightened. This is because in the CMOSIS CMV12000 (global-shutter) sensor, pixel read-out is performed sequentially row-by-row, resulting in differences between the rows. The analog pixel values are then passed to a column gain amplifier and a column analog-digital converter (ADC), which are used to speed up processing, but introduce differences between the columns [17]. As those effects are visually distracting, we synthesize and ultimately remove them.
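Stepping back to pixel noise, its histogram analysis and the per-pixel inverse-CDF sampling can be sketched together as follows (a NumPy sketch for one channel and one exposure; the function names and toy bin count are ours, whereas the real model uses 4096 bins for 12-bit data and six such tables):

```python
import numpy as np

def build_inverse_cdf(gt, noisy, bins):
    """Analysis sketch: conditional histogram H[y][x] from a paired GT and
    sensor image, normalized into a per-y CDF for constant-time sampling."""
    hist = np.zeros((bins, bins), np.float64)          # hist[y][x]
    np.add.at(hist, (gt.ravel(), noisy.ravel()), 1.0)  # count (y, x) pairs
    hist += 1e-12                                      # keep empty rows valid
    cdf = np.cumsum(hist, axis=1)
    return cdf / cdf[:, -1:]                           # normalize per y

def sample_sensor_values(cdf, gt, rng):
    """Synthesis sketch: for every GT pixel y, draw a random number xi
    and invert the CDF to obtain a simulated sensor value x."""
    xi = rng.random(gt.size)
    x = [np.searchsorted(cdf[y], u) for y, u in zip(gt.ravel(), xi)]
    return np.asarray(x).reshape(gt.shape)
```

In the full pipeline this sampling would be applied to the virtual-exposure image I_MB, separately per channel and exposure, to produce I_PN.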
Analysis
We again iterate all pairs of GT and sensor images, but instead of working on pixels, we now work on entire rows/columns. In particular, we look at the six separate means across every row/column for every channel and exposure. We denote this mean as x̄ in the sensor image and as ȳ in the GT image. We now proceed as with pixel noise and build a model in the form of a histogram, ultimately resulting in the inverse cumulative row/column noise model C̄_{c,e}[x̄][ȳ].

Synthesis
Synthesis of row/column noise starts from the image with synthetic MB and pixel noise, I_PN. We once more iterate every row, channel and exposure, compute the row/column mean ȳ_{c,e}, and again use a random number ξ̄_{c,e} to draw from C̄_{c,e}[ξ̄][ȳ]. To make the row/column mean match the desired mean, we simply add the difference of the means to the row/column, resulting in the final synthetic noisy image I_All.

We use a U-Net [85] with skip connections, residual connections [32] and sub-pixel convolutions [92] to map distorted patches to clean patches under an SSIM loss [111]. The mapping is performed in linear space, and the output is converted to RGB and gamma-corrected after the loss.
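The mean-shift step for row/column noise can be sketched as follows (NumPy; `sample_mean` is a hypothetical stand-in for a draw from the inverse cumulative model C̄_{c,e}, which is not implemented here):

```python
import numpy as np

def add_column_noise(img, sample_mean, rng):
    """Shift every column coherently: draw a noisy target mean conditioned
    on the clean column mean, then add the difference of means to the whole
    column (sketch; the real model draws from the histogram table)."""
    out = img.astype(np.float64).copy()
    for col in range(out.shape[1]):
        y_bar = out[:, col].mean()       # clean column mean
        x_bar = sample_mean(y_bar, rng)  # simulated noisy column mean
        out[:, col] += x_bar - y_bar     # entire column brightens/darkens
    return out
```

As a stand-in, `sample_mean = lambda y, rng: y + 0.01 * rng.standard_normal()` produces Gaussian column offsets; the paper's non-parametric tables would replace this.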
4. Results
We present quantitative and qualitative evaluation on deblurring/denoising tasks (Sec. 4.1), as well as on a temporal (Sec. 4.2) and a spatial (Sec. 4.3) super-resolution task. Interactive comparisons and videos can be found at https://deephdr.mpi-inf.mpg.de. All test images have been captured using an Axiom-beta camera with a CMOSIS CMV12000 sensor [17] and a Canon EFS 18-135 mm lens at a resolution of 4096 × 3072. We now evaluate the combination of our method and our synthetic training data, as well as other ways to obtain training data and other methods for denoising and deblurring.
Methods
We consider eight methods (color-coded; "Method" in Tbl. 1): Direct is a non-learned, direct, physics-based fusion of the low and high frame, with bicubic upsampling [19]. Next, BM3D [18] is a gold-standard, non-deep denoiser. When BM3D is "trained", this means performing a grid search on the training data in order to find the standard-deviation parameter with the optimal DSSIM. FFDNet [109] is a state-of-the-art deep denoiser. DBGAN [51] and SRNDB [96] are recent deblurring approaches. LSD [70] is a deep multi-exposure method that produces denoised and deblurred LDR images. The final method is Heide [34], a general image reconstruction method capable of working with multiplexed exposures.
Training data
For each method, we study how it performs when trained with different data ("Train. data" in Tbl. 1). We denote it as "Theirs" if the authors provide a pre-trained version. "Sensor" means training on the images for which we have paired training data available directly, i.e., without our proposed re-synthesis. Please note that this training is not applicable to tasks that involve removing MB, as the supervision inevitably contains MB. Next, we study heteroscedastic Gaussian noise, "HetGau", which refers to taking our training data, fitting a linear model of Gaussian parameters to the error distribution, and then re-synthesizing training data. Finally, we study four ablations of our training-data generation: only motion blur ("OurMB"), only pixel noise ("OurPN"), only row noise ("OurRN"), and finally all combined ("OurAll") in Tbl. 1.

Table 1. Performance of different methods and different training data (rows) for different tasks (columns). A dash marks combinations that are not applicable.

                             Lo→Lo   Hi→Hi−MB   LoHi→HDR   LoHi→HDR−MB
In: Lo                         ✓        ✗          ✓           ✓
In: Hi (+MB)                   ✗        ✓          ✓           ✓
Out: MB                        ✗        ✗          ✓           ✗
Out: HDR                       ✗        ✗          ✓           ✓

Train. data   Method               Error (DSSIM ×10⁻²)
Theirs        Direct [19]          7.87    7.08    3.70    5.52
Theirs        BM3D [18]            2.98    4.10    2.00    2.63
Sensor        BM3D                 2.84    —       1.90    —
HetGau        BM3D                 2.75    3.86    1.76    2.32
OurAll        BM3D
Theirs        FFDNet [109]         3.79    4.31    2.18    2.83
Sensor        FFDNet               2.78    —       2.03    —
OurAll        FFDNet               2.78    3.92    2.03    2.54
Theirs        DBGAN [51]           5.31    4.88    2.95    3.32
Theirs        SRN-DB [96]          3.28    4.36    2.27    2.60
Theirs        LSD [70]             —       2.94    3.24    2.46
Theirs        Heide et al. [34]    5.27
Sensor        Ours                 6.51    —       4.62    —
HetGau        Ours                 3.14    3.17    2.35    2.15
OurRN         Ours                 5.33    5.24    4.41    4.32
OurPN         Ours                 4.24    4.60    3.01    3.06
OurMB         Ours                 4.23    3.63    2.17    3.15
OurAll        Ours                 2.75
Metrics
We measure DSSIM [99], where less is better.
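DSSIM here denotes structural dissimilarity, commonly defined as (1 − SSIM)/2, so 0 means identical images. A minimal sketch with a simplified global-statistics SSIM (the reference metric [99] uses local windows, so absolute values will differ; function names are ours):

```python
import numpy as np

def ssim_global(a, b, data_range=1.0):
    """Simplified single-scale SSIM over whole-image statistics
    (the reference SSIM averages over local windows; this is a sketch)."""
    c1, c2 = (0.01 * data_range) ** 2, (0.03 * data_range) ** 2
    mu_a, mu_b = a.mean(), b.mean()
    va, vb = a.var(), b.var()
    cov = ((a - mu_a) * (b - mu_b)).mean()
    return ((2 * mu_a * mu_b + c1) * (2 * cov + c2) /
            ((mu_a ** 2 + mu_b ** 2 + c1) * (va + vb + c2)))

def dssim(a, b):
    """Structural dissimilarity: 0 for identical images; less is better."""
    return (1.0 - ssim_global(a, b)) / 2.0
```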
Tasks
We study four tasks (four last columns in Tbl. 1): First, we remove noise in the low exposure only (Lo→Lo). Second, we remove noise and MB in the high exposure only (Hi→Hi−MB). Third is a task where the input is both exposures and the output is an HDR image without noise (LoHi→HDR). Fourth, both noise and MB are removed to produce a clean HDR image (LoHi→HDR−MB).

Discussion
Results are shown in Tbl. 1. Our method trained on our synthetic training data (OurAll) performs best on all tasks. Our ablations (OurRN, OurPN and OurMB) all perform worse than the full method, indicating all additions are relevant. Looking into how other methods trained on data synthesized using our distortion model perform, we see that, first, they all improve in comparison to being trained on their original data, but, second, none can compete with our method trained on that data. Only one competing method, when tuned on our data, can compete on its home ground, Lo→Lo. We also tried training our network with other data, such as using sensor data directly (Sensor) or heteroscedastic Gaussian noise (HetGau), but none of these was able to capture the combination of motion blur, pixel noise and row/column noise, resulting in larger errors. As a sanity check, we also tuned BM3D on sensor data and heteroscedastic Gaussian noise, but no choice of parameters, even with that information, can get BM3D to perform much better on test data. A further test is to compare to Direct, which is not learned and does nothing except up-sampling and fusion; this should be a lower bound for any method or task. Finally, our approach compares favorably to Heide et al. [34], a general, powerful and flexible imaging framework that can work on multi-exposure images. When looking at performance for different tasks, we find that for simpler tasks, such as Lo→Lo, i.e., direct denoising, unsurprisingly, our best result performs comparably to the gold-standard BM3D, in particular when the latter is tuned on our data. When the task gets more involved, i.e., removing MB or producing HDR, the methods start to perform more similarly, but ours tends to win by a larger margin. For completeness, our analysis includes methods designed for denoising being applied to a deblurring task, or vice versa. As all tasks except Lo→Lo involve components of both deblurring and denoising, we report those numbers to certify that no method solving only one of the tasks does it so well that the DSSIM is reduced more than for another method trying to solve both tasks. This is probably because both noise and blur are visually important, and no method, including ours, can reduce one of them enough to make the other irrelevant. In summary, using the right training data helps our method and others to solve multiple aspects of multiple tasks.

Figure 4. Comparison of different methods (columns) on two scenes (rows). Please see the text for discussion.

The quantitative results from above are complemented by the qualitative ones in Fig. 4. The first row shows our complete image. The second and third rows show selected patches from the low and high input, which suffer from noise and blur, respectively. Directly fusing both into HDR, as in the fourth column, reduces noise and blur, but cannot remove them. The BM3D and FFDNet columns show that individual frames can be denoised, but blur remains. This is most visible in moving parts, such as the dots in the second row. Using deblurring, as in DBGAN or SRNDB, can reduce blur, but this often leads to ringing. Our joint method performs best on these images.

Fig. 5 compares our result at an exposure ratio of 16:1 to the best single-exposure result. We note our approach reproduces details in the bright (outdoor) part as well as in the dark (indoor) part despite the massive contrast. The best LDR fit can resolve some of the outdoor elements, but has no details except quantization noise in the dark part.
Figure 5. Comparison of our reconstruction at an exposure ratio of 16:1 and the best single-exposure result (inset stripes).

In temporal super-resolution, we extend the LoHi→HDR−MB task to output not a single image, but n images instead. To generate training data, we still extract sequences of n high-speed video frames, and we still call the first frame the low frame and the integral of all n frames the high exposure, but the output is not one frame but n individual frames. The architecture is identical, except that it produces n images in the last layer. Note that the input is still only two interleaving exposures, where one has severe MB and the other severe noise. Fig. 6 shows the outcome of our reconstruction. We compare this method with a baseline in which we first run our non-super-resolution method, then apply temporal upsampling [38] to extract n frames in between. Results are shown in Tbl. 2.

Figure 6. Four frames cropped (top) from an HDR video with temporal super-resolution using our approach. The full frame 2 (middle). An epipolar slice for the marked row (bottom).

Analogously to temporal super-resolution, we can also look at spatial super-resolution [53]. Here, training data is spatially down-scaled before being used to simulate Hi and Lo frames. At training time, the decoder branch is simply repeated several times to produce output patches larger than the input patches. In Fig. 7 we show a comparison of simple bicubic upsampling to our non-super-resolution and our full methods.

Table 2. HDR super-resolution in combination with denoising and deblurring. "Us-Them" in temporal super-resolution means to first run our non-super-resolution method, followed by the temporal super-resolution method [38]. "Us-Them" in case of spatial super-resolution means to first run our non-super-resolution method, followed by simple bicubic upsampling. "End-to-End" means our full method.

            Us-Them   End-to-End
Temporal    0.032     0.026
Spatial     0.074     0.035
Figure 7. Spatial super-resolution: bicubic upsampling, our non-super-resolution method (Ours), and our full method (Ours+SR).
A key application of HDR information is to use it for illumination reconstruction [19]. We captured a mirror ball, removed motion blur and noise using our full method, and re-rendered the scene using Blender's [16] path tracer with 512 samples and automatic tone and gamma mapping. The resulting image is seen in Fig. 8. We find that the non-linear mapping of Monte Carlo rendering amplifies structures, and noise becomes more visible, in particular row noise. Using only the high exposure removes noise, but cannot capture the dynamic range, resulting in washed-out shadows. Our method succeeds in removing the noise, in particular row noise, resulting in sharp shadows as well as noise-free reflections. Note that some noise is present in all images due to the finite Monte Carlo sample count (all images were computed in 20 minutes). The noise appears lower in the high exposure, as reduced contrast results in an easier light-simulation problem, but the solution is biased, predicting an apparently smoother solution to a different, simpler problem.
5. Conclusion
We presented a CNN solution for HDR image reconstruction tailored for a single-shot dual-exposure sensor. By joint processing of low and high exposures and taking advantage of their perfect spatial and temporal registration, our solution solves a number of serious problems inherent to such sensors, such as correlated noise and spatially varying blur, as well as interlacing and spatial resolution reduction. We demonstrate that, by capturing a limited amount of data specific to such sensors and using simple histograms to represent the noise statistics, we were able to generate synthetic training data that led to better denoising and deblurring quality than achieved by existing state-of-the-art techniques. Moreover, we show that by using our limited sensor-specific data, the performance of other techniques can be greatly improved. This is for two reasons: First, previous methods did not have access to massive amounts of training data for dual-exposure sensors, a problem we solve here by proposing the first dedicated distortion model allowing to synthesize such training data. Second, dual-exposure sensors in combination with proper CNN-based denoising and deblurring provide us with much richer data to fuse. Finally, we present an application of captured HDR environment maps for 3D scene re-lighting, where our denoising and deblurring improve the quality of Monte Carlo rendering.

Figure 8. Rendering from a spherical illumination map captured at a low exposure (left), a high exposure (middle), and using our approach (right). For each approach the illumination is seen as an inset on the left. For the low exposure, the shadows are sharp, as the light source did not saturate, but the dark regions are clipped and massively noisy. For the high exposure, the dark regions are reproduced, slightly noisy, but the light source is clamped, leading to a loss in dynamic range and a loss of sharp shadows. Our method reproduces both. Note that visible overall brightness differences are expected, as clamping is present in some images, which does not conserve energy.
References
[1] M. Aggarwal and N. Ahuja. Split aperture imaging for high dynamic range. In ICCV, volume 2, pages 10–17, 2001.
[2] C. Aguerrebere, A. Almansa, Y. Gousseau, J. Delon, and P. Musé. Single shot high dynamic range imaging using piecewise linear estimators. In ICCP, pages 1–10, 2014.
[3] Miika Aittala and Fredo Durand. Burst image deblurring using permutation invariant convolutional neural networks. In ECCV, 2018.
[4] V. G. An and C. Lee. Single-shot high dynamic range imaging via deep convolutional neural network. In APSIPA, pages 1768–1772, 2017.
[5] Joshua Batson and Loic Royer. Noise2Self: Blind denoising by self-supervision. In ICML, pages 524–533, 2019.
[6] Antoni Buades, Bartomeu Coll, and J-M Morel. A non-local algorithm for image denoising. In CVPR, volume 2, pages 60–65, 2005.
[7] Harold C. Burger, Christian J. Schuler, and Stefan Harmeling. Image denoising: Can plain neural networks compete with BM3D? In CVPR, pages 2392–2399, 2012.
[8] Ayan Chakrabarti. A neural approach to blind motion deblurring. In ECCV, pages 221–235, 2016.
[9] Meng Chang, Huajun Feng, Zhihai Xu, and Qi Li. Low-light image restoration with short- and long-exposure raw pairs, 2020.
[10] C. Chen, Q. Chen, J. Xu, and V. Koltun. Learning to see in the dark. In CVPR, pages 3291–3300, 2018.
[11] Jingwen Chen, Jiawei Chen, Hongyang Chao, and Ming Yang. Image blind denoising with generative adversarial network based noise modeling. In CVPR, 2018.
[12] Jingwen Chen, Jiawei Chen, Hongyang Chao, and Ming Yang. Image blind denoising with generative adversarial network based noise modeling. In CVPR, pages 3155–3164, 2018.
[13] Hojin Cho, Seon Joo Kim, and Seungyong Lee. Single-shot high dynamic range imaging using coded electronic shutter. Comp. Graph. Forum, 33(7):329–338, 2014.
[14] S. Cho, Jue Wang, and S. Lee. Handling outliers in non-blind image deconvolution. In CVPR, pages 495–502, 2011.
[15] Inchang Choi, Seung-Hwan Baek, and M. Kim. Reconstructing interlaced high-dynamic-range video using joint learning. IEEE TIP, 26:5353–5366, 2017.
[16] Blender Online Community. Blender, 2020.
[17] CMOSIS CMV12000. https://ams.com/cmv12000, accessed on Nov. 12, 2020.
[18] K. Dabov, A. Foi, V. Katkovnik, and K. Egiazarian. Image denoising by sparse 3-D transform-domain collaborative filtering. IEEE TIP, 16(8):2080–2095, 2007.
[19] Paul E. Debevec and Jitendra Malik. Recovering high dynamic range radiance maps from photographs. In Proc. SIGGRAPH, pages 369–378, 1997.
[20] Gabriel Eilertsen, Joel Kronander, Gyorgy Denes, Rafał K. Mantiuk, and Jonas Unger. HDR image reconstruction from a single exposure using deep CNNs. ACM Trans. Graph., 36(6), 2017.
[21] Yuki Endo, Yoshihiro Kanamori, and Jun Mitani. Deep reverse tone mapping. ACM Trans. Graph., 36(6), 2017.
[22] Rob Fergus, Barun Singh, Aaron Hertzmann, Sam T. Roweis, and William T. Freeman. Removing camera shake from a single photograph.
ACM Trans. Graph. , 25(3):787–794,2006. 2[23] Chihiro Go, Yuma Kinoshita, Sayaka Shiota, and HitoshiKiya. An image fusion scheme for single-shot high dynamicrange imaging with spatially varying exposures.
CoRR ,abs/1908.08195, 2019. 3[24] D. Gong, J. Yang, L. Liu, Y. Zhang, I. Reid, C. Shen, A.Van Den Hengel, and Q. Shi. From motion blur to motionflow: A deep learning solution for removing heterogeneousmotion blur. In
CVPR , pages 3806–3815, 2017. 2[25] M. Granados, B. Ajdin, M. Wand, C. Theobalt, H. Seidel,and H. P. A. Lensch. Optimal HDR reconstruction withlinear digital cameras. In
CVPR , pages 215–222, 2010. 2[26] Yulia Gryaditskaya, Tania Pouli, Erik Reinhard, KarolMyszkowski, and Hans-Peter Seidel. Motion aware ex-posure bracketing for HDR video.
Comp. Graph. Forum ,34(4):119–130, 2015. 2[27] J. Gu, Y. Hitomi, T. Mitsunaga, and S. Nayar. Coded rollingshutter photography: Flexible space-time sampling. In
ICCP ,pages 1–8, 2010. 3[28] S. Gu, L. Zhang, W. Zuo, and X. Feng. Weighted nuclearnorm minimization with application to image denoising. In
CVPR , pages 2862–2869, 2014. 2[29] Shi Guo, Zifei Yan, Kai Zhang, Wangmeng Zuo, and LeiZhang. Toward convolutional blind denoising of real pho-tographs. In
CVPR , 2019. 2[30] Saghi Hajisharif, Joel Kronander, and J. Unger. HDR re-construction for alternating gain (ISO) sensor readout. In
Eurographics , 2014. 3[31] Samuel W. Hasinoff, Dillon Sharlet, Ryan Geiss, AndrewAdams, Jonathan T. Barron, Florian Kainz, Jiawen Chen,and Marc Levoy. Burst photography for high dynamic rangeand low-light imaging on mobile cameras.
ACM Trans.Graph. , 35(6), 2016. 2[32] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun.Deep residual learning for image recognition. In
CVPR ,pages 770–78, 2016. 5[33] Glenn E Healey and Raghava Kondepudy. Radiometricccd camera calibration and noise estimation.
IEEE PAMI ,16(3):267–276, 1994. 2[34] Felix Heide, Markus Steinberger, Yun-Ta Tsai, MushfiqurRouf, Dawid Pajak, Dikpal Reddy, Orazio Gallo, Jing Liu,Wolfgang Heidrich, Karen Egiazarian, Jan Kautz, and KariPulli. FlexISP: A flexible camera image processing frame-work.
ACM Trans. Graph.
Scientific Charge-coupled Devices . 2001. 3[37] Xixi Jia, Sanyang Liu, Xiangchu Feng, and Lei Zhang. FOC-Net: A fractional optimal control network for image denois-ing. In
CVPR , 2019. 2[38] Huaizu Jiang, Deqing Sun, Varun Jampani, Ming-HsuanYang, Erik Learned-Miller, and Jan Kautz. Super SloMo:High quality estimation of multiple intermediate frames forvideo interpolation. In
CVPR , 2018. 7[39] M. Jin, Z. Hu, and P. Favaro. Learning to extract flawlessslow motion from blurry videos. In
CVPR , pages 8104–8113,2019. 2[40] M. Jin, G. Meishvili, and P. Favaro. Learning to extracta video sequence from a single motion-blurred image. In
CVPR , pages 6334–6342, 2018. 2[41] Nima Khademi Kalantari and Ravi Ramamoorthi. Deephigh dynamic range imaging of dynamic scenes.
ACM Trans.Graph. , 36(4), 2017. 2[42] Nima Khademi Kalantari and Ravi Ramamoorthi. DeepHDR video from sequences with alternating exposures. In
Eurographics , 2019. 2[43] Nima Khademi Kalantari, Eli Shechtman, Connelly Barnes,Soheil Darabi, Dan B. Goldman, and Pradeep Sen. Patch-based high dynamic range video.
ACM Trans. Graph. , 32(6),2013. 2[44] Sing Bing Kang, Matthew Uyttendaele, Simon Winder, andRichard Szeliski. High dynamic range video.
ACM Trans.Graph. , 22(3):319–325, 2003. 2[45] T. Kim and K. Lee. Generalized video deblurring for dy-namic scenes. In
CVPR , pages 5426–5434, 2015. 2[46] T. H. Kim, K. M. Lee, B. Sch¨olkopf, and M. Hirsch. Onlinevideo deblurring via dynamic temporal blending network.In
ICCV , pages 4058–4067, 2017. 2[47] J. Kronander, S. Gustavson, G. Bonnet, and J. Unger. Uni-fied HDR reconstruction from raw CFA data. In
ICCP , pages1–9, 2013. 2[48] Alexander Krull, Tim-Oliver Buchholz, and Florian Jug.Noise2void - learning denoising from single noisy images.In
CVPR , 2019. 2[49] Alexander Krull, Tomas Vicar, and Florian Jug. Proba-bilistic noise2void: Unsupervised content-aware denoising. arXiv:1906.00651 , 2019. 2[50] Orest Kupyn, Volodymyr Budzan, Mykola Mykhailych,Dmytro Mishkin, and Jiˇr´ı Matas. Deblurgan: Blind mo-tion deblurring using conditional adversarial networks. In
CVPR , 2018. 2[51] Orest Kupyn, Tetiana Martyniuk, Junru Wu, and ZhangyangWang. Deblurgan-v2: Deblurring (orders-of-magnitude)faster and better. In
ICCV , pages 8878–8887, 2019. 2, 5[52] Samuli Laine, Tero Karras, Jaakko Lehtinen, and Timo Aila.High-quality self-supervised deep image denoising. In
NiPS ,volume 32, pages 6970–6980, 2019. 2[53] Christian Ledig, Lucas Theis, Ferenc Huszar, Jose Caballero,Andrew Cunningham, Alejandro Acosta, Andrew Aitken,Alykhan Tejani, Johannes Totz, Zehan Wang, and WenzheShi. Photo-realistic single image super-resolution using agenerative adversarial network. In
CVPR , 2017. 7[54] Stamatios Lefkimmiatis. Universal denoising networks: Anovel CNN architecture for image denoising. In
CVPR ,2018. 2[55] Jaakko Lehtinen, Jacob Munkberg, Jon Hasselgren, Samuli aine, Tero Karras, Miika Aittala, and Timo Aila.Noise2Noise: Learning image restoration without clean data.In Proc. of Machine Learning Research , volume 80, pages2965–2974, 2018. 2[56] Frank Lenzen and Otmar Scherzer. Partial differential equa-tions for zooming, deinterlacing and dejittering.
Int. J Comp.Vis. , 92:162–176, 2011. 3[57] A. Levin, Y. Weiss, F. Durand, and W. T. Freeman. Under-standing and evaluating blind deconvolution algorithms. In
CVPR , pages 1964–1971, 2009. 2[58] Orly Liba, Kiran Murthy, Yun-Ta Tsai, Tim Brooks, TianfanXue, Nikhil Karnad, Qiurui He, Jonathan T. Barron, DillonSharlet, Ryan Geiss, Samuel W. Hasinoff, Yael Pritch, andMarc Levoy. Handheld mobile photography in very lowlight.
ACM Trans. Graph. , 38(6), 2019. 2[59] Ce Liu, Richard Szeliski, Sing Bing Kang, C LawrenceZitnick, and William T Freeman. Automatic estimationand removal of noise from a single image.
IEEE PAMI ,30(2):299–314, 2007. 2[60] Peidong Liu, Joel Janai, Marc Pollefeys, Torsten Sattler, andAndreas Geiger. Self-supervised linear motion deblurring.
IEEE Robotics and Automation Letters , 5(2):2475–2482,2020. 2[61] Ziwei Liu, Lu Yuan, Xiaoou Tang, Matt Uyttendaele, andJian Sun. Fast burst images denoising.
ACM Trans. Graph. ,33(6), 2014. 2[62] S. Mann and R. W. Picard. On being ’undigital’ with digitalcameras: Extending dynamic range by combining differentlyexposed pictures. In
ISfT , pages 442–448, 1995. 2[63] Xiaojiao Mao, Chunhua Shen, and Yu-Bin Yang. Imagerestoration using very deep convolutional encoder-decodernetworks with symmetric skip connections. In
NiPS , pages2802–2810, 2016. 1, 2[64] D. Marnerides, T. Bashford-Rogers, J. Hatchett, and K. De-battista. Expandnet: A deep convolutional neural networkfor high dynamic range expansion from low dynamic rangecontent.
Comp. Graph. Forum , 37(2):37–49, 2018. 3[65] T. Mertens, J. Kautz, and F. Van Reeth. Exposure fusion. In
Pacific Graphics , pages 382–390, 2007. 2[66] Tomer Michaeli and Michal Irani. Blind deblurring usinginternal patch recurrence. In
ECCV , pages 783–798, 2014.2[67] B. Mildenhall, J. T. Barron, J. Chen, D. Sharlet, R. Ng, andR. Carroll. Burst denoising with kernel prediction networks.In
CVPR , pages 2502–2510, 2018. 2[68] T. Mitsunaga and S. K. Nayar. Radiometric self calibration.In
CVPR , pages 374–380 Vol. 1, 1999. 2[69] Nick Moran, Dan Schmidt, Yu Zhong, and Patrick Coady.Noisier2noise: Learning to denoise from unpaired noisydata. In
CVPR , pages 12064–12072, 2020. 2[70] Janne Mustaniemi, Juho Kannala, Jiri Matas, Simo S¨arkk¨a,and Janne Heikkil¨a. LSD -joint denoising and deblurring ofshort and long exposure images with convolutional neuralnetworks. In BMVC , 2020. 2, 5[71] S. Nah, S. Baik, S. Hong, G. Moon, S. Son, R. Timofte,and K. M. Lee. NTIRE 2019 challenge on video deblurringand super-resolution: Dataset and study. In
CVPRW , pages1996–2005, 2019. 2[72] S. Nah, T. H. Kim, and K. M. Lee. Deep multi-scale con- volutional neural network for dynamic scene deblurring. In
CVPR , pages 257–265, 2017. 1, 2[73] Nayar and Branzoi. Adaptive dynamic range imaging: opti-cal control of pixel exposures over space and time. In
ICCV ,pages 1168–1175 vol.2, 2003. 2[74] S. K. Nayar, V. Branzoi, and T. E. Boult. Programmableimaging using a digital micromirror array. In
CVPR , vol-ume 1, pages I–I, 2004. 2[75] Shree K. Nayar and Tomoo Mitsunaga. High dynamic rangeimaging: Spatially varying pixel exposures. In
CVPR , pages1472–1479, 2000. 2[76] Jinshan Pan, Jiangxin Dong, Yang Liu, Jiawei Zhang,Jimmy Ren, Jinhui Tang, Yu Wing Tai, and Ming-HsuanYang. Physics-based generative adversarial models for im-age restoration and beyond.
IEEE Computer ArchitectureLetters , (01):1–1, 2020. 2[77] T. Pl¨otz and S. Roth. Benchmarking denoising algorithmswith real photographs. In
CVPR , pages 2750–2759, 2017. 2[78] K. R. Prabhakar, V. S. Srikar, and R. V. Babu. Deepfuse:A deep unsupervised approach for exposure fusion withextreme exposure image pairs. In
ICCV , pages 4724–4732,2017. 2[79] K. Purohit, A. Shah, and A. N. Rajagopalan. Bringing aliveblurred moments. In
CVPR , pages 6823–6832, 2019. 2[80] Yuhui Quan, Mingqin Chen, Tongyao Pang, and Hui Ji.Self2self with dropout: Learning self-supervised denoisingfrom single image. In
CVPR , 2020. 2[81] Erik Reinhard, Wolfgang Heidrich, Paul Debevec, SumantaPattanaik, Greg Ward, and Karol Myszkowski.
High dy-namic range imaging: acquisition, display, and image-basedlighting . 2010. 1[82] Erik Reinhard, Michael Stark, Peter Shirley, and James Fer-werda. Photographic tone reproduction for digital images.
ACM Trans. Graph. , 21(3):267–276, 2002. 5[83] Mark A. Robertson, Sean Borman, and Robert L. Steven-son. Estimation-theoretic approach to dynamic range en-hancement using multiple exposures.
J Electronic Imaging ,12(2):219 – 228, 2003. 2[84] Raquel Gil Rodriguez and Marcelo Bertalmio. High qualityvideo in high dynamic range scenes from interlaced dual-ISO footage.
J Electronic Imaging , 2016(18):1–7, 2016.3[85] Olaf Ronneberger, Philipp Fischer, and Thomas Brox. U-net:Convolutional networks for biomedical image segmentation.In
MICCAI , pages 234–41, 2015. 5[86] U. Schmidt, C. Rother, S. Nowozin, J. Jancsary, and S. Roth.Discriminative non-blind deblurring. In
CVPR , pages 604–611, 2013. 2[87] C. J. Schuler, H. C. Burger, S. Harmeling, and B. Sch¨olkopf.A machine learning approach for non-blind image deconvo-lution. In
CVPR , pages 1067–1074, 2013. 2[88] Michael Sch¨oberl, Alexander Belz, Arne Nowak, J¨urgenSeiler, Andr´e Kaup, and Siegfried Foessel. Building a highdynamic range video sensor with spatially non-regular opti-cal filtering.
Proc. SPIE , 8499, 2012. 2[89] M. Sch¨oberl, A. Belz, J. Seiler, S. Foessel, and A. Kaup.High dynamic range video by spatially non-regular opticalfiltering. In
ICIP , pages 2757–2760, 2012. 2[90] Ulrich Seger, Uwe Apel, and Bernd H¨offlinger. HDRC-imagers for natural visual perception.
Handbook of Com- uter Vision and Application , 1:223–235, 1999. 2[91] Ana Serrano, Felix Heide, Diego Gutierrez, Gordon Wet-zstein, and Belen Masia. Convolutional sparse coding forhigh dynamic range imaging. Comp. Graph. Forum , 35(2),2016. 2[92] Wenzhe Shi, Jose Caballero, Ferenc Husz´ar, Johannes Totz,Andrew P Aitken, Rob Bishop, Daniel Rueckert, and ZehanWang. Real-time single image and video super-resolutionusing an efficient sub-pixel convolutional neural network. In
CVPR , pages 1874–1883, 2016. 5[93] S. Su, M. Delbracio, J. Wang, G. Sapiro, W. Heidrich, andO. Wang. Deep video deblurring for hand-held cameras. In
CVPR , pages 237–246, 2017. 2, 3[94] J. Sun, Wenfei Cao, Zongben Xu, and J. Ponce. Learning aconvolutional neural network for non-uniform motion blurremoval. In
CVPR , pages 769–777, 2015. 2[95] Libin Sun, Sunghyun Cho, Jue Wang, and James Hays. Goodimage priors for non-blind deconvolution. In
ECCV , pages231–246, 2014. 2[96] Xin Tao, Hongyun Gao, Xiaoyong Shen, Jue Wang, andJiaya Jia. Scale-recurrent network for deep image deblurring.In
CVPR , pages 8174–8182, 2018. 1, 2, 5[97] Michael D. Tocci, Chris Kiser, Nora Tocci, and PradeepSen. A versatile HDR video production system.
ACM Trans.Graph. , 30(4), 2011. 2[98] Dmitry Ulyanov, Andrea Vedaldi, and Victor Lempitsky.Deep image prior. In
CVPR , 2018. 2[99] Zhou Wang, Alan C Bovik, Hamid R Sheikh, and Eero PSimoncelli. Image quality assessment: from error visibilityto structural similarity.
IEEE TIP , 13(4):600–612, 2004. 5[100] O. Whyte, J. Sivic, A. Zisserman, and J. Ponce. Non-uniformdeblurring for shaken images. In
CVPR , pages 491–498,2010. 2[101] P. Wieschollek, M. Hirsch, B. Sch¨olkopf, and H. Lensch.Learning blind motion deblurring. In
ICCV , pages 231–240,2017. 2[102] Shangzhe Wu, Jiarui Xu, Yu-Wing Tai, and Chi-Keung Tang.Deep high dynamic range imaging with large foregroundmotions. In
ECCV , pages 120–135, 2018. 2[103] Xiaohe Wu, Ming Liu, Yue Cao, Dongwei Ren, and Wang-meng Zuo. Unpaired learning of deep image denoising. In
ECCV , pages 352–368. Springer, 2020. 2[104] Jun Xu, Yuan Huang, Li Liu, Fan Zhu, Xingsong Hou,and Ling Shao. Noisy-as-clean: Learning unsuperviseddenoising from the corrupted image. arXiv preprintarXiv:1906.06878 , 2019. 2[105] Li Xu and Jiaya Jia. Two-phase kernel estimation for robustmotion deblurring. In
ECCV 2010 , pages 157–170, 2010. 2[106] Li Xu, Jimmy SJ Ren, Ce Liu, and Jiaya Jia. Deep convo-lutional neural network for image deconvolution. In
NiPS ,pages 1790–1798, 2014. 2[107] Lu Yuan, Jian Sun, Long Quan, and Heung-Yeung Shum.Image deblurring with blurred/noisy image pairs.
ACMTrans. Graph. , 26(3):1–es, 2007. 2[108] Kai Zhang, Wangmeng Zuo, Yunjin Chen, Deyu Meng, andLei Zhang. Beyond a gaussian denoiser: Residual learningof deep CNN for image denoising.
IEEE TIP , 26(7):3142–3155, 2017. 2[109] Kai Zhang, Wangmeng Zuo, and Lei Zhang. FFDNet: To-ward a fast and flexible solution for CNN-based image de- noising.
IEEE TIP , 27(9):4608–4622, 2018. 1, 2, 5[110] Xinyi Zhang, Hang Dong, Zhe Hu, Wei Sheng Lai, Fei Wang,and Ming Hsuan Yang. Gated fusion network for joint imagedeblurring and super-resolution. In
BMVC , 2019. 2[111] Hang Zhao, Orazio Gallo, Iuri Frosio, and Jan Kautz. Lossfunctions for image restoration with neural networks.
IEEETIP , 3(1):47–57, 2016. 5[112] Zhihang Zhong, Gao Ye, Yinqiang Zheng, and Zheng Bo.Efficient spatio-temporal recurrent neural network for videodeblurring. In
ECCV , 2020. 2[113] Shangchen Zhou, Jiawei Zhang, Jinshan Pan, Haozhe Xie,Wangmeng Zuo, and Jimmy Ren. Spatio-temporal filteradaptive network for video deblurring. In
ICCV , 2019. 2[114] Jun-Yan Zhu, Taesung Park, Phillip Isola, and Alexei AEfros. Unpaired image-to-image translation using cycle-consistent adversarial networks. In
ICCV , pages 2223–32,2017. 2[115] D. Zoran and Y. Weiss. From learning models of naturalimage patches to whole image restoration. In
ICCV , pages479–486, 2011. 2[116] U. C¸ o˘galan and A. O. Aky¨uz. Deep joint deinterlacingand denoising for single shot dual-ISO HDR reconstruction.
IEEE TIP , 29:7511–7524, 2020. 3, 29:7511–7524, 2020. 3