Back–Projection Pipeline
Pablo Navarrete Michelini, Hanwen Liu, Yunhua Lu, Xingqun Jiang, BOE Technology Co., Ltd.
Abstract
We propose a simple extension of residual networks that works simultaneously in multiple resolutions. Our network design is inspired by the iterative back–projection algorithm but seeks the more difficult task of learning how to enhance images. Compared to similar approaches, we propose a novel solution to make back–projections run in multiple resolutions by using a data pipeline workflow. Features are updated at multiple scales in each layer of the network. The update dynamic through these layers includes interactions between different resolutions in a way that is causal in scale, and it is represented by a system of ODEs, as opposed to a single ODE in the case of ResNets. The system can be used as a generic multi–resolution approach to enhance images. We test it on several challenging tasks with special focus on super–resolution and raindrop removal. Our results are competitive with the state of the art and show a strong ability of our system to learn both global and local image features.
Introduction
Image enhancement is the process of taking an impaired image as input and returning an image of better quality. The current trend to achieve this target is to learn a mapping between impaired and enhanced images using example data. Deep learning is leading this fast–growing quest in a number of applications, including: denoising (Lefkimmiatis 2018), deblurring (Tao et al. 2018), super–resolution (Timofte et al. 2018), demosaicking (Kokkinos and Lefkimmiatis 2018), compression artifact removal (Lu et al. 2018), dehazing (Ancuti et al. 2018b), deraining (Wang et al. 2019), raindrop removal (Qian et al. 2018a), HDR (Wu et al. 2018), and colorization (He et al. 2018). Progress in network architectures often succeeds in image enhancement, as seen for example in image super–resolution, with CNNs applied in SRCNN (Dong et al. 2014), ResNets (He et al. 2016) applied in EDSR (Lim et al. 2017), DenseNets (Huang et al. 2017) applied in RDN (Zhang et al. 2018d), attention (Hu, Shen, and Sun 2018) applied in RCAN (Zhang et al. 2018a), and non–local attention (Wang et al. 2018) applied in RNAN (Zhang et al. 2019). In all these examples, arguably the most influential practice is the use of residual networks (ResNets). Here, we define the network state as the internal representation of an image in a network, commonly referred to as latent or feature space in the literature. The idea of ResNets is to represent an impaired image as a network state and progressively change it
by adding residuals, as seen in Figure 1. This gives a compositional hierarchy (Poggio et al. 2017) of progressive local processing steps (e.g. convolutional layers) that transforms the input image. The update strategy of residual networks can be seen as a dynamical system where depth represents time and a differential equation models the evolution of the state (Liao and Poggio 2016).

Figure 1: Our system (BPP) works as a multi–scale ResNet with state updates that interact with lower resolution states. Information travels forward in depth and upwards in scale.

Our proposed system, a
Back–Projection Pipeline (BPP), works as a residual network that carries many (instead of one) resolution states at a given time step, as seen in Figure 1. Although similar in spirit to U–Nets (Ronneberger, Fischer, and Brox 2015), this multi–resolution state is fundamentally different. U–Nets hold high resolution states to re–enter the network in later stages, whereas in BPP the state is created as initial conditions in multiple resolutions and gets updated synchronously through the network. Another distinctive property of BPPs is scale causality. Namely, after initialization, low resolution states do not depend on higher resolution states. Information travels forward in depth, same as in ResNets, and upwards in scale, as shown in Figure 1. Scale causality is inspired by scale–space (Lindeberg 1994) and multi–resolution analysis (Mallat 1998) to express the nested nature of details. A simple example is that when we see an image of a keyboard we expect to see letters, but not necessarily the other way around. Finally, the interpretation of BPPs as an extension of ResNets becomes more clear from the dynamic of the network. We will show that BPP updates can be modeled by a non–autonomous system of differential equations, as opposed to a single ODE for ResNets.

Related Work. With regard to applications, BPP gives us a generic multi–resolution approach to transform images into a desired target. Current benchmarks in image enhancement often use different architectures for different tasks. It is important to distinguish between local and global targets. In the problem of super–resolution, for example, we need to calculate pixel values around a local area, and distant pixels become less relevant. In a different problem, contrast enhancement, we want to change the histogram of an image, which contains statistics that represent global features. General image enhancement is gaining interest in research and has been considered in the context of:
• Mixed Local Problems: In (Zhang et al. 2019), for example, the authors solve denoising, super–resolution and deblurring tasks using a single architecture and different parameters for each problem. In (Gharbi et al. 2016; Ehret et al. 2019) the authors solve joint demosaicking and denoising, and in (Qian et al. 2019) the authors additionally solve super-resolution, all using a single architecture and the same model parameters. In (Zhang, Zuo, and Zhang 2018) the authors tackle super–resolution and deblurring, and train a single system to handle different image degradations.
• Global and Local Problems: The authors in (Soh, Park, and Cho 2019; Kim, Oh, and Kim 2019; Kinoshita and Kiya 2019) consider the joint solution of low–to–high dynamic range enhancement as well as image–SR. In (Kim, Oh, and Kim 2019) the authors generate an image in HDR display format, whereas (Soh, Park, and Cho 2019; Kinoshita and Kiya 2019) use the same input and output format. They both use U–Net configurations, while (Soh, Park, and Cho 2019) uses a two–stage Retinex decomposition network.

Regarding architecture, BPP uses a multi–resolution workflow, which is different from U–Nets (Ronneberger, Fischer, and Brox 2015). This workflow follows from the Iterative Back–Projection (Irani and Peleg 1991) (IBP) algorithm. In this respect, Multi–Grid Back–Projection (Navarrete Michelini, Liu, and Zhu 2019) (MGBP) is the closest super–resolution system, which is state–of–the–art for lightweight systems with a small number of parameters. It is based on a multi–resolution back–projection algorithm that uses a multigrid recursion (Trottenberg and Schuller 2001). This recursion violates scale–causality as it sends network states back to low–resolution to restart iterations. We also notice that BPP follows the wide–activation design in (Yu et al. 2018) in the sense that features are increased before activations and reduced before updating. BPP shows a workflow structure similar to the Multi–scale DenseNet architecture in (Huang et al. 2018), except that in the latter scale–causality moves downwards in scale, it does not use back–projections, and it focuses on a label prediction problem. The WaveNet architecture (Oord et al. 2016) also shares the property of scale causality but without back–projections, moving information upwards in scale without any step back. Finally, a similar causality and adaptation in the number of channels per scale exists in the SlowFast architecture (Feichtenhofer et al. 2019), but again without back–projections.

Figure 2: Pipelining Iterative Back–Projections. (Panels: classic back–projection, where the lowest resolution image never changes; pipelined back–projections, where the lower resolution reference image has changed; and the Flux unit, the core computational block.)
Contributions. Our major contribution is the introduction of a new network architecture that extends ResNets from single to multiple resolutions, with a clear representation in terms of ODE dynamics. Our main focus is to evaluate this extension and to prove that it is beneficial with respect to conventional ResNets. We also verify that the multi–scale dynamic of the network is being used to achieve improved performance, and we visualize the dynamic of the network in solving different problems. BPP can be used to solve joint local problems, as well as combinations of global and local problems, getting state–of–the–art results in image–SR and competitive results for other problems using a single network configuration. Finally, we also show empirical evidence that BPP effectively uses both local and global information to solve problems.
Architecture Design
In Figure 2 (a) we observe the workflow of the Iterative Back–Projection (Irani and Peleg 1991) (IBP) algorithm:

h^0 = P x,   h^{t+1} = h^t + P e(h^t),   e(h^t) = x − R h^t.   (1)

IBP upscales an image x with a linear operator P and sends it back to low–resolution to verify the downscaling model represented by a linear operator R. Now, we propose to extend the IBP algorithm to multiple scales by using the data pipeline approach shown in Figure 2 (b). Specifically, as soon as we get the first upscaled image, we take it as reference and start a new upscaling to a higher resolution. Next, we downscale the second high–resolution image to verify the downscaling model. However, the reference image has been changed by the back–projection update at the lower level. At the lowest resolution the image never changes, and upper level iterations need to keep track of the lower level updates. In Figure 2 (b) we identify the essential computational block to assemble the pipeline: the Flux unit. The Flux unit is what makes scale travel possible by connecting input and output images from different levels.
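As a concrete illustration, the classic IBP iteration in (1) can be sketched with toy linear operators. The nearest–neighbor upscaler for P and the average–pooling downscaler for R are our own choices here, not the operators used in the paper:

```python
import numpy as np

def upscale(img):
    # P: a simple linear upscaler (nearest-neighbor 2x; an assumption here)
    return np.kron(img, np.ones((2, 2)))

def downscale(img):
    # R: the downscaling model (2x2 average pooling; an assumption here)
    h, w = img.shape
    return img.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def ibp(x, steps=10):
    # Iterative Back-Projection, equation (1):
    # h^0 = P x;  h^{t+1} = h^t + P e(h^t);  e(h^t) = x - R h^t
    h = upscale(x)
    for _ in range(steps):
        e = x - downscale(h)      # low-resolution residual
        h = h + upscale(e)        # back-project the residual
    return h
```

When the composition R∘P is a contraction (here it is exactly the identity), the residual e vanishes and the iteration converges, which is the classic convergence argument for linear IBP.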
Network Architecture. Without loss of generality, we tackle the image enhancement problem with an input resolution equal to the output resolution. In the case of image SR, which requires increasing image resolution, we add a pre–processing stage where the input image is upscaled using a standard method (e.g. bicubic). This helps to make the system more general for applications. For example, we can easily solve the problem of fractional upscaling factors (Hu et al. 2019) or multiple upscaling factors (Zhang, Zuo, and Zhang 2018) by simply using different pre–processing bicubic upscalers.

Figure 3: Back–Projection Pipeline network diagram. On the left, a detailed diagram shows all back–projection modules (Analysis, Scaler, Upscale, Downscale and Synthesis, built from convolutions, transposed convolutions, instance normalization and ReLU layers). On the right, the diagram is simplified by using Flux units.

Algorithm 1: Back–Projection Pipeline (BPP)

BPP(input, L, D):
  Input: Image input. Integers L ≥ 1, D ≥ 1.
  Output: Image output.
  s^A_L = input
  for k = L−1, ..., 1 do
    s^A_k = Scaler^A_k(s^A_{k+1})
  end for
  x_L = Analysis^A_L(s^A_L)
  for k = 1, ..., L−1 do
    x_k = Analysis^A_k(s^A_k)
    s^B_k = Scaler^B_k(s^A_{k+1})
    p_k = Analysis^B_k(s^B_k)
  end for
  for l = 1, ..., D do
    x, p = FluxBlock(x, p, L)
  end for
  output = input + Synthesis(x_L)

FluxBlock(x_k, p_k, L):
  Input: Initial x_k, p_k, k = 1, ..., L. Integer L ≥ 1.
  Output: Updated x_k, p_k, k = 1, ..., L.
  e_2, x_1, _ = Flux(0, x_1, p_1)
  for k = 2, ..., L−1 do
    e_{k+1}, x_k, p_{k−1} = Flux(e_k, x_k, p_k)
  end for
  _, x_L, p_{L−1} = Flux(e_L, x_L, p_L)

Flux(e_in, x_in, p_in):
  Input: e_in, x_in, p_in.
  Output: e_out, x_out, p_out.
  c = x_in + e_in
  e_out = Upscale([p_in, c])
  p_out = Downscale(c)
  x_out = Update(c) + c

The full BPP algorithm and network configuration are specified in Algorithm 1 and Figure 3. To extend the pipelining approach into a network configuration, first, we initialize the network states x_k and down–projections p_k using linear downscalers and single convolutional layers in the Analysis modules to increase the number of channels. Second, states are updated using the Flux–Blocks defined in Algorithm 1, calculating residuals e_k and updating states upwards in scale with Flux units. Third, the output state in the highest resolution is converted into a residual image by a convolutional layer in the Synthesis module and added to the input image.
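The Flux unit and the scale–causal sweep of a Flux–Block can be sketched numerically. The toy single–channel operators below (nearest–neighbor upscaler, average–pooling downscaler, and a linear Update) are stand–ins for the learned convolutional modules; only the wiring follows Algorithm 1:

```python
import numpy as np

def up(img):
    # Toy stand-in for the learned Upscale module (nearest-neighbor 2x)
    return np.kron(img, np.ones((2, 2)))

def down(img):
    # Toy stand-in for the learned Downscale module (2x2 average pooling)
    h, w = img.shape
    return img.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def flux(e_in, x_in, p_in):
    # One Flux unit, wired as in Algorithm 1.
    c = x_in + e_in               # back-projected state at this level
    e_out = up(0.5 * (p_in + c))  # stand-in for Upscale([p_in, c])
    p_out = down(c)               # down-projection for the next block
    x_out = 0.1 * c + c           # stand-in for Update(c) + c
    return e_out, x_out, p_out

def flux_block(x, p):
    # Scale-causal sweep from lowest (index 0) to highest resolution:
    # the residual e only travels upwards in scale.
    L = len(x)
    e, x[0], _ = flux(np.zeros_like(x[0]), x[0], p[0])
    for k in range(1, L - 1):
        e, x[k], p[k - 1] = flux(e, x[k], p[k])
    _, x[L - 1], p[L - 2] = flux(e, x[L - 1], p[L - 1])
    return x, p
```

Note how p_{k−1} is refreshed by the unit at level k, which is what lets upper levels keep track of lower level updates as the pipeline advances.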
Network Dynamic. The restriction operators R_k (Downscale module) and interpolation operators P_k (Upscale module) are now non–linear and do not share parameters (time dependent). When we interpret depth as time t, the dynamic is described in Figure 4 and leads to the following set of difference equations with their corresponding extension to continuous time:

h^{t+1}_k = h^t_k + P_k(R_k(h^t_k, t), h^{t+1}_{k−1}, t),   h^{t+1}_1 = h^t_1,
continuous time ⇒ dh_k/dt = P_k(R_k(h_k, t), h_{k−1}, t),   h_1(x, y, t) = h_1(x, y, 0).   (2)

In the case of ResNets, the dynamical system is given by h^{t+1} = h^t + f(h^t, t) and dh/dt = f(h, t) in continuous time. Therefore, BPP extends the model of ResNets from a single ODE to a system of coupled equations. Scale–causality follows from (2) as state h_k only depends on h_{k−1}, h_{k−2}, .... The multi–scale nature follows from the spatial dimension of the state vectors h_k, explicitly expressed in the operators P_k and R_k, which map between the spatial dimensions of adjacent resolution levels. In continuous space we could also express the multi–scale nature of the equations by using initial conditions h_{k−s}(x, y, t=0) = h_k(2^s x, 2^s y, t=0) with s ∈ N, with no filtering needed in continuous space, since aliasing effects do not exist. We observe that initial conditions are self–similar in scale (Mallat 1998). Whether this property is maintained in time depends on the evolution of the network state.

Figure 4: State diagram of the depth transitions in the BPP architecture. The residual structure leads to a non–autonomous system of differential equations.

In the continuous time model, the restriction operator R_k in (2) represents a renormalization–group transformation of the network state, similar to those used in particle physics and ODEs to ensure self–similarity (Fisher 1974; Chen, Goldenfeld, and Oono 1996). In this sense, using different parameters at each scale allows the model to adjust the level of self–similarity that works better for a given problem.
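The connection to ResNets can be made concrete with a forward–Euler view: a ResNet step h^{t+1} = h^t + f(h^t, t) integrates dh/dt = f(h, t) with unit step, and (2) is a system of such equations with scale–causal coupling. A scalar toy sketch, where the right–hand side is an assumed linear stand–in for P_k(R_k(h_k), h_{k−1}):

```python
def bpp_euler(h, steps, dt=0.1):
    # Forward-Euler integration of a scale-causal system like (2).
    # h[0] is the lowest-resolution state and is never updated (invariant).
    for _ in range(steps):
        new = list(h)
        for k in range(1, len(h)):
            # dh_k/dt depends only on h_k and the already-updated h_{k-1},
            # matching the h^{t+1}_{k-1} term in the difference equation
            new[k] = h[k] + dt * (new[k - 1] - 0.5 * h[k])
        h = new
    return h
```

Running this, the lowest state stays fixed while information propagates upwards in scale, one level per nested dependency, exactly the causality the text describes.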
On the other hand, using different parameters in time can also be beneficial. It has been observed in (Liao and Poggio 2016) that normalization layers do not work well in recurrent networks, which share parameters in time. But in time–dependent systems, these layers become beneficial. Since the BPP configurations in our experiments use IN–layers, we chose to use different parameters in time. This does not have a significant effect on performance, because the flux–block structure in Algorithm 1 uses inline updates that avoid storage of old network states. During training, a checkpoint strategy can effectively reduce the memory footprint (Chen et al. 2016).

Using pipelining to extend IBP into multiple scales is simple, and this is the major strength of this approach. There are several ways to extend IBP to multiple scales. We mentioned MGBP as a relevant but different approach. BPP is simpler, and that simplicity translates to a clear ODE model that is difficult to obtain otherwise. Most importantly, this ODE model is very expressive about the connection to IBP. It is direct from (2) that if the composition of P and R operations forms a contraction mapping then the ODE model will converge, which is the same argument used in convergence proofs of IBP in the linear case (Irani and Peleg 1991). At this point BPP departs from IBP. Because BPP is trained in a supervised fashion, we do not know a priori how this dynamic is going to be driven towards the target. Overall, the BPP model inherits the essence of IBP in terms of an iteration that updates residuals upwards in scale, which can now be trained to reach diverse targets in a non–linear fashion using convolutional networks. The main purpose of our investigation is: first, to generalize the IBP dynamic to multiple scales in sequence; and second, to study how powerful this dynamic is to solve more general problems.

Finally, we note that the continuous model in (2) allows BPP to work as a Neural–ODE system (Chen et al. 2018). For the sake of simplicity, in this work we do not explore this direction. However, it stands as an interesting direction for future research.

Experiments
In our experiments we found that using IN–layers to activate ReLU units, as shown in Figure 3, could help converge faster in early training and do so independently of initialization. Figure 5 (b) shows this effect, and we also see that IN–layers are not required for BPP in the long run. In early training, IN layers placed before ReLUs force activation in all flux units across all scales. This strategy shows to be a good choice to initialize parameters. Alternatively, we found that the most effective way to avoid IN–layers is using Dirac–kernels to initialize weights and adding Gaussian noise. This initialization was used in the learning curve BPP (no IN) in Figure 5 (b) and it is the closest we have found to avoiding normalization layers.

Because of memory limitations we used a patch–based training strategy, where smaller–sized patches are taken from training set images. Patch–based learning reduces the receptive field of the network during training. At inference, the performance of the network drops if the mean and variance of IN–layers are computed on an image larger than the training patches. To solve this problem we: first, divide input images into overlapping patches (of the same size as training patches); second, multiply each output by a Hamming window (Harris 1978); and third, average the results. In all our experiments we use overlapping patches separated by a fixed number of pixels in vertical and horizontal directions. The weighted average helps to avoid blocking artifacts. On one hand, this approach introduces redundancy and reduces performance for medium size images. On the other hand, it also allows the algorithm to run on very large images (e.g. 8K) and can be massively parallelized by batch processing on multiple GPUs.

Configuration. In the following experiments we use a BPP configuration with a fixed number of back–projection layers (flux–blocks) and resolution levels, and a decreasing number of features per level from lowest to highest resolution. All convolutional layers use the same kernel size, and scalers are initialized with bicubic filters and trained as additional parameters. A fully unrolled diagram is shown in the Appendix. The configuration was tuned according to validation performance for the most challenging problems (e.g. large–factor SR). By fixing the configuration we can potentially have the architecture hardwired in silicon and update its model parameters to switch between different problems.

Performance. The BPP architecture is multi–scale and sequential. The so–called Flux–Block in Algorithm 1 represents the sequential block and consists of one Flux unit per level. This sequential structure is more convenient for memory performance as it avoids buffering of features from previous blocks. Architectures such as DenseNets, U–Nets and MGBP need to buffer features in skipped connections and thus need more memory. Because the configuration is fixed, the performance of the system can be roughly estimated from average statistics. The system has a total of million parameters and can process million pixels per second on a Titan X GPU using bit floating point precision. This means, for example, that it takes seconds to process a Full–HD image in RGB format (1920 × 1080 × 3 pixels).
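The patch–based inference described above (overlapping patches, Hamming–window weighting, averaged overlaps) can be sketched as follows; `enhance` stands in for the trained network, and the patch size and stride here are arbitrary assumptions:

```python
import numpy as np

def blend_patches(img, patch, stride, enhance):
    # Run `enhance` on overlapping patches, weight each output with a 2-D
    # Hamming window, and average the overlaps to avoid blocking artifacts.
    H, W = img.shape
    w = np.outer(np.hamming(patch), np.hamming(patch))  # 2-D Hamming window
    out = np.zeros((H, W))
    norm = np.zeros((H, W))
    for i in range(0, H - patch + 1, stride):
        for j in range(0, W - patch + 1, stride):
            out[i:i + patch, j:j + patch] += w * enhance(img[i:i + patch, j:j + patch])
            norm[i:i + patch, j:j + patch] += w
    return out / np.maximum(norm, 1e-12)  # per-pixel weighted average
```

Because each patch is independent, the loop can be batched and spread across multiple GPUs, which is what makes very large (e.g. 8K) inputs tractable.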
Figure 5: a) Qualitative evaluation for SR methods (Bicubic, MSLapSR, EDSR, DBPN, BPP-x2x3x4x8, BPP, and ground truth). b) Validation MSE for × SR.

Table 1: Quantitative evaluation for SR (PSNR–Y / SSIM–Y). A more extensive comparison is available in the Appendix.

Scale | Algorithm      | Set14         | BSDS100       | Urban100      | Manga109
×2    | Bicubic        | —             | —             | —             | —
×2    | BPP–SRx2x3x4x8 | 33.27 / 0.913 | 31.21 / 0.879 | 31.67 / 0.921 | 38.31 / 0.975
×2    | BPP–SRx2       | 34.23 / 0.922 | 31.63 / 0.886 | 33.07 / 0.935 | 39.19 / 0.977
×3    | BPP–SRx2x3x4x8 | 30.23 / 0.838 | 28.81 / 0.794 | 28.43 / 0.852 | 33.75 / 0.943
×3    | BPP–SRx3       | —             | —             | —             | —
×4    | BPP–SRx2x3x4x8 | 28.55 / 0.778 | 27.43 / 0.728 | 26.48 / 0.791 | 30.81 / 0.909
×4    | BPP–SRx4       | —             | —             | —             | —
×8    | BPP–SRx2x3x4x8 | 25.10 / 0.642 | 24.89 / 0.598 | 22.72 / 0.626 | 24.78 / 0.785
×8    | BPP–SRx8       | —             | —             | —             | —
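The PSNR–Y metric reported in Table 1 is computed on the luminance channel; a minimal sketch under the common BT.601 convention (the paper's exact color conversion is not specified here, so the weights are an assumption) is:

```python
import numpy as np

def psnr_y(ref, out, peak=255.0):
    # PSNR on the luminance (Y) channel of HxWx3 RGB arrays in [0, peak].
    # BT.601 luma weights (an assumed convention, not stated in the paper).
    w = np.array([0.299, 0.587, 0.114])
    y_ref, y_out = ref @ w, out @ w
    mse = np.mean((y_ref - y_out) ** 2)
    return 10.0 * np.log10(peak ** 2 / mse)
```

SSIM–Y is computed analogously on the same luminance channel, typically with a library implementation such as scikit-image's `structural_similarity`.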
P1: Image Super–Resolution. We use the DIV2K (Agustsson and Timofte 2017) and FLICKR–2K datasets for training and the following datasets for test: Set–14 (Zeyde, Elad, and Protter 2010), BSDS–100 (Martin et al. 2001), Urban–100 (Huang, Singh, and Ahuja 2015) and Manga–109 (Matsui et al. 2017). Impaired images were obtained by downscaling and then upscaling ground truth images, using a Bicubic scaler and scaling factors ×2, ×3, ×4 and ×8. Here, we consider two cases: we trained models BPP–SR×f for each upscaling factor f = 2, 3, 4 and 8; and we also trained a single model BPP–SRx2x3x4x8 to restore impaired images with unknown upscaling factors. We use a fixed number of patches per mini–batch, with patch size proportional to f for a known upscaling factor f and a fixed patch size for the unknown upscaling factor, all at high resolution.

Table 1 and Figure 5 (a) show quantitative and qualitative results compared to other methods. We focus our comparison on the following methods: Bicubic (the baseline); EDSR (Lim et al. 2017), with major processing in one resolution level using a deep ResNet; Dense–DBPN (Haris, Shakhnarovich, and Ukita 2018), with major processing in two resolution levels using densely connected up/down back–projections; and RDN (Zhang et al. 2018c), with major processing in one resolution level using densely connected residual–dense–blocks. We show EDSR and DBPN because they are both closely related to BPP in their residual and back–projection structures, respectively, and we show RDN as a top reference of the current state of the art. Further comparisons with other methods can be found in the Appendix. Overall, for the problem of super–resolution we find that BPP can get excellent results, reaching state–of–the–art results in both quantitative and qualitative evaluations, but its performance decreases when we test a more general problem. First, BPP–×f models get the best scores in most quantitative and qualitative evaluations, with RDN slightly outperforming BPP at some upscaling factors.
This setting, including datasets for training and test, is the most common evaluation procedure for supervised SR techniques. In terms of application this would be useful if we need to enhance an image upscaled with a Bicubic upscaler with a specific upscaling factor. It often happens that we have an image upscaled with an unknown factor, and in this case we do not know which model parameters to load. In this case the BPP–SRx2x3x4x8 model offers a general upscaling solution. The performance of this BPP model decreases and does not reach state–of–the–art results. Although reasonably close to the state of the art, often outperforming EDSR, we would have expected this model to perform better than BPP–×f if the architecture was able to generalize effectively to this more general setting. In fact, it has been observed in VDSR (Kim, Lee, and Lee 2016a) and MDSR (Lim et al. 2017) that training with unknown upscaling factors can improve the performance of the network. Therefore, these empirical results show that BPP can be very effective for fixed upscaling factors but does not generalize as well as other architectures for general upscaling factors.

Table 2: Quantitative results of raindrop removal.

Method              | PSNR–Y | SSIM–Y
Eigen13             | 28.59  | 0.6726
Pix2Pix             | 30.14  | 0.8299
DeRaindrop (No GAN) | 29.25  | 0.7853
DeRaindrop          | —      | —
BPP                 | —      | —

P2: Raindrop Removal. We use the DeRaindrop (Qian et al. 2018a,b) dataset for training and test. This dataset provides paired images, one degraded by raindrops and the other one free from raindrops. These were obtained by using two pieces of exactly the same glass: one sprayed with water, and the other left clean. We train a BPP model using an L loss. More details of the training settings are provided in the Appendix.

This problem is very different in nature from super–resolution. On one hand, a significant portion of pixels contain (uncorrupted) high–resolution information that must move to the output with little or no change.
At the same time it needs to identify the irregular distribution of raindrops, with different sizes, and fill in those areas by predicting the content within. In some images the content within raindrops is of little use, making the problem similar to inpainting. Thus, the problem requires processing of both local and global information in order to fill in raindrops.

Even though we only trained our system with an L loss, our system performs similarly to the state–of–the–art DeRaindrop (Qian et al. 2018a), as seen in Table 2 and Figure 6. The DeRaindrop network in (Qian et al. 2018a) uses an attentive GAN approach that can estimate raindrop masks to focus on these areas for restoration. The PSNR score of BPP is better than DeRaindrop without adversarial training, and the SSIM score is better than all other systems in Table 2. The qualitative evaluation shows that BPP achieves a reasonable quality, considering the fact that it has not been trained using GANs. Here, the BPP architecture appears to be effective. In the next section we inspect properties of the network that reveal the underlying mechanism used by BPP to obtain its solutions.

Other Problems. The performance in other problems, including mobile–to–DSLR photo translation, dehazing and joint HDR+SR, is included in the Appendix.
Inspection of ODE updates. We conduct experiments to measure the magnitude of the updates in equation (2) to better understand the dynamic of the network when solving different problems. The arrangement of Flux units in BPP networks forms an array of size L × D (number of levels times depth) and we compare the magnitude of the residual updates in each one of these units. In Figure 7 we display the result of measuring

‖dh_k/dt‖ = ‖P_k(R_k(h^t_k, t), h^{t+1}_{k−1}, t)‖,   (3)

for every flux unit, averaged over all images in the validation sets, and normalized to the maximum value. At the lowest resolution (k = 4) the reference image never changes and thus the update is always zero.

Figure 6: Qualitative evaluation for raindrop removal.

Figure 7: Average L2–magnitudes of residual updates normalized by the maximum update (panels: SR 2×, SR 3×, SR 4×, SR 8× and RainDrop).

Interestingly, we observe that the dynamic is far from the original contraction–mapping design of IBP, which would result in an exponential decay of updates along depth. Here, we should remember that the dynamic is driven exclusively by the result of training the network in a supervised manner.
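The measurement in (3) can be sketched as follows: given the residual update of every (level, depth) cell, compute its L2 magnitude and normalize by the maximum, as plotted in Figure 7. In practice the residuals would come from forward hooks on the network; here they are toy arrays:

```python
import numpy as np

def update_magnitudes(residuals):
    # residuals[k][t]: residual update of the Flux unit at level k, depth t.
    # Returns the L x D array of L2 magnitudes normalized to the maximum.
    mags = np.array([[np.linalg.norm(r) for r in level] for level in residuals])
    return mags / mags.max()
```

A level whose reference image never changes produces a row of zeros, matching the observation above for the lowest resolution.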
Figure 8: Local and global contributions (Fx and r) for three systems (BPP-SRx4, EDSR-x4 and BPP-Raindrop) using deep filter visualization (Navarrete Michelini, Liu, and Zhu 2019). EDSR relies on local contributions while BPP balances both local and global contributions.

Instead of an exponential decay, the network consistently shows a bimodal statistic with one peak very close to the input and another very close to the output. Also, the highest resolution receives very small updates, meaning that these features move more or less unchanged, with an increased update towards the end. The major processing goes on at the middle levels. In SR, updates are stronger at the lower resolution (k = 3), and for RainDrop removal updates are stronger at the higher resolution (k = 2). The bimodal statistic is reminiscent of interpretability results for VGG networks in classification, which show higher contribution to label outputs very early and very late in a sequential configuration (Navarrete Michelini et al. 2019). Nevertheless, in BPP the updates focus on one or two resolution levels, as opposed to VGG networks that are designed to process high resolutions early in the network and very low resolutions towards the end. Despite this important difference, these results suggest that sequential networks find solutions in two steps: analysis at the first layers, and fusion towards the very end.

Interpretability. We apply the
LinearScope method from (Navarrete Michelini et al. 2019) to analyze the learning process in global and local problems. The general methodology is as follows. The BPP architecture contains several non–linear modules consisting of ReLUs and IN–layers. The decision of which pixels pass or stop in ReLUs, and what mean and variance to use in IN–layers, is non–linear. But the action of these layers is linear: masking and normalizing. For a given input image x, the action of all non–linear modules (ReLU and IN–layers) can be fixed as 0/1–masks for ReLUs and fixed mean and variance in IN–layers. This gives a linear system of the form y = F x + r that generates the same output as the non–linear system for the input x, and represents the overall action of the network on the input image.

The matrix F represents the interpolation filters used by the network to solve the problem, and thus shows the local processing. The residual r is a fixed global image created by non–linear modules. Figure 8 shows the local contributions, F x, and global contributions, r, for three systems. We observe that EDSR almost purely relies on local processing to obtain an output. BPP, on the other hand, relies mostly on local processing, but the contribution of r is much larger than the one in EDSR. This shows a significantly different approach followed by BPP, compared to EDSR, to solve the super–resolution problem.

The BPP system for raindrop removal reveals a much larger contribution of r, which resembles a mask of raindrops. This means that BPP uses a local approach on areas without raindrops (using F x) and a global approach on raindrops determined by the residual r. The mechanism used by the network to obtain the residual r is non–linear. Overall, we observe that for this problem the BPP network divides the problem into two parts: a local adaptive filter in clean areas, to nearly copy–paste the input into the output; and a non–linear global approach to fill in raindrop areas.

Conclusions
We propose Back–Projection Pipeline as a simple yet non–trivial extension of residual networks (ResNets) to run in multiple resolutions. The update dynamic through the layers of the network includes interactions between different resolutions in a way that is causal in scale, and it is represented by a system of ODEs. We use it as a generic multi–resolution approach to enhance images. The focus of our investigation is to evaluate this multi–scale residual approach. Overall, our empirical results show that BPP can achieve excellent results in traditional supervised learning. Our BPP configuration gets state–of–the–art results in SR for fixed upscaling factors and competitive results for raindrop removal as well as other problems (see Appendix). We also observe a lack of generalization for the problem of SR with unknown upscaling factors. Inspection of the residual updates in the network shows that all resolution levels are being used, with higher intensity in lower resolutions, showing that supervised training gives preference to the multi–scale setting over traditional residual networks. Based on our results, we cannot conclude that scale causality is beneficial. Nevertheless, we can at least conclude that this strong simplification in the flow of network information, inherited from IBP, does not prevent the architecture from achieving competitive results. Further investigation is necessary in this regard (especially regarding generalization) and it could open interesting research directions in network architecture search and design.
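As an illustration of the LinearScope decomposition used in the Interpretability section, the sketch below freezes the ReLU masks chosen by a given input, turning a tiny random ReLU network (our own stand–in, not the BPP model) into an affine map whose global contribution is r = lin(0) and whose local contribution is F x = lin(x) − r:

```python
import numpy as np

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(8, 4)), rng.normal(size=8)
W2, b2 = rng.normal(size=(4, 8)), rng.normal(size=4)

def net(x):
    # A tiny ReLU network standing in for the full model.
    return W2 @ np.maximum(W1 @ x + b1, 0.0) + b2

def linearize(x):
    # Freeze the 0/1 ReLU masks chosen by input x; the network is then
    # affine in its input: y = F x + r.
    m = (W1 @ x + b1 > 0).astype(float)              # fixed 0/1 masks
    lin = lambda z: W2 @ (m * (W1 @ z + b1)) + b2    # affine in z
    r = lin(np.zeros(4))                             # global contribution
    Fx = lin(x) - r                                  # local contribution F x
    return Fx, r
```

By construction Fx + r reproduces the non–linear output for the input that selected the masks, which is exactly the property the analysis in Figure 8 relies on.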
[Figure 9 diagram: the BPP network takes an input RGB image and multi-resolution network features through concatenations, convolutions, strided and transposed convolutions, instance normalization, ReLU, and scaler modules, organized into UPSCALE/DOWNSCALE (synthesis/analysis) blocks that produce the output image.]

Figure 9: Detail diagram of the –level, –layers BPP configuration used in our experiments.

References
Agustsson, E.; and Timofte, R. 2017. NTIRE 2017 Challenge on Single Image Super-Resolution: Dataset and Study. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Workshops.
Ancuti, C.; Ancuti, C. O.; and Timofte, R. 2019. NTIRE–2019 Dehaze Evaluation code. https://competitions.codalab.org/my/datasets/download/a85cc0d2-cf8b-4ec8-bf83-243c7bcda515. [Online; accessed 20-May-2019].
Ancuti, C.; Ancuti, C. O.; Timofte, R.; and De Vleeschouwer, C. 2018a. I-HAZE: a dehazing benchmark with real hazy and haze-free indoor images. In International Conference on Advanced Concepts for Intelligent Vision Systems, 620–631. Springer.
Ancuti, C.; Ancuti, C. O.; Timofte, R.; Van Gool, L.; Zhang, L.; and Yang, M. 2018b. NTIRE 2018 Challenge on Image Dehazing: Methods and Results, 891–901.
Ancuti, C. O.; Ancuti, C.; Sbert, M.; and Timofte, R. 2019. Dense-Haze: A benchmark for image dehazing with dense-haze and haze-free images. CoRR abs/1904.02904. URL http://arxiv.org/abs/1904.02904.
Ancuti, C. O.; Ancuti, C.; Timofte, R.; and De Vleeschouwer, C. 2018c. O-HAZE: a dehazing benchmark with real hazy and haze-free outdoor images. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, 754–762.
Chen, L. Y.; Goldenfeld, N. D.; and Oono, Y. 1996. Renormalization group and singular perturbations: Multiple scales, boundary layers, and reductive perturbation theory. Physical Review E.
Dong, C.; Loy, C. C.; He, K.; and Tang, X. 2014. Learning a Deep Convolutional Network for Image Super-Resolution. In Proceedings of European Conference on Computer Vision (ECCV).
Dong, C.; Loy, C. C.; and Tang, X. 2016. Accelerating the Super-Resolution Convolutional Neural Network. In Proceedings of European Conference on Computer Vision (ECCV).
Ehret, T.; Davy, A.; Arias, P.; and Facciolo, G. 2019. Joint demosaicing and denoising by overfitting of bursts of raw images.
Feichtenhofer, C.; Fan, H.; Malik, J.; and He, K. 2019. SlowFast Networks for Video Recognition. In The IEEE International Conference on Computer Vision (ICCV).
Fisher, M. E. 1974. The renormalization group in the theory of critical behavior. Reviews of Modern Physics.
Harris, F. J. 1978. On the use of windows for harmonic analysis with the discrete Fourier transform. Proceedings of the IEEE.
He, K.; Zhang, X.; Ren, S.; and Sun, J. 2016. Deep Residual Learning for Image Recognition. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
He, M.; Chen, D.; Liao, J.; Sander, P. V.; and Yuan, L. 2018. Deep Exemplar-based Colorization. ACM Transactions on Graphics (TOG).
Hu, J.; Shen, L.; and Sun, G. 2018. Squeeze-and-Excitation Networks. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
Huang, G.; Chen, D.; Li, T.; Wu, F.; Der Maaten, L. V.; and Weinberger, K. Q. 2018. Multi-Scale Dense Networks for Resource Efficient Image Classification. International Conference on Learning Representations.
Huang, G.; Liu, Z.; van der Maaten, L.; and Weinberger, K. Q. 2017. Densely connected convolutional networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.
Huang, J.; Singh, A.; and Ahuja, N. 2015. Single image super-resolution from transformed self-exemplars, 5197–5206.
Ignatov, A.; Kobyshev, N.; Timofte, R.; Vanhoey, K.; and Van Gool, L. 2017. DSLR-quality photos on mobile devices with deep convolutional networks. In Proceedings of the IEEE International Conference on Computer Vision, 3277–3285.
Irani, M.; and Peleg, S. 1991. Improving Resolution by Image Registration. CVGIP: Graphical Models and Image Processing.
Kim, J.; Lee, J. K.; and Lee, K. M. 2016a. Accurate Image Super-Resolution Using Very Deep Convolutional Networks. In The IEEE Conference on Computer Vision and Pattern Recognition.
Kim, J.; Lee, J. K.; and Lee, K. M. 2016b. Deeply-Recursive Convolutional Network for Image Super-Resolution. In The IEEE Conference on Computer Vision and Pattern Recognition.
Kim, S. Y.; Oh, J.; and Kim, M. 2019. Deep SR-ITM: Joint Learning of Super-Resolution and Inverse Tone-Mapping for 4K UHD HDR Applications. In The IEEE International Conference on Computer Vision (ICCV).
Kingma, D. P.; and Ba, J. 2015. Adam: A method for stochastic optimization. International Conference on Learning Representations.
Kinoshita, Y.; and Kiya, H. 2019. Convolutional Neural Networks Considering Local and Global features for Image Enhancement. In The IEEE International Conference on Image Processing (ICIP).
Kokkinos, F.; and Lefkimmiatis, S. 2018. Deep image demosaicking using a cascade of convolutional residual denoising networks. In Proceedings of the European Conference on Computer Vision (ECCV), 303–319.
Kundu, D.; Ghadiyaram, D.; Bovik, A. C.; and Evans, B. L. 2017a. Evaluation code for HIGRADE metric. http://live.ece.utexas.edu/research/Quality/higradeRelease.zip. [Online; accessed 20-May-2019].
Kundu, D.; Ghadiyaram, D.; Bovik, A. C.; and Evans, B. L. 2017b. No-Reference Quality Assessment of Tone-Mapped HDR Pictures. IEEE Transactions on Image Processing.
Lefkimmiatis, S. 2018. Universal denoising networks: a novel CNN architecture for image denoising. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 3204–3213.
Liao, Q.; and Poggio, T. 2016. Bridging the gaps between residual learning, recurrent neural networks and visual cortex. arXiv preprint arXiv:1604.03640.
Lim, B.; Son, S.; Kim, H.; Nah, S.; and Lee, K. M. 2017. Enhanced Deep Residual Networks for Single Image Super-Resolution. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Workshops.
Lindeberg, T. 1994. Scale-Space Theory in Computer Vision. ISBN 0-7923-9418-6. doi:10.1007/978-1-4757-6465-9.
Lu, G.; Ouyang, W.; Xu, D.; Zhang, X.; Gao, Z.; and Sun, M.-T. 2018. Deep Kalman filtering network for video compression artifact reduction. In Proceedings of the European Conference on Computer Vision (ECCV), 568–584.
Ma, C.; Yang, C.-Y.; Yang, X.; and Yang, M.-H. 2017. Learning a No-Reference Quality Metric for Single-Image Super-Resolution. Computer Vision and Image Understanding.
Mallat, S. A Wavelet Tour of Signal Processing. Academic Press.
Martin, D. R.; Fowlkes, C. C.; Tal, D.; and Malik, J. 2001. A database of human segmented natural images and its application to evaluating segmentation algorithms and measuring ecological statistics, 2: 416–423.
Matsui, Y.; Ito, K.; Aramaki, Y.; Fujimoto, A.; Ogawa, T.; Yamasaki, T.; and Aizawa, K. 2017. Sketch-based manga retrieval using manga109 dataset. Multimedia Tools and Applications.
Navarrete Michelini, P.; Liu, H.; and Zhu, D. 2019. A Tour of Convolutional Networks Guided by Linear Interpreters. In The IEEE International Conference on Computer Vision (ICCV). IEEE. URL https://arxiv.org/abs/1908.05168.
Navarrete Michelini, P.; Liu, H.; and Zhu, D. 2019. Multigrid Backprojection Super-Resolution and Deep Filter Visualization. In Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence (AAAI 2019). AAAI.
Nemoto, H.; Korshunov, P.; Hanhart, P.; and Ebrahimi, T. 2015. Visual attention in LDR and HDR images. Technical report. URL https://mmspg.epfl.ch/downloads/hdr-eye/.
Oord, A. v. d.; Dieleman, S.; Zen, H.; Simonyan, K.; Vinyals, O.; Graves, A.; Kalchbrenner, N.; Senior, A.; and Kavukcuoglu, K. 2016. WaveNet: A generative model for raw audio. arXiv preprint arXiv:1609.03499.
Paszke, A.; Gross, S.; Chintala, S.; Chanan, G.; Yang, E.; DeVito, Z.; Lin, Z.; Desmaison, A.; Antiga, L.; and Lerer, A. 2017. Automatic differentiation in PyTorch. In NIPS-W.
Poggio, T.; Mhaskar, H.; Rosasco, L.; Miranda, B.; and Liao, Q. 2017. Why and when can deep-but not shallow-networks avoid the curse of dimensionality: a review. International Journal of Automation and Computing.
Qian, R.; Tan, R. T.; Yang, W.; Su, J.; and Liu, J. 2018a. Attentive Generative Adversarial Network for Raindrop Removal From a Single Image. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
Qian, R.; Tan, R. T.; Yang, W.; Su, J.; and Liu, J. 2018b. DeRaindrop dataset. https://drive.google.com/open?id=1e7R76s6vwUJxILOcAsthgDLPSnOrQ49K. [Online; accessed 20-May-2019].
Qian, R.; Tan, R. T.; Yang, W.; Su, J.; and Liu, J. 2018c. Evaluation code for Raindrop removal. https://github.com/rui1996/DeRaindrop/blob/master/metrics.py. [Online; accessed 20-May-2019].
Reinhard, E.; and Devlin, K. 2005. Dynamic range reduction inspired by photoreceptor physiology. IEEE Transactions on Visualization and Computer Graphics.
Soh; Park; and Cho. 2019. arXiv:1905.00933 [cs, eess]. URL http://arxiv.org/abs/1905.00933.
Tao, X.; Gao, H.; Shen, X.; Wang, J.; and Jia, J. 2018. Scale-recurrent Network for Deep Image Deblurring. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
Timofte, R.; Gu, S.; Wu, J.; Van Gool, L.; Zhang, L.; Yang, M.-H.; et al. 2018. NTIRE 2018 Challenge on Single Image Super-Resolution: Methods and Results. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Workshops.
Timofte, R.; De Smet, V.; and Van Gool, L. 2014. A+: Adjusted anchored neighborhood regression for fast super-resolution. In Proc. Asian Conf. Comput. Vis. (ACCV).
Trottenberg, U.; and Schuller, A. 2001. Multigrid. Orlando, FL, USA: Academic Press, Inc. ISBN 0-12-701070-X.
Wang, S.; Zheng, J.; Hu, H.-M.; and Li, B. 2013. Naturalness preserved enhancement algorithm for non-uniform illumination images. IEEE Transactions on Image Processing.
Wang et al. 2019. arXiv preprint arXiv:1904.01538.
Wang, X.; Girshick, R. B.; Gupta, A.; and He, K. 2018. Non-local Neural Networks. In Computer Vision and Pattern Recognition.
Yu, J.; Fan, Y.; Yang, J.; Xu, N.; Wang, Z.; Wang, X.; and Huang, T. S. 2018. Wide Activation for Efficient and Accurate Image Super-Resolution. arXiv: Computer Vision and Pattern Recognition.
Zeyde, R.; Elad, M.; and Protter, M. 2010. On single image scale-up using sparse-representations, 711–730.
Zhang, H.; Sindagi, V.; and Patel, V. M. 2018. Multi-scale Single Image Dehazing using Perceptual Pyramid Deep Network. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Workshops.
Zhang, K.; Zuo, W.; and Zhang, L. 2018. Learning a Single Convolutional Super-Resolution Network for Multiple Degradations. In Computer Vision and Pattern Recognition.
Zhang, Y.; Li, K.; Li, K.; Wang, L.; Zhong, B.; and Fu, Y. 2018a. Image Super-Resolution Using Very Deep Residual Channel Attention Networks. In Proceedings of the European Conference on Computer Vision (ECCV), 286–301.
Zhang, Y.; Li, K.; Li, K.; Zhong, B.; and Fu, Y. 2019. Residual Non-local Attention Networks for Image Restoration. International Conference on Learning Representations.
Zhang, Y.; Tian, Y.; Kong, Y.; Zhong, B.; and Fu, Y. 2018b. Evaluation code for Residual Dense Networks. https://github.com/yulunzhang/RDN/blob/master/RDN_TestCode/Evaluate_PSNR_SSIM.m. [Online; accessed 20-May-2019].
Zhang, Y.; Tian, Y.; Kong, Y.; Zhong, B.; and Fu, Y. 2018c. Residual Dense Network for Image Restoration.
Zhang, Y.; Tian, Y.; Kong, Y.; Zhong, B.; and Fu, Y. 2018d. Residual Dense Network for Image Super-Resolution. In CVPR.
Appendix
Diagrams
In an effort to make diagrams easy to read, concise, and carrying a precise meaning, we introduce the notation in Figure 10. That is, lines connected to the left side of any given module represent different inputs to that module. Every module can have several inputs but only one output.
[Figure 10 diagram: the left side of a module shows 3 different inputs; the right side shows 3 copies of the output; the two drawings are equivalent.]
Figure 10: Diagram notation.

Lines connected to the right side of a given module represent copies of the same output.

Figure 9 shows an expanded diagram of the single BPP configuration used in our experiments. It uses back–projection layers (flux blocks) over several resolution levels, with a set number of features per level from lowest to highest resolution. All convolutional layers use the same kernel size, and scalers are initialized with bicubic filters and trained as additional parameters.

We observe that, after initialization, the lowest–resolution network state (at the bottom of the diagram) never changes. Thus, the highest–resolution state (at the top of the diagram) is always a fixed number of layers away from this fixed state. This is similar to a long–range skip–connection in DenseNet (Huang et al. 2017), but in BPP this shortcut moves through a different resolution. Because of scale causality, the next low–resolution level moves relatively close to the fixed state, and we can interpret it as a shorter–range skip–connection. Thus, the particular structure of BPP allows quick paths from the output to every layer of the network, similar to DenseNets, which is convenient for the gradient flow during back–propagation.

Evaluation Metrics
Quantitative evaluations in our experiments include three objective metrics: PSNR, SSIM and HIGRADE–2. From these, PSNR and SSIM are reference–based metrics that measure the difference between an impaired image and ground truth. Higher values are better in both cases. The PSNR (range 0 to ∞) is a log–scale version of the mean–square–error, and SSIM (range −1 to 1) uses image statistics to better correlate with human perception. Full expressions are as follows:

\[ \mathrm{PSNR}(X,Y) = 10\cdot\log_{10}\left(\frac{255^2}{\mathrm{MSE}}\right), \qquad (4) \]

\[ \mathrm{SSIM}(X,Y) = \frac{(2\mu_X\mu_Y + c_1)(2\sigma_{XY} + c_2)}{(\mu_X^2+\mu_Y^2+c_1)(\sigma_X^2+\sigma_Y^2+c_2)}, \qquad (5) \]

where \( \mathrm{MSE} = \mathbb{E}\left[(X-Y)^2\right] \) is the mean square error of the difference between X and Y; \( \mu_X \) and \( \mu_Y \) are the averages of X and Y, respectively; \( \sigma_X^2 \) and \( \sigma_Y^2 \) are the variances of X and Y, respectively; \( \sigma_{XY} \) is the covariance of X and Y; and \( c_1 = 6.5025 \) and \( c_2 = 58.5225 \).

HIGRADE–2 (Kundu et al. 2017b) is a no–reference image quality metric based on gradient scene–statistics defined in the LAB color space, and it is often used to evaluate high–dynamic–range images. Here, we used the Matlab code available in (Kundu et al. 2017a).

In the case of the PSNR and SSIM metrics, we follow existing benchmarks that use different versions of these metrics. We used the following three definitions in our experiments:
• PSNR/SSIM–YM: Based on the Matlab code available in (Zhang et al. 2018b); computes PSNR/SSIM on the Y channel. Matlab uses a conversion from RGB to YUV color–space following the BT.709 standard, including offsets that are often avoided in other implementations.
• PSNR/SSIM–YP: Based on the Python code available in (Qian et al. 2018c); computes PSNR/SSIM on the Y channel. The code uses an OpenCV function to convert from RGB to YCbCr color–space.
• PSNR/SSIM–RGB: Based on the Python code available in (Ancuti, Ancuti, and Timofte 2019); computes the average PSNR/SSIM for pairs of RGB images.
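As a concrete reference for Eqs. (4) and (5), here is a minimal NumPy sketch of both metrics for 8-bit images, evaluated with global image statistics. Note that this is only an illustration of the formulas: the benchmark codes cited above average Eq. (5) over local Gaussian windows and apply specific color conversions, so their reported numbers will differ.

```python
import numpy as np

C1 = 6.5025    # (0.01 * 255)**2, standard SSIM constant
C2 = 58.5225   # (0.03 * 255)**2

def psnr(x, y):
    """Eq. (4): PSNR in dB for images with a 0-255 dynamic range."""
    mse = np.mean((x.astype(np.float64) - y.astype(np.float64)) ** 2)
    return 10.0 * np.log10(255.0 ** 2 / mse)

def ssim_global(x, y):
    """Eq. (5) evaluated once with global image statistics.

    Standard SSIM implementations average this expression over
    local windows; a single global window only illustrates Eq. (5).
    """
    x = x.astype(np.float64)
    y = y.astype(np.float64)
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()
    cov_xy = ((x - mu_x) * (y - mu_y)).mean()
    return ((2 * mu_x * mu_y + C1) * (2 * cov_xy + C2)) / \
           ((mu_x ** 2 + mu_y ** 2 + C1) * (var_x + var_y + C2))
```

For reproducing the reported numbers, the referenced Matlab/Python evaluation codes should be used, since windowing and color-conversion details change the values.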
Training Settings
Image Super–Resolution
We use the DIV2K (Agustsson and Timofte 2017) and FLICKR–2K datasets for training, and the following datasets for test: Set–14 (Zeyde, Elad, and Protter 2010), BSDS–100 (Martin et al. 2001), Urban–100 (Huang, Singh, and Ahuja 2015) and Manga–109 (Matsui et al. 2017). Impaired images were obtained by downscaling and then upscaling ground-truth images, using a bicubic scaler, with scaling factors ×2, ×3, ×4 and ×8. Our target is to recover the ground truth, so we use a loss function that measures the L1 distance between impaired images and ground truth. For evaluation we measure PSNR and SSIM on the Y-channel using the Matlab code from (Zhang et al. 2018b).

We follow the training settings from (Lim et al. 2017). In each training batch, we randomly take impaired patches from our training set (DIV2K plus FLICKR–2K images). We consider two cases: we train a model BPP–SR×f for each upscaling factor f = 2, 3, 4 and 8; and we also train a model BPP–SRx2x3x4x8 to restore impaired images with unknown upscaling factor. We use a patch size proportional to f for f = 2, 3 and 4, and a fixed larger patch size for f = 8 and for the unknown upscaling factor. We augment the patches by random horizontal/vertical flipping and rotating by 90°. We use the Adam optimizer (Kingma and Ba 2015) with the learning rate decreased by half at regular intervals of back-propagation steps.

The training data used for the BPP–SRx2x3x4x8 model includes all images used for training the upscaling factors f = 2, 3, 4 and 8. We could have chosen to train our model using a random, fractional upscaling factor within a continuous range, but this would have made it difficult to reproduce the training settings.

Mobile–to–DSLR Photo Translation
We use the DPED (Ignatov et al. 2017) dataset for training and test. This dataset provides aligned patches taken from iPhone-mobile photos (impaired) and DSLR-Canon photos (ground truth), split into training and test sets. We take patches from the test set for validation during training. We use full-size iPhone images from DPED for qualitative results. For the loss function we use the negative SSIM between impaired and ground-truth patches. We find SSIM to be more effective than L1 and MSE losses in this problem. For evaluation we measure the average PSNR and SSIM metrics for RGB pairs, using the code from (Ancuti, Ancuti, and Timofte 2019), and the no-reference metric HIGRADE–2 (Kundu et al. 2017b), using the Matlab code available from (Kundu et al. 2017a).

In each training batch, we take patches of a fixed size. We use the Adam optimizer (Kingma and Ba 2015) with the learning rate decreased by half at regular intervals of back-propagation steps. We do not observe further improvements with longer training.

Image Dehaze
We use the following real-haze datasets: I–Haze (Ancuti et al. 2018a), O–Haze (Ancuti et al. 2018c) and Dense–Haze (Ancuti et al. 2019). We follow the training settings from (Zhang, Sindagi, and Patel 2018). In each training batch, we take patches of a fixed size. The training set is augmented by rescaling the images, using a bicubic scaler, to several fractions and multiples of the original size. We use the Adam optimizer (Kingma and Ba 2015) with the learning rate decreased by half at regular intervals of back-propagation steps. We train the system for a fixed number of epochs.

Joint HDR and Super–Resolution
We use the HDR–Eye (Nemoto et al. 2015) dataset for training and the Wang LDR (Wang et al. 2013) dataset for test. HDR–Eye provides HDR images constructed from multi-exposure photographs. Following the training configuration in (Soh, Park, and Cho 2019), we select a subset from the total of standard-exposed and HDR-constructed pairs of images (we excluded images C01.png, C04.png, C13.png, C28.png, C38.png and C42.png because of visible misalignment problems in the HDR image constructions). Then, we take each standard-exposed image, first downscale it and then upscale it back (both with a bicubic scaler), and use this output as the impaired image. We follow the configuration in (Soh, Park, and Cho 2019), although their tone-mapping algorithms are not specified and tone-mapped images are not provided. We tried several tone-mapping algorithms until we were able to produce competitive quantitative and qualitative outputs. For our final results we used the OpenCV implementation of Reinhard–Devlin tone-mapping (Reinhard and Devlin 2005) with intensity = 0 and color_adapt = 0 (gamma and light_adapt set to fixed values).

We train our system using patches of a fixed size. Following the analysis in (Soh, Park, and Cho 2019), we use the no-reference image quality metrics Ma (Ma et al. 2017, 2018), to evaluate SR improvements, and HIGRADE–2 (Kundu et al. 2017b,a), to evaluate HDR improvements. We augment the patches by random horizontal/vertical flipping and rotating by 90°. We use the Adam optimizer (Kingma and Ba 2015) with the learning rate decreased by half at regular intervals of back-propagation steps. We train the system for a fixed number of epochs.

Table 3: Extended quantitative evaluation for super–resolution (PSNR–YM / SSIM–YM; "–" marks values missing from the source).

                              Set14        BSDS100      Urban100     Manga109
Scale ×2:
  Bicubic                     –/–          –/–          –/–          –/–
  RCAN (Zhang et al. 2018a)   34.12/0.921  32.41/–      –/–          –/–
  BPP–SRx2x3x4x8              33.27/0.913  31.21/0.879  31.67/0.921  38.31/0.975
  BPP–SRx2                    34.23/0.922  31.63/0.886  33.07/0.935  39.19/0.977
Scale ×3:
  Bicubic                     –/–          –/–          –/–          –/–
  BPP–SRx2x3x4x8              30.23/0.838  28.81/0.794  28.43/0.852  33.75/0.943
  BPP–SRx3                    –/–          –/–          –/–          –/–
Scale ×4:
  RCAN (Zhang et al. 2018a)   28.87/0.789  27.77/0.744  26.82/0.809  31.22/0.917
  BPP–SRx2x3x4x8              28.55/0.778  27.43/0.728  26.48/0.791  30.81/0.909
  BPP–SRx4                    –/–          –/–          –/–          –/–
Scale ×8:
  RCAN (Zhang et al. 2018a)   25.23/0.651  24.98/0.606  23.00/0.645  25.24/0.803
  BPP–SRx2x3x4x8              25.10/0.642  24.89/0.598  22.72/0.626  24.78/0.785
  BPP–SRx8                    –/–          –/–          –/–          –/–

Figure 11: Extended qualitative evaluation for super–resolution.
Figure 12: Extended qualitative evaluation for Mobile–to–DSLR photo translation.
Figure 13: Extended qualitative evaluation for image dehaze for Indoor/Outdoor datasets.
Figure 14: Extended qualitative evaluation for image dehaze for the Dense dataset.
Figure 15: Extended qualitative evaluation for joint HDR+SR enhancement.

Raindrop Removal
We use the DeRaindrop (Qian et al. 2018a,b) dataset for training and test. This dataset provides paired images, one degraded by raindrops and the other free from raindrops. In each training batch, we take patches of a fixed size. We train a BPP model using the L1 loss. We use the Adam optimizer (Kingma and Ba 2015) with the learning rate decreased by half at regular intervals of back-propagation steps. We train the system for a fixed number of epochs.

Computing Infrastructure
All training processes run on a Linux operating system, using implementations in the Python language with the software packages Numpy, PyTorch (Paszke et al. 2017), Scilab, Pillow and OpenCV. We used an NVIDIA Tesla M40 (24GB) GPU for training and an NVIDIA Titan–X Maxwell (12GB) GPU for tests.
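All training sections above share the same schedule: Adam with an initial learning rate that is halved every fixed number of back-propagation steps. A minimal sketch of that decay rule in plain Python follows; `base_lr` and `halve_every` are placeholder values, since the exact numbers are per-experiment settings:

```python
def stepped_lr(step, base_lr=1e-4, halve_every=200_000):
    """Step-decay schedule: halve the learning rate every
    `halve_every` back-propagation steps (placeholder values)."""
    return base_lr * 0.5 ** (step // halve_every)
```

In PyTorch, the same rule can be attached to an Adam optimizer with `torch.optim.lr_scheduler.StepLR(optimizer, step_size=halve_every, gamma=0.5)`, stepping the scheduler once per back-propagation step.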