Light Field Super-Resolution using a Low-Rank Prior and Deep Convolutional Neural Networks
Reuben A. Farrugia, Senior Member, IEEE, and Christine Guillemot, Fellow, IEEE
Abstract—Light field imaging has recently known a regain of interest due to the availability of practical light field capturing systems that offer a wide range of applications in the field of computer vision. However, capturing high-resolution light fields remains technologically challenging since the increase in angular resolution is often accompanied by a significant reduction in spatial resolution. This paper describes a learning-based spatial light field super-resolution method that allows the restoration of the entire light field with consistency across all sub-aperture images. The algorithm first uses optical flow to align the light field and then reduces its angular dimension using low-rank approximation. We then consider the linearly independent columns of the resulting low-rank model as an embedding, which is restored using a deep convolutional neural network (DCNN). The super-resolved embedding is then used to reconstruct the remaining sub-aperture images. The original disparities are restored using inverse warping, where missing pixels are approximated using a novel light field inpainting algorithm. Experimental results show that the proposed method outperforms existing light field super-resolution algorithms, achieving PSNR gains of 0.23 dB over the second best performing method. This performance can be further improved using iterative back-projection as a post-processing step.
Index Terms—Deep Convolutional Neural Networks, Light Field, Low-Rank Matrix Approximation, Super-Resolution.
1 INTRODUCTION

Light field imaging has emerged as a promising technology for a variety of applications going from photo-realistic image-based rendering to computer vision applications such as 3D modeling, object detection, classification and recognition. As opposed to traditional photography, which captures a 2D projection of the light in the scene, light fields collect the radiance of light rays along different directions [1], [2]. This rich visual description of the scene offers powerful capabilities for scene understanding and for improving the performance of traditional computer vision problems such as depth sensing, post-capture refocusing, segmentation, video stabilization and material classification, to mention a few.

Light field acquisition devices have recently been designed, going from rigs of cameras [2] capturing the scene from slightly different viewpoints to plenoptic cameras using micro-lens arrays placed in front of the photo-sensor [1]. These acquisition devices offer different trade-offs between angular and spatial resolution. Rigs of cameras capture views with a high spatial resolution but in general with a limited angular sampling, hence large disparities between views. On the other hand, plenoptic cameras capture views with a high angular sampling, but at the expense of a limited spatial resolution. In plenoptic cameras, the angular sampling is related to the number of sensor pixels located behind each microlens, while the spatial sampling is related to the number of microlenses.

• R. A. Farrugia is with the University of Malta, Malta.
• C. Guillemot is with the Institut National de Recherche en Informatique et en Automatique (INRIA), Rennes 35042, France, e-mail: [email protected].

Manuscript received January, 2018; revised xxx.
Light fields represent very large volumes of high dimensional data, bringing new challenges in terms of capture, compression, editing and display. The design of efficient light field image processing algorithms, going from analysis and compression to super-resolution and editing, has thus recently attracted interest from the research community. A comprehensive overview of light field image processing techniques can be found in [3].

This paper addresses the problem of light field spatial super-resolution. Single image super-resolution has been an active field of research in the past years, leading to quite mature solutions. However, super-resolving each view separately using state of the art single-image super-resolution techniques would not take advantage of light field properties, in particular of angular redundancy, which depends on scene geometry [4]. Moreover, considering each sub-aperture image as a separate entity may reconstruct light fields which are angularly incoherent [5].

Assuming that the low-resolution light field captures different views of the same scene taken with sub-pixel misalignment, the problem can be posed as the one of recovering the high-resolution (HR) views from multiple low-resolution images with unknown non-integer translational misalignment. A number of methods hence proceed in two steps. A first step consists in estimating the translational misalignment using depth or disparity estimation techniques. The HR light field views are then found using Bayesian or variational optimization frameworks with different priors. This is the case in [6] and [7], where the authors first recover a depth map and formulate the spatial light field super-resolution problem either as a simple linear problem [6] or as a Bayesian inference problem [7], assuming an image formation model with Lambertian reflectance priors and a depth-dependent blurring kernel. A Gaussian mixture model (GMM) is proposed instead in [8] to address denoising, spatial and angular super-resolution of light fields. The reconstructed 4D patches are estimated using a linear minimum mean square error (LMMSE) estimator, assuming a disparity-dependent GMM for the patch structure. In [9], the geometry is estimated by computing structure tensors in the Epipolar Plane Images (EPI). A variational optimization framework is then used to spatially super-resolve the different views given their estimated depth maps and to increase the angular resolution.

Another category of methods is based on machine learning techniques which learn a model of correspondences between low- and high-resolution data. In [5], the authors learn projections between low-dimensional subspaces of 3D patch-volumes of low- and high-resolution, using ridge regression. Data-driven learning methods based on deep neural network models have recently been shown to be quite promising for light field super-resolution. Stacked input images are up-scaled to a target resolution using bicubic interpolation and super-resolved using a spatial convolutional neural network (CNN) in [10], which learns a non-linear mapping between low- and high-resolution views. The output of the spatial CNN is then fed into a second CNN to perform angular super-resolution. While the approach in [10] takes at the input of the spatial CNN pairs or tuples of neighboring views, leading to three spatial CNNs to be learned, a single CNN is proposed by the same authors in [11] to process each view independently.
The problem of angular super-resolution of light fields is also addressed in [12] using an architecture based on two CNNs, one CNN being used to estimate disparity maps and the second CNN being used to synthesize intermediate views. The authors in [13] define a CNN architecture in the EPI to increase the angular resolution.

One can also cite some related work using a hybrid light field imaging system coupling a high-resolution camera with either a light field camera [14] or with multiple low-resolution cameras [15]. In [14], the HR image captured by the DSLR camera is used to super-resolve the low-resolution images captured by an Illum light field camera. The authors in [15] describe an acquisition device formed by eight low-resolution side cameras arranged around a central high-quality SLR lens. A super-resolution method, called iterative patch- and depth-based synthesis (iPADS), is then proposed to reconstruct a light field with the spatial resolution of the SLR camera and an increased number of views.

In this paper, we propose a spatial light field super-resolution method using a deep CNN (DCNN) with ten convolutional layers. Instead of using a DCNN to restore each sub-aperture image independently, as done in [10], [11], we restore all sub-aperture images within a light field simultaneously. This allows us to exploit both spatial and angular information to restore the light field and thus generate light fields which are angularly coherent. A naïve approach is to train a DCNN with n = P × Q inputs, where P and Q represent the number of vertical and horizontal angular views respectively. However, this would significantly increase the complexity of the DCNN, making it harder to train and more prone to over-fitting. Instead, given that each sub-aperture image captures the same scene from a different view point, we align all sub-aperture images to the centre view using optical flow and then reduce the angular dimension of the aligned light field using a low-rank model of rank k, where k ≪ n. Results in section 3.1 show that the alignment allows us, with the considered low-rank model, to significantly reduce the angular dimension of the light field. The linearly independent column-vectors of the low-rank representation of the aligned light field, which constitute an embedding of the light field views in a lower-dimensional space, are then considered as a volume and simultaneously restored using a DCNN with k input channels. This allows us to significantly reduce the complexity of the network, which is easier to train while still preserving angular consistency. The restored column-vectors are then combined to reconstruct the aligned high-resolution light field. In the final stage we use inverse warping to restore the original disparities of the light field and fill the cracks caused by occlusion using a novel diffusion-based inpainting strategy that propagates the restored pixels along the dominant orientation of the EPI.

Simulation results demonstrate that the proposed method outperforms all other schemes considered here when tested on 13 different light fields from two different datasets. It is important to mention that our method was not trained on the Stanford light fields, and these results clearly show that our proposed method generalizes well even when considering light field structures whose disparities are significantly larger than those used for training. Further analysis in section 4 shows that an additional gain in performance can be achieved using iterative back-projection (IBP) as a post-processing step.
These results show that our method can significantly outperform existing light field super-resolution methods, including the deep learning-based light field super-resolution method presented in [11].

This paper is organized as follows. After introducing the notation in section 2, we describe the proposed method in section 3. Section 4 discusses the simulation results with different types of light fields, and section 5 provides the final concluding remarks.

2 NOTATION AND PROBLEM FORMULATION
We consider here the simplified 4D representation of light fields, called 4D light field in [16] and lumigraph in [17], describing the radiance along rays by a function I(x, y, s, t), where the pairs (x, y) and (s, t) respectively represent spatial and angular coordinates. The light field can be seen as capturing an array of viewpoints (called sub-aperture images) of the scene with varying angular coordinates (s, t). The different views will be denoted here by I_{s,t} ∈ R^{X,Y}, where X and Y represent the vertical and horizontal dimension of each sub-aperture image.

In the following, the notation I_{s,t} for the different sub-aperture images will be simplified as I_i, with a bijection between (s, t) and i. The complete light field can hence be represented by a matrix I ∈ R^{m,n}:

I = [vec(I_1) | vec(I_2) | · · · | vec(I_n)]   (1)

with vec(I_i) being the vectorized representation of the i-th sub-aperture image, m the number of pixels in each view (m = X × Y) and n the number of views in the light field (n = P × Q), where P and Q represent the number of vertical and horizontal angular views respectively.

Let I^H and I^L denote the high- and low-resolution light fields, respectively. The super-resolution problem can be formulated in Banach space as

I^L = ↓_α B I^H + η   (2)

where η is an additive noise matrix, ↓_α is a downsampling operator applied on each sub-aperture image, α is the magnification factor and B is the blurring kernel. There are many possible high-resolution light fields I^H which can produce the input low-resolution light field I^L via the acquisition model defined in (2). Hence, solving this ill-posed inverse problem requires introducing some priors on I^H, which can be a statistical prior such as a GMM model [8], or priors learned from training data as in [5], [10], [11].

Another way to visualize a light field is to consider the EPI representation. An EPI is a spatio-angular slice from the light field, obtained by fixing one of the spatial coordinates and one of the angular coordinates. If we fix y := y* and t := t*, an EPI is an image defined as ε_{y*,t*} := I(x, y*, s, t*). Alternatively, the vertical EPI is obtained by fixing x := x* and s := s*. Figure 5b shows a typical EPI structure, where the slopes of the isophote lines in the EPI are related to the disparity between the views [9]. Isophote lines with a slope of π/2 indicate that there is no disparity across the views, while the larger the difference between the slope and π/2, the larger the disparity across the views.
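To make this concrete, the following minimal Python sketch (with hypothetical function names) builds the matrix I of Eq. (1) from a list of views and applies the acquisition model of Eq. (2) to a single view; a Gaussian kernel stands in for B, and σ = 1.6 matches the degradation used in the experiments of section 4.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def lightfield_to_matrix(views):
    """Stack the n sub-aperture images (each X x Y) into the m x n matrix
    I of Eq. (1), one vectorized view per column (m = X * Y)."""
    return np.stack([v.reshape(-1) for v in views], axis=1)

def degrade(view_hr, alpha=2, sigma=1.6):
    """Acquisition model of Eq. (2) for one view (noise term omitted):
    blur with the kernel B, then decimate by the magnification factor alpha."""
    blurred = gaussian_filter(view_hr, sigma=sigma)   # B I^H
    return blurred[::alpha, ::alpha]                  # downsampling operator
```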
3 PROPOSED METHOD

Figure 1 depicts the block diagram of the proposed spatial light field super-resolution algorithm. Each sub-aperture image of the low-resolution light field I^L is first bicubic interpolated so that both I^H and I^L have the same resolution. While a light field consists of a very large volume of high-dimensional data, it also contains a lot of redundant information, since every sub-aperture image captures the same scene from a different viewpoint. Moreover, different light field capturing devices have different spatial and angular specifications, which makes it very hard for a learning-based algorithm to learn a generalized model suitable to restore all kinds of light fields irrespective of the capturing device. The Dimensionality Reduction module tackles both problems simultaneously: it uses optical flow to align the light field and a low-rank matrix approximation to reduce the dimension of the light field. Results in section 3.1 show that we can reduce the dimensionality of the light field from R^{m,n} to R^{m,k}, where k ≪ n is the rank of the matrix, while preserving most of the information contained in the light field.

The Light Field Restoration module then considers the k linearly independent column-vectors of the rank-k representation of the low-resolution light field as an embedding of the light field. We then use a DCNN to recover the texture details of the light field embedding in the lower dimensional space. The super-resolved embedding gives an estimate of the aligned high-resolution light field. The Light Field Reconstruction module then warps the estimated aligned high-resolution light field to restore the original disparities. Holes corresponding to cracks or occlusions are then filled in by diffusing information in the Epipolar Plane Images (EPI) along directions of isophote lines computed, for the positions of missing pixels, in the EPI of the low-resolution light field. Iterative back-projection can further be used as a post-process to refine the super-resolved light field and ensure that it is consistent with the low-resolution light field. More information about each module is provided in the following subsections.

Fig. 1: Block diagram of the proposed light field super-resolution method.
3.1 Light Field Dimensionality Reduction

Many acquisition devices have been recently designed to capture light fields, including multi-sensor approaches [2], time-sequential capture methods [18], [19] and plenoptic cameras [1], [20]. All these light field cameras have diverse spatial and angular specifications, which makes it very hard for a learning-based algorithm to learn a generalized model suitable to restore any kind of light field independently of the source. Moreover, a light field contains a huge amount of redundant information, since every view captures the same scene from a different viewpoint.

The authors in [21] hypothesized that this redundancy can be suppressed by jointly aligning the sub-aperture images in the light field and estimating a low-rank approximation (LRA) of the light field. This approach has shown very promising results in the field of light field compression. In the same spirit, the RASL algorithm [22] was used to find the homographies that globally align a batch of linearly correlated images. Both methods seek an optimal set of homographies such that the matrix of aligned images can be decomposed into a low-rank matrix of aligned images, with the latter constraining the error matrix to be sparse. The results in Figure 2 show that while both the RASL and LRA methods manage to globally align the sub-aperture images, the mean sub-aperture image is still very blurred, indicating that the light field is not suitably aligned.

The authors in [5] have used the block matching algorithm (BMA) to align patch volumes. The results in Figure 2 show that BMA aligns the sub-aperture images better, where the average variance across the n sub-aperture images is significantly reduced. This result suggests that local methods can improve the alignment of the sub-aperture images, which, as we will see in the sequel, will allow us to significantly reduce the dimensionality of the light field.

In this paper, we formulate the light field dimensionality reduction problem as

min_{u,v,A} ‖Γ_{u,v}(I^L) − A‖²_F   s.t.   rank(A) = k   (3)

where u ∈ R^{m,n} and v ∈ R^{m,n} are flow vectors that specify the displacement of each pixel needed to align each sub-aperture image with the centre view, A is a rank-k matrix which approximates the aligned light field, and Γ_{u,v}(·) is a forward warping operator (which here performs a disparity compensation where the disparity maps (u, v) are estimated with an optical flow estimator). This optimization problem is computationally intractable. Instead, we decompose this problem into two sub-problems: i) use optical flow to find the flow matrices u and v that best align each sub-aperture image with the centre view, and ii) use low-rank approximation to derive the rank-k matrix that minimizes the error with respect to the aligned light field.
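Sub-problem i) warps every view towards the centre view given its estimated flow field. A minimal sketch follows, assuming per-pixel flow arrays u, v of the same shape as the view and using simple backward bilinear sampling; the paper's forward warping operator additionally leaves holes at occlusions, which are handled in section 3.3.

```python
import numpy as np
from scipy.ndimage import map_coordinates

def align_to_centre(view, u, v):
    """Warp one sub-aperture image towards the centre view given its
    per-pixel flow (u, v) estimated by an optical flow algorithm.
    Bilinear backward sampling is used here for simplicity."""
    X, Y = view.shape
    xs, ys = np.meshgrid(np.arange(X), np.arange(Y), indexing="ij")
    # sample the view at (x + u, y + v) so that the result matches the centre view
    return map_coordinates(view, [xs + u, ys + v], order=1, mode="nearest")
```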
[Data accompanying Fig. 2: average variance across the n sub-aperture images for each alignment method; one row per light field shown, smaller values indicating better alignment.]

Method:        No Align | RASL [22] | LRMA [21] | BMA    | Horn-Schunck [23] | SIFT Flow [24] | CPM [25] | SPM-BP [26]
Light field 1:  44.714  |  43.896   |  44.787   |  4.434 |      16.466       |     1.796      |  7.809   |   5.817
Light field 2: 306.960  | 286.078   | 306.166   | 24.519 |      45.588       |     8.560      | 68.067   |  22.313
Light field 3:  99.092  |  98.261   |  99.080   | 46.374 |      36.388       |    19.164      | 51.073   |  71.652
Fig. 2: Cropped regions of the mean sub-aperture images when using different disparity compensation methods. Underneath each image we provide the average variance across the n sub-aperture images, which was used in [5] to characterize the performance of the alignment algorithm, where smaller values indicate better alignment.

The problem of aligning all the sub-aperture images with the centre view can be formulated as

I_j(x, y) = I_i(x + u_i, y + v_i),   i ∈ [1, n], i ≠ j   (4)

where j corresponds to the index of the centre view, and (u_i, v_i) are the flow vectors optimal to align the i-th sub-aperture image with the centre view. There are several optical flow algorithms intended to solve this problem [23], [24], [25], [26], and Figure 2 shows the performance of some of these methods. It can be seen that the mean aligned sub-aperture image computed using [23] is generally blurred, while those aligned using the methods in [25], [26] generally exhibit ghosting artefacts at the edges. Moreover, it can be seen that SIFT Flow [24] generally provides very good alignment and manages to attain the smallest variation across the sub-aperture images. While the SIFT Flow algorithm will be used in this paper to compute the flow vectors, any other optical flow method can be used.

Given that the flow vectors (u_i, v_i) for the i-th sub-aperture image are already available, the minimization problem in Eq. (3) can now be reduced to

min_{B^L, C^L} ‖I^L_Γ − B^L C^L‖²_F   s.t.   rank(B^L) = k   (5)

where I^L_Γ = Γ_{u,v}(I^L), B^L ∈ R^{m,k} is a rank-k matrix and C^L ∈ R^{k,n} is the combination weight matrix. These matrices can be found using the singular value decomposition (SVD) I^L_Γ = UΣV^T, where B^L is set as the k first columns of UΣ and C^L is set as the k first rows of V^T, so that B^L C^L is the closest rank-k approximation of the aligned light field I^L_Γ. The error matrix E^L is simply computed as E^L = I^L_Γ − B^L C^L.

Figure 3 depicts the performance of three different dimensionality reduction techniques at different ranks. To measure the dimensionality reduction ability of these methods we compute the root mean square error (RMSE) between the aligned original and the rank-k representation of the aligned light field. It can be seen that the RASL algorithm has the largest distortions at almost all ranks when compared to the other two approaches. On the other hand, HLRA manages to significantly outperform the RASL method, a gain attained thanks to the improved alignment. Nevertheless, it can be clearly observed that the proposed SIFT Flow + LRA method gives the best performance, especially at lower ranks, indicating that more information is captured within the low-rank matrix. To emphasize this point we show in Figure 3 the principal basis of PCA, HLRA and SIFT Flow + LRA.¹ PCA is computed on the light field without disparity estimation and can therefore be considered here as a baseline showing that alignment allows us to pack more information into the principal basis. Moreover, it can be seen that the principal basis derived using our SIFT Flow + LRA captures more texture detail than the other methods, which confirms the benefit that local alignment has on the energy compaction ability of the proposed dimensionality reduction method.

1. Note that the RASL method does not decompose the matrix into a combination of basis elements, and therefore the principal basis of RASL could not be shown here.
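Sub-problem ii) is a plain truncated SVD. A minimal sketch, with assumed names:

```python
import numpy as np

def low_rank_factors(I_aligned, k):
    """Rank-k factorization of the aligned light field (Eq. (5)) via SVD:
    B = first k columns of U*Sigma, C = first k rows of V^T, so that B @ C
    is the closest rank-k approximation of I_aligned in the Frobenius norm."""
    U, s, Vt = np.linalg.svd(I_aligned, full_matrices=False)
    B = U[:, :k] * s[:k]          # m x k; broadcasts s over the k columns
    C = Vt[:k, :]                 # k x n
    E = I_aligned - B @ C         # error matrix E^L
    return B, C, E
```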
3.2 Light Field Restoration

We consider a low-rank representation of the aligned low-resolution light field A^L = B^L C^L, where A^L ∈ R^{m,n} is a rank-k matrix with k ≪ n. Similarly, A^H = B^H C^H is a rank-k representation of the aligned high-resolution light field. The rank of a matrix is defined as the maximum number of linearly independent column vectors in the matrix. Moreover, the linearly dependent column vectors of a matrix can be reconstructed using a weighted summation of the linearly independent column vectors of the same matrix. This leads us to decompose A^L into two sub-matrices: Ă^L ∈ R^{m,k}, which groups the linearly independent column vectors of A^L, and Â^L ∈ R^{m,n−k}, which groups the linearly dependent column vectors of A^L. In practice, we decompose the rank-k matrix A^L using QR decomposition (i.e. A^L = QR). The indices of the linearly independent components of A^L then correspond to the indices of the non-zero diagonal elements of the upper-triangular matrix R. We then use the same indices to decompose A^H into sub-matrices Ă^H and Â^H. The matrix Â^L can be reconstructed as a linear combination of Ă^L, where the weight matrix W is computed using

W = (Ă^L⊤ Ă^L)† Ă^L⊤ Â^L   (6)

where (·)† stands for the pseudo-inverse operator. We assume here that the weight matrix W, which is optimal in terms of least squares to reconstruct Â^L, is also suitable to reconstruct Â^H.

Driven by the recent success of deep learning in the field of single-image [27], [28] and light field super-resolution [10], [11], we use a DCNN to model the upscaling function that minimizes the following objective function

(1/2) ‖Ă^H − f(Ă^L)‖²   (7)

where f(·) is a function modelled by the DCNN illustrated in Figure 4, which has ten convolutional layers. The linearly independent sub-matrix Ă^L is passed through a stack of convolutional and rectified linear unit (ReLU) layers. We use a convolution stride of 1 pixel with no padding nor spatial pooling. The first convolutional layer has 64 filters spanning all k input channels, while the last layer, which is used to reconstruct the high-resolution light field, employs k filters. All the other layers use 64 filters, which are initialized using the method in [29]. The DCNN was trained using a total of 200,000 random patch-volumes with k channels, extracted from the low- and high-resolution rank-k approximations of the 98 light fields from the EPFL, INRIA and HCI datasets.² The Titan GTX1080Ti Graphical Processing Unit (GPU) was used to speed up the training process.

During the evaluation phase, we estimate the super-resolved linearly independent representation of the light field Ă^H = f(Ă^L). We then estimate the super-resolved linearly dependent part of the light field using

Â^H = Ă^H W   (8)

The super-resolved low-rank representation of the aligned light field is then derived by the union of the two matrices Â^H and Ă^H, i.e. A^H = Â^H ∪ Ă^H. The super-resolved light field is then reconstructed using

Î^H_Γ = A^H + E^L   (9)

2. It must be noted that none of the light fields used for validation were used for training.
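A sketch of the independent/dependent column split and of the weight matrix in Eq. (6). Column-pivoted QR is used here as a numerically robust stand-in for reading the non-zero diagonal of R; the rank tolerance and all names are assumptions.

```python
import numpy as np
from scipy.linalg import qr, pinv

def split_and_weights(A):
    """Split the rank-k matrix A into its linearly independent columns (A_ind)
    and the remaining ones (A_dep), then compute the least-squares weight
    matrix W of Eq. (6) so that A_dep is approximated by A_ind @ W."""
    _, R, piv = qr(A, pivoting=True)
    k = np.sum(np.abs(np.diag(R)) > 1e-10 * np.abs(R[0, 0]))   # numerical rank
    ind, dep = piv[:k], piv[k:]
    A_ind, A_dep = A[:, ind], A[:, dep]
    W = pinv(A_ind.T @ A_ind) @ A_ind.T @ A_dep                # Eq. (6)
    return A_ind, A_dep, W, ind, dep
```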
[Fig. 3, top row: plots of RMSE versus rank for RASL, HLRA and SIFT Flow + LRA. Bottom row: principal basis images for PCA, HLRA and SIFT Flow + LRA.]

Fig. 3: These figures show how the error between the low-rank and full-rank representations varies at different ranks. It can be seen that using optical flow to align the light field followed by low-rank approximation attains the best performance. The images in the second row show the principal basis derived using the different methods; the sharper the principal basis, the more information is being captured in it.

Fig. 4: The proposed network structure, which receives a low-resolution light field and restores it using the proposed DCNN.
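A PyTorch sketch of the ten-layer network of Fig. 4. The 3 × 3 spatial filter size is an illustrative assumption (the exact sizes are not stated above); the channel counts, stride and the absence of padding and pooling follow the description in section 3.2.

```python
import torch
import torch.nn as nn

class LFEmbeddingNet(nn.Module):
    """Ten-layer DCNN restoring the k-channel light field embedding.
    Stride 1, no padding, no spatial pooling; 3x3 kernels are assumed."""
    def __init__(self, k, features=64, depth=10, ksize=3):
        super().__init__()
        layers = [nn.Conv2d(k, features, ksize), nn.ReLU(inplace=True)]
        for _ in range(depth - 2):
            layers += [nn.Conv2d(features, features, ksize), nn.ReLU(inplace=True)]
        layers += [nn.Conv2d(features, k, ksize)]   # reconstructs the k channels
        self.body = nn.Sequential(*layers)

    def forward(self, x):                           # x: (batch, k, H, W)
        return self.body(x)

# Usage sketch: net = LFEmbeddingNet(k=4)
# Training loss per Eq. (7): 0.5 * ((target - net(inp)) ** 2).sum()
```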
3.3 Light Field Reconstruction

The restored aligned light field Î^H_Γ has all sub-aperture images aligned with the centre view. A naïve approach to recover the original disparities of the restored sub-aperture images is to use forward warping. However, as can be seen in the first column of Figure 5a, forward warping is not able to restore all pixels and results in a number of cracks or holes corresponding to occlusions (marked in green). One can use either bicubic interpolation or inverse warping to fill the holes. However, in case of occlusions, the neighbouring pixels may not be well correlated with the missing information, which often results in inaccurate estimations (see Figure 5a, second column). More advanced inpainting algorithms [30], [31] can be used to restore each hole separately. However, these methods do not exploit the light field structure and therefore provide inconsistent reconstructions of the same spatial region at different angular views.
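The sketch below illustrates why forward warping produces such cracks: once target coordinates are rounded to the pixel grid, some positions receive several source pixels while others receive none. All names are hypothetical.

```python
import numpy as np

def forward_warp(view, u, v):
    """Forward-warp a view by the per-pixel flow (u, v), keeping a mask of
    pixels that received no value (the green 'cracks' of Fig. 5a)."""
    X, Y = view.shape
    out = np.zeros_like(view)
    hit = np.zeros((X, Y), dtype=bool)
    xs, ys = np.meshgrid(np.arange(X), np.arange(Y), indexing="ij")
    xt = np.clip(np.round(xs + u).astype(int), 0, X - 1)
    yt = np.clip(np.round(ys + v).astype(int), 0, Y - 1)
    out[xt, yt] = view          # later writes win where targets collide
    hit[xt, yt] = True
    return out, ~hit            # ~hit marks the holes to be inpainted
```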
Fig. 5: Filling the missing information caused by occlusion. (a) Inpainting the cracks marked in green; the columns show, from left to right, forward warping, bicubic interpolation and EPI diffusion. (b) Diffusion-based inpainting.

In this work, we use a diffusion-based inpainting algorithm that estimates the missing pixels by diffusing information available in other views. Similar to the work in [32], we exploit the EPI structure to diffuse information along the dominant orientation of the EPI. However, instead of predicting the orientation of unknown pixels from their spatial neighborhood as done in [32], we exploit the similarity between the low-resolution and super-resolved EPIs and use the structure tensor computed on the low-resolution EPI to guide the inpainting process in the high-resolution EPI.

Without loss of generality we consider the EPI where the dimensions y* and t* are fixed; the case of vertical slices is analogous. We first compute the structure tensor of the low-resolution EPI ε^L_{y*,t*} at coordinates (x, s) using

T(x, s) = ∇ε^L_{y*,t*}(x, s) ∇ε^L_{y*,t*}(x, s)⊤   (10)

where ∇ stands for the first-order gradient computed using the Sobel kernel. The authors in [32] compute an average weighting of the columns of T(x, s) to derive the dominant orientation, where the weights are given by an anisotropy measure. Nevertheless, the anisotropy measure may fail in smooth regions, and there the weighted average may fail to estimate the dominant direction of the EPI. This becomes an issue when computing these orientations on low-resolution versions of the light fields. Instead, we estimate the orientation at every pixel in the EPI by computing the eigen-decomposition of T(x, s) and choosing the direction θ(x, s) which corresponds to the eigenvector with the smallest eigenvalue. Moreover, driven by the observation that the disparities in a light field are typically small, and considering that the local slope in the EPI is proportional to the disparity, it is reasonable to assume that slopes which correspond to large disparities are less probable to occur. Therefore, to ensure that the tensor-driven diffusion is performed along a single coherent direction per column of the EPI and to reduce noise, the dominant orientation θ(x) is computed as the column-wise median of the values θ(x, s) whose orientation is in the range [π − α, π + α] radians. In all our experiments we set α = π/. While the dominant orientation vectors are less noisy, we further reduce the noise by applying Total Variation (TV-L1) denoising [33] on the orientation field θ(x), which minimizes

θ̂ = arg min_{θ̂} ‖∇θ̂‖₁ + λ‖θ̂ − θ‖²₂   (11)

where λ is the total variation regularization parameter, set to 0.5 in our experiments. Figure 5b (top) shows the EPI of the low-resolution light field ε^L_{y*,t*} and the dominant orientations θ̂(x) marked by blue arrows.
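A sketch of the orientation estimation of Eq. (10), using the closed-form eigen-decomposition of the 2 × 2 structure tensor; the angle-range filtering and the TV denoising of Eq. (11) are omitted here, and all names are assumptions.

```python
import numpy as np
from scipy.ndimage import sobel

def epi_orientations(epi):
    """Per-pixel isophote orientation of an EPI from the structure tensor of
    Eq. (10): the eigenvector with the smallest eigenvalue follows the
    isophote lines. No tensor smoothing is applied in this sketch."""
    gx, gs = sobel(epi, axis=0), sobel(epi, axis=1)   # Sobel gradients
    txx, tss, txs = gx * gx, gs * gs, gx * gs         # entries of T(x, s)
    # dominant eigenvector angle of [[txx, txs], [txs, tss]], rotated by
    # pi/2 to obtain the smaller-eigenvalue (isophote) direction
    return 0.5 * np.arctan2(2 * txs, txx - tss) + np.pi / 2

def dominant_per_column(theta):
    """Column-wise median of the per-pixel orientations (one angle per x)."""
    return np.median(theta, axis=1)

# Usage sketch: theta_dom = dominant_per_column(epi_orientations(epi_lr))
```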
The restored EPI ε^H_{y*,t*} := Î^H(x, y*, s, t*) (see Figure 5b, bottom) has a number of missing pixels (marked in green). Consider that the hole we want to inpaint has coordinates (x_p, s_p). The aim of the proposed diffusion-based inpainting algorithm is to propagate known pixels along the orientation θ̂(x_p) to fill the missing pixels. The diffusion over the EPI ε^H_{y*,t*} evolves as

∂ε^H_{y*,t*}/∂s = Tr(θ̂(x_p) θ̂(x_p)⊤ H(x_p, s_p))   (12)

where Tr(·) stands for the trace operator and H(x, s) denotes the Hessian of ε^H_{y*,t*} at coordinates (x, s). The term θ̂(x_p)θ̂(x_p)⊤ is used to enforce the diffusion to occur only in the direction of the isophote eigenvector. The missing pixels are restored iteratively by finding the solution to (12) which is closest to zero. Figure 5a (third column) shows the results attained using the proposed inpainting strategy.

In order to restore all the cracks in the light field, we first fix t* to that of the centre view and iteratively restore all the horizontal EPIs for all y* ∈ [1, Y] by solving Eq. (12). This corresponds to filling the cracks for the centre row of the matrix of sub-aperture images. We then fix s* to that of the centre view and iteratively restore all the vertical EPIs for all x* ∈ [1, X], which effectively restores all cracks for the centre column of the matrix of sub-aperture images. The remaining propagations are performed row-by-row, where each time we restore all pixels within t* ∈ [1, Q].

3.4 Iterative Back-Projection

One problem with the method proposed in this paper is that after restoring the aligned light field Î^H_Γ we have to apply inverse warping to restore the original disparities. However, inverse warping is not able to recover occluded regions, and some pixels are displaced by ±1 pixel due to rounding errors. While the former problem is solved using the method described in Section 3.3, the second problem is not addressed there. Nevertheless, the results illustrated in Tables 1, 2 and Figure 6 indicate that significant performance gains can be achieved even if we do not explicitly cater for distortions caused by rounding errors in the inverse warping process. These distortions can be corrected using the classical method of iterative back-projection [34], which is adopted by several single-image super-resolution methods (see [35]) to ensure that the downsampled version of the super-resolved light field is consistent with the observed low-resolution light field. The IBP algorithm iteratively refines the estimated high-resolution light field Ī^H_κ at iteration κ by first back-projecting it onto an estimated low-resolution light field Ī^L_κ using

Ī^L_κ = ↑_α(↓_α B Ī^H_κ)   (13)

where ↓_α is a downsampling operator, ↑_α is the bicubic upscaling operation, α is the magnification factor and B is the blurring kernel. The deviation between the LR views found by back-projection and the original LR views is then used to further correct each estimated HR view of the light field as

Ī^H_{κ+1} = Ī^H_κ + (I^L − Ī^L_κ)   (14)

The IBP algorithm is initialized by setting Ī^H_0 = Î^H, and the iterative procedure terminates when κ = K. It was observed that significant improvements are achieved in the first few iterations, and we therefore set K = 10 in our experiments. The restored light field following iterative back-projection is therefore set to Ī^H = Ī^H_K.
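A minimal sketch of Eqs. (13)-(14) for a single view, assuming the view dimensions are divisible by α and that the observed LR view has already been bicubic-upscaled to the target resolution, as assumed throughout the paper.

```python
import numpy as np
from scipy.ndimage import gaussian_filter, zoom

def ibp_refine(sr_view, lr_view_up, alpha=2, sigma=1.6, iters=10):
    """Iterative back-projection for one view: re-degrade the current
    estimate, upscale it, and add the residual back (Eqs. (13)-(14)).
    K = 10 iterations as in the paper; blur parameters follow section 4."""
    hr = sr_view.copy()
    for _ in range(iters):
        lr_sim = gaussian_filter(hr, sigma)[::alpha, ::alpha]   # down-project
        lr_sim_up = zoom(lr_sim, alpha, order=3)                # Eq. (13)
        hr = hr + (lr_view_up - lr_sim_up)                      # Eq. (14)
    return hr
```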
4 EXPERIMENTAL RESULTS

The experiments conducted in this paper use both synthetic and real-world light fields from publicly available datasets. We use 98 light fields from the EPFL [36], INRIA³ and HCI⁴ datasets for training. We conducted the tests using light fields from the INRIA and Stanford⁵ datasets. We use the Stanford dataset in this evaluation since it has disparities significantly larger than both the INRIA and EPFL light fields, which were captured using plenoptic cameras. Moreover, unlike the HCI dataset, the Stanford light fields capture real-world objects. We therefore use this dataset to assess the generalization ability of the algorithms considered in this experiment to light fields captured using camera sensors which differ from the ones used for training. While the sub-aperture images of the EPFL, HCI and Stanford datasets are directly available, the light fields in the INRIA dataset were decoded using the method in [37], as mentioned on their website. In all our experiments we consider a × array of sub-aperture images. For computational purposes, the high-resolution images of the Stanford dataset were downscaled such that the lowest dimension is set to 400 pixels. The high-resolution images of the other datasets were kept unchanged, i.e. × for the HCI light fields and × for both the EPFL and INRIA light fields. Unless otherwise specified, the low-resolution light fields were generated by blurring each high-resolution sub-aperture image with a Gaussian filter using a window size of 7 and standard deviation of 1.6, down-sampling to the desired resolution and up-scaling back to the target resolution using bicubic interpolation. Unless otherwise specified, the iterative back-projection refinement strategy was disabled to permit a fair comparison with the other state-of-the-art super-resolution methods considered.

3. INRIA dataset: https://goo.gl/st8xRt
4. HCI dataset: http://hci-lightfield.iwr.uni-heidelberg.de/
5. Stanford dataset: http://lightfield.stanford.edu/
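For reference, a sketch of the PSNR figure reported in Tables 1-4, averaged over the sub-aperture images (the uniform averaging convention is our assumption):

```python
import numpy as np

def lf_psnr(lf_ref, lf_rec, peak=255.0):
    """Mean PSNR over all sub-aperture images of a restored light field."""
    psnrs = []
    for ref, rec in zip(lf_ref, lf_rec):
        mse = np.mean((ref.astype(np.float64) - rec.astype(np.float64)) ** 2)
        psnrs.append(10 * np.log10(peak ** 2 / mse))
    return float(np.mean(psnrs))
```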
We compare the performance of our system against the best performing methods found in our recent work [5], namely the CNN-based light field super-resolution algorithm (LF-SRCNN) [11] and both linear subspace projection based methods, PCA+RR and BM+PCA+RR [5]. These methods were retrained on samples from the 98 training light fields mentioned above using the training procedures explained in their respective papers. Moreover, given that the very deep super-resolution (VDSR) method [28] achieved state-of-the-art performance on single image super-resolution, we apply this method to restore every sub-aperture image independently. It is important to mention here that in our previous work we found that BM+PCA+RR significantly outperforms several other light field and single-image super-resolution algorithms, including [8], [10], [27], [38], [39], [40], [41]. Due to space constraints we did not provide comparisons against the latter approaches.

The results in Tables 1 and 2 compare these super-resolution methods in terms of PSNR for magnification factors of × and × respectively. Moreover, the results in Figure 6 show the centre views of light fields restored using these methods when considering a magnification factor of ×. The VDSR algorithm [28] achieves on average a PSNR gain of 0.33 dB over bicubic interpolation. One major limitation of VDSR is that it does not exploit the light field structure, since each sub-aperture image is restored independently. The PCA+RR algorithm [5] manages to restore more texture detail and is particularly effective at restoring light fields with small disparities, which is the case for the INRIA light fields. This can be attributed to the fact that PCA+RR does not compensate for disparities and is therefore not able to generalize to light fields containing disparities which were not considered during training, which is the case for the Stanford light fields. In fact, the Stanford light fields restored using PCA+RR contain blocking artefacts. The BM+PCA+RR method [5] extends this method by aligning the patch-volumes using block-matching. This results in a more generalized method that achieves average PSNR gains of 2.52 dB and 1.76 dB over bicubic on the INRIA and Stanford light fields respectively. Nevertheless, the light fields restored using this method may contain some artefacts, especially near the edges.

The LF-SRCNN method [11], which uses deep learning to restore each sub-aperture image independently, was found to achieve a marginal gain over BM+PCA+RR (0.05 dB). While light fields restored using LF-SRCNN generally contain fewer artefacts compared to BM+PCA+RR, they risk being angularly incoherent. Our method achieves the best performance, with an overall gain of 0.2 dB over the second-best performing algorithm LF-SRCNN. This performance gain is more evident in the subjective results illustrated in Figure 6, where the restored light fields are sharper and visually more pleasing.

As mentioned in section 3.4, one problem with the proposed method is that the inverse warping is unable to perfectly restore the original disparities of the light field. Nevertheless, the results in Tables 1, 2 and Figure 6 clearly show that our proposed method outperforms the other schemes even without the use of the iterative back-projection refinement strategy.

In order to fairly assess the contribution of iterative back-projection, we apply it as a post-process to the two best performing methods, namely LF-SRCNN and our proposed scheme LR-LFSR. Tables 3 and 4 show the performance of the two best performing methods with and without iterative back-projection as a post-processing step at magnification factors of × and × respectively. It can be immediately noticed that IBP significantly improves the quality of both methods. Nevertheless, our method with iterative back-projection as a post-processing step achieves the best performance, with PSNR gains of 0.41 dB and 0.31 dB at magnification factors of × and × respectively over LF-SRCNN followed by iterative back-projection. It is important to notice that while the performance gain of our method over LF-SRCNN without IBP is around 0.23 dB and 0.12 dB at magnification factors of × and × respectively, this gain roughly doubles when both use IBP as a post-processing step. This indicates that, since LF-SRCNN processes each view independently, IBP only corrects inconsistencies between the low-resolution and restored light fields. Apart from this distortion, LR-LFSR-IBP corrects the distortions caused by the inverse warping process, which provides light fields that are more visually pleasing and with smoother transitions across views. Supplementary multimedia files uploaded on ScholarOne show some sample restored light fields. Supplementary material is available on the project's website⁶ while the code of LR-LFSR will be made available upon publication.

6. https://goo.gl/8DDsDi

REFERENCES
[1] R. Ng, M. Levoy, M. Brédif, G. Duval, M. Horowitz, and P. Hanrahan, "Light field photography with a hand-held plenoptic camera," Tech. Rep., Apr. 2005. [Online]. Available: http://graphics.stanford.edu/papers/lfcamera/
[2] B. Wilburn, N. Joshi, V. Vaish, E.-V. Talvala, E. Antunez, A. Barth, A. Adams, M. Horowitz, and M. Levoy, "High performance imaging using large camera arrays," ACM Trans. Graph., vol. 24, no. 3, pp. 765–776, Jul. 2005.
[3] G. Wu, B. Masia, A. Jarabo, Y. Zhang, L. Wang, Q. Dai, T. Chai, and Y. Liu, "Light field image processing: An overview," IEEE Journal of Selected Topics in Signal Processing, vol. PP, no. 99, pp. 1–1, 2017.
[4] C.-K. Liang and R. Ramamoorthi, "A light transport framework for lenslet light field cameras," ACM Trans. Graph., vol. 34, no. 2, pp. 16:1–16:19, Mar. 2015.
[5] R. A. Farrugia, C. Galea, and C. Guillemot, "Super resolution of light field images using linear subspace projection of patch-volumes," IEEE Journal of Selected Topics in Signal Processing, vol. PP, no. 99, pp. 1–1, 2017.
[6] A. Levin, W. Freeman, and F. Durand, "Understanding camera trade-offs through a Bayesian analysis of light field projections," in European Conference on Computer Vision (ECCV), Oct. 2008.
[7] T. E. Bishop and P. Favaro, "The light field camera: Extended depth of field, aliasing, and superresolution," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 34, no. 5, pp. 972–986, May 2012.
[8] K. Mitra and A. Veeraraghavan, "Light field denoising, light field superresolution and stereo camera based refocussing using a GMM light field patch prior," in IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), June 2012, pp. 22–28.
[9] S. Wanner and B. Goldluecke, "Variational light field analysis for disparity estimation and super-resolution," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 36, no. 3, pp. 606–619, March 2014.
[10] Y. Yoon, H.-G. Jeon, D. Yoo, J.-Y. Lee, and I. S. Kweon, "Learning a deep convolutional network for light-field image super-resolution," in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2015, pp. 57–65.
[11] Y. Yoon, H. G. Jeon, D. Yoo, J. Y. Lee, and I. S. Kweon, "Light-field image super-resolution using convolutional neural network," IEEE Signal Processing Letters, vol. 24, no. 6, pp. 848–852, June 2017.
[12] N. K. Kalantari, T.-C. Wang, and R. Ramamoorthi, "Learning-based view synthesis for light field cameras," ACM Transactions on Graphics (Proceedings of SIGGRAPH Asia 2016), vol. 35, no. 6, 2016.
[13] G. Wu, M. Zhao, L. Wang, Q. Dai, T. Chai, and Y. Liu, "Light field reconstruction using deep convolutional network on EPI," in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), July 2017.
[14] T.-C. Wang, J.-Y. Zhu, N. K. Kalantari, A. A. Efros, and R. Ramamoorthi, "Light field video capture using a learning-based hybrid imaging system," ACM Transactions on Graphics (Proceedings of SIGGRAPH), vol. 36, no. 4, 2017.
[15] Y. Wang, Y. Liu, W. Heidrich, and Q. Dai, "The light field attachment: Turning a DSLR into a light field camera using a low budget camera ring," IEEE Transactions on Visualization and Computer Graphics, vol. 23, no. 10, pp. 2357–2364, Oct. 2017.
[16] M. Levoy and P. Hanrahan, "Light field rendering," in Proceedings of the 23rd Annual Conference on Computer Graphics and Interactive Techniques, ser. SIGGRAPH '96. New York, NY, USA: ACM, 1996, pp. 31–42.
[17] S. J. Gortler, R. Grzeszczuk, R. Szeliski, and M. F. Cohen, "The lumigraph," in Proceedings of the 23rd Annual Conference on Computer Graphics and Interactive Techniques, ser. SIGGRAPH '96. New York, NY, USA: ACM, 1996, pp. 43–54.
[18] C. Kim, H. Zimmer, Y. Pritch, A. Sorkine-Hornung, and M. Gross, "Scene reconstruction from high spatio-angular resolution light fields," ACM Trans. Graph., vol. 32, no. 4, pp. 73:1–73:12, Jul. 2013.
TABLE 1: PSNR using different light field super-resolution algorithms when considering a magnification factor of ×. For clarity, in the original bold blue marks the highest and bold red the second highest score.

Light Field Name        | Bicubic | PCA+RR [5] | BM+PCA+RR [5] | LF-SRCNN [11] | VDSR [28] | Proposed
Bee 2 (INRIA)           | 30.1673 | —          | —             | —             | —         | —
Duck (INRIA)            | 23.5394 | —          | —             | —             | —         | —
Framed (INRIA)          | 27.5974 | —          | —             | —             | —         | —
Fruits (INRIA)          | 28.4907 | 31.7884    | —             | —             | —         | —
Mini (INRIA)            | 27.6332 | —          | —             | —             | —         | —
Rose (INRIA)            | 33.5436 | —          | —             | —             | —         | —
Chess (STANFORD)        | 30.2895 | 31.9292    | —             | —             | —         | —
Eucalyptus (STANFORD)   | 30.7865 | 32.4162    | —             | —             | —         | —
Lego Gantry (STANFORD)  | 27.6235 | 28.0729    | 28.7230       | —             | —         | —
Lego Knights (STANFORD) | 27.3794 | 27.8457    | —             | —             | —         | —
TABLE 2: PSNR using different light field super-resolution algorithms when considering a magnification factor of ×. For clarity, in the original bold blue marks the highest and bold red the second highest score.

Light Field Name        | Bicubic | PCA+RR [5] | BM+PCA+RR [5] | LF-SRCNN [11] | VDSR [28] | Proposed
Bee 2 (INRIA)           | 27.8623 | 31.2436    | 31.1886       | —             | —         | —
Dist. Church (INRIA)    | 23.3138 | —          | —             | —             | —         | —
Duck (INRIA)            | 22.0702 | 23.9844    | 23.8083       | —             | —         | —
Framed (INRIA)          | 26.1627 | 28.0771    | —             | —             | —         | —
Fruits (INRIA)          | 26.5269 | 29.0908    | 29.1969       | —             | —         | —
Mini (INRIA)            | 26.3035 | —          | —             | —             | —         | —
Rose (INRIA)            | 31.7687 | —          | —             | —             | —         | —
Chess (STANFORD)        | 27.3679 | 29.5207    | 29.6773       | —             | —         | —
Eucalyptus (STANFORD)   | 29.2136 | —          | —             | —             | —         | —
Lego Knights (STANFORD) | 24.0182 | 24.4944    | 25.5768       | —             | —         | —
TABLE 3: PSNR obtained with the best two methods at a magnification of × with and without iterative back-projection as a post-process.

Light Field Name        | Bicubic | LF-SRCNN | LF-SRCNN-IBP | LR-LFSR | LR-LFSR-IBP
Bee 2 (INRIA)           | 30.1673 | 33.8268  | 34.7257      | 33.4915 | —
Dist. Church (INRIA)    | 24.3059 | 25.9930  | 26.7415      | 26.7502 | —
Duck (INRIA)            | 23.5394 | 26.0713  | 27.293       | 26.3777 | —
Framed (INRIA)          | 27.5974 | 30.0697  | 31.0592      | 30.5365 | —
Fruits (INRIA)          | 28.4907 | 31.6820  | 32.6581      | 32.0789 | —
Mini (INRIA)            | 27.6332 | 29.8941  | 30.5193      | 30.1601 | —
Rose (INRIA)            | 33.5436 | 36.8245  | 37.4991      | 36.8416 | —
Amethyst (STANFORD)     | 30.5227 | 32.2953  | 33.5662      | 32.2737 | —
Bracelet (STANFORD)     | 26.4662 | 28.8858  | 30.6367      | 29.4046 | —
Chess (STANFORD)        | 30.2895 | 32.1922  | 33.6715      | 32.6123 | —
Eucalyptus (STANFORD)   | 30.7865 | 32.1989  | 33.0700      | 32.6205 | —
Lego Gantry (STANFORD)  | 27.6235 | 29.8086  | 31.5071      | 29.8112 | —
Lego Knights (STANFORD) | 27.3794 | 29.5354  | 31.2148      | 29.3177 | —

TABLE 4: PSNR obtained with the best two methods at a magnification of × with and without iterative back-projection as a post-process.

Light Field Name        | Bicubic | LF-SRCNN | LF-SRCNN-IBP | LR-LFSR | LR-LFSR-IBP
Bee 2 (INRIA)           | 27.8623 | 31.3945  | 32.4662      | 31.2545 | —
Dist. Church (INRIA)    | 23.3138 | 24.5874  | —            | —       | —
Framed (INRIA)          | 26.1627 | 27.9157  | 28.9111      | 28.2954 | —
Fruits (INRIA)          | 26.5269 | 29.2100  | 30.1651      | 29.5297 | —
Mini (INRIA)            | 26.3035 | 28.1731  | 28.6895      | 28.4009 | —
Rose (INRIA)            | 31.7687 | 34.3064  | 34.9356      | 34.3392 | —
Amethyst (STANFORD)     | 29.0665 | 30.5971  | 31.5991      | 30.4628 | —
Bracelet (STANFORD)     | 24.1221 | 26.1013  | 27.0010      | 26.2712 | —
Chess (STANFORD)        | 27.3679 | 30.0279  | 31.1502      | 30.1485 | —
Eucalyptus (STANFORD)   | 29.2136 | 30.9433  | 31.5263      | 31.1772 | —
Lego Gantry (STANFORD)  | 24.6721 | 26.8054  | 27.8395      | 26.9466 | —
Lego Knights (STANFORD) | 24.0182 | 26.0771  | 27.6550      | 26.1358 | —

[19] Y. Taguchi, A. Agrawal, S. Ramalingam, and A. Veeraraghavan, "Axial light field for curved mirrors: Reflect your perspective, widen your view," in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2010, pp. 499–506.
[20] A. Veeraraghavan, R. Raskar, A. Agrawal, A. Mohan, and J. Tumblin, "Dappled photography: Mask enhanced cameras for heterodyned light fields and coded aperture refocusing," ACM Trans. Graph., vol. 26, no. 3, Jul. 2007.
[21] X. Jiang, M. L. Pendu, R. A. Farrugia, and C. Guillemot, "Light field compression with homography-based low rank approximation," IEEE Journal of Selected Topics in Signal Processing, vol. PP, no. 99, pp. 1–1, 2017.
[22] Y. Peng, A. Ganesh, J. Wright, W. Xu, and Y. Ma, "RASL: Robust alignment by sparse and low-rank decomposition for linearly correlated images," in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2010, pp. 763–770.
[23] B. K. Horn and B. G. Schunck, "Determining optical flow," Cambridge, MA, USA, Tech. Rep., 1980.
[24] C. Liu, J. Yuen, and A. Torralba, "SIFT Flow: Dense correspondence across scenes and its applications," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 33, no. 5, pp. 978–994, May 2011.
[25] Y. Hu, R. Song, and Y. Li, "Efficient coarse-to-fine PatchMatch for large displacement optical flow," in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2016, pp. 5704–5712.
[26] Y. Li, D. Min, M. S. Brown, M. N. Do, and J. Lu, "SPM-BP: Sped-up PatchMatch belief propagation for continuous MRFs," in IEEE International Conference on Computer Vision (ICCV), Dec. 2015.

Fig. 6: Restored center view using different light field super-resolution algorithms (columns from left to right: Original, Bicubic, PCA+RR [5], BM+PCA+RR [5], LF-SRCNN [11], VDSR [28], Proposed). Best viewed in color and by zooming on the views.
[27] C. Dong, C. C. Loy, K. He, and X. Tang, "Image super-resolution using deep convolutional networks," IEEE Trans. Pattern Anal. Mach. Intell., vol. 38, no. 2, pp. 295–307, Feb. 2016.
[28] J. Kim, J. K. Lee, and K. M. Lee, "Accurate image super-resolution using very deep convolutional networks," in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2016, pp. 1646–1654.
[29] X. Glorot and Y. Bengio, "Understanding the difficulty of training deep feedforward neural networks," in Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics (AISTATS-10).
[30] A. Criminisi, P. Pérez, and K. Toyama, "Region filling and object removal by exemplar-based image inpainting," IEEE Trans. Image Process., vol. 13, no. 9, pp. 1200–1212, Sep. 2004.
[31] Z. Xu and J. Sun, "Image inpainting by patch propagation using patch sparsity," IEEE Transactions on Image Processing, vol. 19, no. 5, pp. 1153–1165, May 2010.
[32] O. Frigo and C. Guillemot, "Epipolar plane diffusion: An efficient approach for light field editing," in British Machine Vision Conference (BMVC), London, U.K., Sep. 2017.
[33] L. I. Rudin, S. Osher, and E. Fatemi, "Nonlinear total variation based noise removal algorithms," Physica D: Nonlinear Phenomena, vol. 60, no. 1–4, pp. 259–268, 1992.
[34] M. Irani and S. Peleg, "Improving resolution by image registration," CVGIP: Graphical Models and Image Processing, vol. 53, no. 3, pp. 231–239, 1991.
[35] D. Glasner, S. Bagon, and M. Irani, "Super-resolution from a single image," in IEEE International Conference on Computer Vision (ICCV), Sept. 2009, pp. 349–356.
[36] M. Rerabek and T. Ebrahimi, "New light field image dataset," in Int. Conf. on Quality of Multimedia Experience (QoMEX), 2016.
[37] D. G. Dansereau, O. Pizarro, and S. B. Williams, "Decoding, calibration and rectification for lenselet-based plenoptic cameras," in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2013, pp. 1027–1034.
[38] T. Peleg and M. Elad, "A statistical prediction model based on sparse representations for single image super-resolution," IEEE Transactions on Image Processing, vol. 23, no. 6, pp. 2569–2582, June 2014.
[39] C. Dong, C. C. Loy, K. He, and X. Tang, "Image super-resolution using deep convolutional networks," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 38, no. 2, pp. 295–307, Feb. 2016.
[40] K. Zhang, B. Wang, W. Zuo, H. Zhang, and L. Zhang, "Joint learning of multiple regressors for single image super-resolution," IEEE Signal Processing Letters, vol. 23, no. 1, pp. 102–106, Jan. 2016.
[41] R. Timofte, V. De, and L. V. Gool, "Anchored neighborhood regression for fast example-based super-resolution," in IEEE International Conference on Computer Vision (ICCV), Dec. 2013, pp. 1920–1927.
5 CONCLUSION

In this paper, we have proposed a novel spatial light field super-resolution algorithm able to reconstruct high quality coherent light fields. We have shown that the information in a light field can be efficiently compacted by aligning the sub-aperture images using optical flow followed by low-rank matrix approximation. The low-rank approximation of the aligned light field gives an embedding in a lower dimensional space which is super-resolved using deep learning. All aligned views of the high-resolution light field can be reconstructed from the super-resolved embedding by simple linear combinations. These views are then inverse warped to restore the disparities of the original light field. Holes corresponding to dis-occlusions or cracks resulting from the inverse warping are filled in using a novel diffusion-based inpainting algorithm which diffuses known pixels in the EPI along dominant orientations computed in the low-resolution EPI.

Extensive simulations show that the proposed method manages to generalize well, i.e. it successfully restores light fields whose disparities are considerably different from those used during training. These results also show that our proposed method is competitive with, and most of the time superior to, existing state-of-the-art light field super-resolution algorithms, including a recent approach which adopts deep learning to restore each sub-aperture image independently. One major limitation of the proposed scheme is that the inverse warping process is not able to perfectly restore the original disparities and produces some distortion caused by rounding errors. We proposed here to use classical iterative back-projection as a post-processing step. Simulation results clearly show the benefit of applying IBP to the super-resolved light field and demonstrate that the proposed method with IBP achieves the best performance, outperforming LF-SRCNN followed by IBP by 0.4 dB.

ACKNOWLEDGMENTS
This project has been supported in part by the EU H2020 Re-search and Innovation Programme under grant agreementNo 694122 (ERC advanced grant CLIM).
Reuben A. Farrugia (S'04, M'09, SM'17) received the first degree in Electrical Engineering from the University of Malta, Malta, in 2004, and the Ph.D. degree from the University of Malta, Malta, in 2009. In January 2008 he was appointed Assistant Lecturer with the same department and is now a Senior Lecturer. He has served on technical and organizational committees of several national and international conferences. In particular, he served as General Chair of the IEEE Int. Workshop on Biometrics and Forensics (IWBF) and as Technical Programme Co-Chair of the IEEE Visual Communications and Image Processing (VCIP) conference in 2014. He has been contributing as a reviewer for several journals and conferences, including IEEE Transactions on Image Processing, IEEE Transactions on Circuits and Systems for Video Technology and IEEE Transactions on Multimedia. In September 2013 he was appointed National Contact Point of the European Association of Biometrics (EAB).