Deep Anti-aliasing of Whole Focal Stack Using its Slice Spectrum
Yaning Li, Xue Wang, Guoqing Zhou, and Qing Wang
Northwestern Polytechnical University
[email protected], [email protected]
Abstract
The paper aims at removing the aliasing effects for the whole focal stack generated from a sparse 3D light field, while keeping the consistency across all the focal layers. We first explore the structural characteristics embedded in the focal stack slice and its corresponding frequency-domain representation, i.e., the focal stack spectrum (FSS). We observe that the energy distribution of the FSS always locates within the same triangular area under different angular sampling rates, and that the continuity of the point spread function (PSF) is intrinsically maintained in the FSS. Based on these two findings, we propose a learning-based FSS reconstruction approach that removes aliasing over the whole focal stack in a single pass. Moreover, a novel conjugate-symmetric loss function is proposed for the optimization. Compared to previous works, our method avoids explicit depth estimation and can handle challenging large-disparity scenarios. Experimental results on both synthetic and real light field datasets show the superiority of the proposed approach for different scenes and various angular sampling rates.
1. Introduction
Light field imaging technology [13] enables digital refocusing at different focal planes after the time of capture. Basically, this is performed by integrating a light field over the angular domain, which corresponds to the slice operation in the frequency domain [22]. However, with a sparse angular coverage, i.e., when the disparity between adjacent views is more than one pixel [5], there will be significant aliasing artifacts in the out-of-focus regions of the refocused images [28], as shown in Fig.1(a).

To enhance visual quality, many approaches have been proposed [8, 28, 18, 6, 20, 7] to remove the aliasing effects based on view interpolation [6], depth-based filtering [5], or multi-scale fusion [28]. However, since most of them rely on depth estimation [19, 4], inaccurate depth maps cause severe degradation of the anti-aliasing performance. Moreover, existing methods aim at removing
Figure 1. Aliasing effects and aliasing-removed results. (a) Input with aliasing; the view count is ×. (b) Anti-aliasing output by our method. From top to bottom: refocused image at a certain depth, focal stack slice along the red line, and corresponding FSS.

aliasing artifacts from an individual refocused image, which corresponds to a specific depth layer in the whole focal stack. Without taking all the layers into consideration, they cannot provide consistent enhancement over the focal stack (as shown in the second row of Fig.1 and in Fig.12). Namely, along the focal direction in the focal stack, the PSF-continuity cannot be maintained well. This becomes more critical for refocused images with large disparities and complex occlusions.

In this paper, we focus on exploring the structural characteristics embedded in the focal stack and its corresponding Fourier spectrum. Different from EPIs (a 2D representation for light fields in the spatial domain), where the slopes of the EPI lines vary with depth, for a given light field the FSSs for different depths share the same cone-shaped pattern (as shown in Fig.3). In other words, the energy distribution of the FSS locates within the same triangular area. Furthermore, the PSF-continuity is intrinsically maintained in the FSS. These important characteristics of the FSS make it possible to exploit a unified anti-aliasing scheme for all depth contents.

The main contributions of the paper are:
1) Two important characteristics of the frequency-domain representation for the light field focal stack are explored.
The FSS preserves the PSF-continuity and provides the same bounds of spectral support along the focal axis under different angular sampling rates.
2) A deep FSS-based anti-aliasing algorithm is proposed to remove aliasing for all the refocused layers in one pass while preserving the consistency between different focal layers; only a rough depth range is needed.
3) A robust conjugate-symmetric loss function is defined in the U-Net for the optimization.
2. Related Work
The 4D light field L(u, v, x, y) [18, 10] records light rays in a 3D space. So far, much research has been devoted to analyzing the characteristics of 4D light field sampling and digital refocusing. Ng [21] pointed out that the spectrum of a light field concentrates on a 3D manifold and that each focal image can be synthesized by applying an inverse Fourier transform to a 2D slice of the manifold. Shi et al. [25] exploited the sparsity of a light field, from which a dense light field is reconstructed by applying the sparse Fourier transform. Considering the regular 2D mesh sampling structure of the light field, Levin and Durand [17] analyzed the dimensional gap between the 3D focal stack and the light field and proposed to inpaint the light field spectrum from the focal stack. Isaksen et al. [13] re-parameterized the light field as a 3D focal stack in the spatial domain. Dansereau et al. [7] extended the capability of refocusing from one single depth layer to a volumetric range of depths by replacing the slice operation with a depth-dependent band-pass filter.

However, due to the limitation of the sensor size, there always exists a trade-off between the spatial and angular resolutions, which results in aliasing artifacts in digital refocusing for angularly undersampled light fields [22, 26]. Considering the formation of aliasing effects, it is straightforward to perform anti-aliasing by angular super-resolution [14, 27]. Thus aliasing removal requires either abundant samples or appropriate filters in the spatial or frequency domains.
Spatial-domain methods.
Levoy and Hanrahan [18] proposed prefiltering to reduce the spatial artifacts. Chang et al. [6] proposed an anti-aliasing method that interpolates angular samples within each sampling interval using depth information. Bishop et al. [3] eliminated the aliasing by fusing multiview information. Xiao et al. [28] further analyzed the aliasing in the spatial domain; they first detected aliasing and then used a multi-scale fusion method to remove it. Lin et al. [19] analyzed the symmetry characteristic of the light field focal stack in the spatial domain and proved that depth-dependent light field rendering can be used to reduce aliasing.
Frequency-domain methods.
Isaksen et al. [13] first proposed a frequency-planar light field filter. Chai et al. [5] presented a comprehensive analysis of the trade-off between sampling density and depth resolution. Ng [21] suggested that band-limited filtering in the frequency domain followed by slicing could effectively inhibit the aliasing effect. Based on the focal stack and a sparse collection of views, Levin and Durand [17] employed the focal manifold in the derivation of 2D deconvolution kernels in the 4D Fourier spectrum. Dansereau et al. [7] presented linear volumetric focus for light field cameras and derived the regions of support in the frequency domain; they employed a simple, linear single-step filter to combine information over the light field.

In summary, previously published methods focus on the depth-dependent characteristics in the original spectrum of light fields, which is prone to depth errors. Additionally, these methods tackle each refocused image individually, so they cannot provide a PSF-continuous anti-aliased focal stack (as shown in Fig.12). Different from these methods, scenarios with different disparities share the same cone-shaped pattern in the proposed FSS representation, supporting a unified depth-independent solution that produces the PSF-continuous anti-aliased focal stack in one single pass.
3. Focal Stack Spectrum (FSS)
In this section, we first elucidate the way to obtain the FSS from a light field, and then analyze the characteristics of the FSS. Without loss of generality, a 2D light field EPI instead of the full 4D one is used here for better demonstration.
For better understanding, the notations used in the paper are given in Tab.1. E(u, x) denotes a 2D light field, where u and x refer to the angular and spatial dimensions, respectively. E_d(u, x) denotes the EPI sheared at a specific disparity d,

    E_d(u, x) = E(u, x + d(u − u_ref)),   (1)

where u_ref refers to the reference view (the central view is selected as the reference view in this paper). Once the sheared EPI with arbitrary disparity f ∈ [d_min, d_max] is integrated over all views, the focal stack F(f, x) is formed,

    F(f, x) = ∫ E_f(u, x) du = ∫ E(u, x + f(u − u_ref)) du.   (2)

Figure 2. Characteristic analyses of the EPI in the u-x space, the focal stack in the f-x space and the FSS in the ω_f-ω_x space. (a)-(c) Original and sheared EPIs at different disparities. (d) The focal stack without aliasing, where ratio is the ratio of down-sampling. (e) Aliased focal stack. (f) Corresponding FSS of (e). For better visualization, only one single image pixel P is considered here.

Table 1. Notations used in the paper
E(u, x)        2D light field EPI
E_d(u, x)      Sheared EPI at the disparity d
u              Angular coordinate
x              Spatial coordinate
f              Focal layer's disparity
F(f, x)        Focal stack formed by E(u, x)
d              Disparity during the shearing process
F(ω_f, ω_x)    FSS
FT(·)          Fourier transform operator
u_ref          Reference view
N_u            View number
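As a concrete illustration of Eqns.1 and 2, the shear-and-integrate refocusing can be sketched in a few lines of NumPy. This is a minimal sketch, not the paper's implementation; the function names and the use of linear interpolation for sub-pixel shears are our own assumptions.

```python
import numpy as np

def refocus_slice(epi, f, u_ref=None):
    """One focal-stack row F(f, x): shear the EPI E(u, x) by disparity f
    (Eqn. 1) and integrate over the angular dimension (Eqn. 2)."""
    n_u, n_x = epi.shape
    if u_ref is None:
        u_ref = n_u // 2  # central reference view, as in the paper
    x = np.arange(n_x)
    sheared = np.empty((n_u, n_x), dtype=np.float64)
    for u in range(n_u):
        # E(u, x + f * (u - u_ref)), with linear sub-pixel interpolation
        sheared[u] = np.interp(x + f * (u - u_ref), x, epi[u].astype(np.float64))
    return sheared.mean(axis=0)

def focal_stack(epi, disparities):
    """Stack refocused rows over a disparity range -> F(f, x)."""
    return np.stack([refocus_slice(epi, f) for f in disparities])
```

Integrating a constant EPI leaves a constant row, and a point in focus (EPI line perpendicular to the x-axis after shearing) stays sharp, matching the discussion of Fig.2.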
Subsequently, the Fourier form of F(f, x) is

    F(ω_f, ω_x) = FT(F(f, x)),   (3)

where FT(·) refers to the 2D Fourier transform operator. Specifically, the view number of the given light field is N_u.

Fig.2(a) shows the EPI of a point with disparity d*. The EPI is sheared with d* (Fig.2(b)) according to Eqn.1. Since the sheared EPI line is perpendicular to the x-axis, there is no defocus blur or aliasing at point P in the refocused image, as shown in Fig.2(e). Then the original EPI is sheared with d* + α (see Fig.2(c)). Now the EPI line is not perpendicular to the x-axis, and defocus blur or aliasing appears in the focal stack, as shown in Fig.2(d) and (e) respectively. The radius of the defocus blur or aliasing increases with α, and a triangle is formed in the focal stack slice. Given a 3D light field with N_u views, the apex angle ϕ of the triangle has two forms,

    ϕ = { 2 arctan((N_u − 1)/2),       continuous focal stack,
        { 2 arctan(Δα(N_u − 1)/2),     discrete focal stack,   (4)

where Δα is the disparity gap between neighboring focal planes during the construction of the focal stack.

The diameter l of the defocus blur (or aliasing), i.e., the interval between the points P_1 and P_{N_u} in Fig.2(e), can be calculated by

    l_{P_1, P_{N_u}} = α(N_u − 1).   (5)

Note that the aliasing only occurs when α(N_u − 1) > N_u − 1 holds, i.e., when the spacing between adjacent integrated samples exceeds one pixel (|α| > 1); otherwise defocus blur appears. In other words, as long as the scene has texture information, aliasing always exists in the refocused image whenever the shear parameter |α| is large enough.

Revisiting Eqn.1, it is found that the aliasing points P_1, P_2, ..., P_{N_u} come from the views u_1, u_2, ..., u_{N_u} respectively. The slope of each aliasing line P P_i in Fig.2(e) can be computed by

    Slope(P P_i) = { 1/(u_i − u_ref),        continuous focal stack,
                 { 1/(Δα(u_i − u_ref)),      discrete focal stack.   (6)
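A small numeric helper makes Eqn.5 and the aliasing condition concrete. The threshold |α| > 1 is our reading of the condition above, and the function names are hypothetical.

```python
def blur_diameter(alpha, n_views):
    """Diameter of the defocus blur / aliasing footprint (Eqn. 5):
    l = |alpha| * (N_u - 1), in pixels, for refocus offset alpha."""
    return abs(alpha) * (n_views - 1)

def is_aliased(alpha):
    """Aliasing (rather than smooth defocus blur) appears once the gap
    between adjacent integrated samples exceeds one pixel, i.e. |alpha| > 1
    -- our reading of the condition derived from Eqn. 5."""
    return abs(alpha) > 1.0
```

For example, with N_u = 9 views and α = 0.5, the footprint spans 4 pixels but is rendered as smooth defocus blur; with α = 2 the same geometry produces visible aliasing stripes.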
According to Eqns.4 and 6, it is found that in the focal stack slice:
a) The shape of the defocus blur or aliasing line is determined by the refocus parameter Δα and the view index, and is independent of the scene depth.
b) All aliasing lines from the same view have the same slope (as shown by the lines P P_i in Fig.2(e)).

Figure 3. Different angular sampling rates and representations for a light field. From top to bottom: EPI and its spectrum, focal stack and its FSS. From left to right: (a) Intensive sampling with 121 views. (b) 5× downsampling. (c) 15× downsampling.

Given a focal stack formed from an N_u-view light field, there are N_u frequency lines in the FSS according to the property of the Fourier transform [2]: each line corresponds to a specific view, and more views lead to more lines. Consequently, the FSS has the following properties:
a) The shape of the FSS is determined by the refocus parameter Δα and the view index, and is independent of the scene depth.
b) According to the property of the Fourier transform [9], the FSS is conjugate symmetric.

Fig.3 shows the comparisons between the aliased EPI spectrum and the aliased FSS under different sampling rates. Taking a closer look at the EPI spectrum (2nd row) and the FSS (last row) in Fig.3, as the number of views decreases, more repeating areas appear in the EPI spectrum, while the structural distribution of the FSS remains. Additionally, there is a one-to-one correspondence between the lines in the FSS and the views when Δα is fixed. As shown in Fig.3(c) (the rightmost column), 9 views in the EPI correspond to 9 lines in the FSS.
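The FSS of Eqn.3 and its conjugate symmetry (property b above) can be checked numerically. This sketch assumes NumPy's FFT conventions and is not tied to the paper's implementation.

```python
import numpy as np

def fss(focal_stack_slice):
    """Focal stack spectrum: 2D Fourier transform of F(f, x) (Eqn. 3)."""
    return np.fft.fft2(focal_stack_slice)

# Conjugate symmetry of the spectrum of a real-valued slice:
# F(w_f, w_x) == conj(F(-w_f, -w_x)).
F = fss(np.random.rand(8, 16))
# flip + roll maps index (i, j) to the negative-frequency index (-i, -j) mod N
flipped = np.roll(np.flip(F), shift=(1, 1), axis=(0, 1))
assert np.allclose(F, np.conj(flipped))
```

The symmetry holds for any real-valued slice, which is exactly what the conjugate-symmetric loss of Sec.4 enforces on the reconstructed spectrum.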
4. Proposed Method
Aliasing effects are caused by insufficient angular sampling. According to the analysis in Sec.3.2, there is a correspondence between the number of views and the FSS, so the anti-aliasing problem can be tackled as a spectrum completion problem. The whole pipeline of the proposed FSS-based anti-aliasing algorithm is shown in Fig.6. (The Fourier transform tells us that the energy of all lines with the same slope in the spatial domain concentrates on a perpendicular line passing through the origin.)
Figure 4. The structure of the network.

Figure 5. U-Net architecture.
Specifically, the aliased focal stack F_a(f, x) is obtained from the undersampled EPI using Eqn.2, then the Fourier transform operator FT(·) is applied to obtain the FSS F_a(ω_f, ω_x). Finally, a CNN φ parameterized by σ is proposed to reconstruct the aliasing-removed FSS F(ω_f, ω_x) from F_a(ω_f, ω_x); F_gt is the ground truth. The parameters σ are optimized by

    arg min_σ ||F_gt − φ_σ(F_a)||.   (7)

As shown in Fig.4, a dual-stream U-Net is designed to deal with complex-number inputs. The power spectrum and phase angle are first fed into two sub-networks respectively; then the features are combined using Euler's formula to obtain the real and imaginary parts, which are concatenated into another network for optimization. Fig.5 shows the details of the U-Net proposed in this paper.

The loss function is

    loss = ||F − F_gt|| + λ · loss_s,   (8)

where loss_s constrains the conjugate symmetry of the reconstructed FSS. The scalar λ is set to 1.5 to balance the weights of the two loss terms,

    loss_s = (1 / (N_C · W)) Σ_{i=0}^{N_C−1} Σ_{j=0}^{W−1} |F(ω_i, ω_j) − F*(−ω_i, −ω_j)|,   (9)

where |·| refers to the norm of a complex number and * indicates the standard complex conjugate. N_C and W are the number of refocus layers and the image width, respectively.

The complete FSS-based deep anti-aliasing algorithm is summarized in Algorithm 1.

Figure 6. The pipeline of the proposed FSS-based anti-aliasing algorithm.
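The Euler-formula recombination of the two streams and the conjugate-symmetric loss of Eqn.9 can be sketched in NumPy as below. The names `euler_combine` and `conjugate_symmetry_loss` are hypothetical, and the actual training code would operate on TensorFlow tensors rather than NumPy arrays.

```python
import numpy as np

def euler_combine(power, phase):
    """Recombine the two network streams into a complex spectrum via
    Euler's formula: sqrt(power) * (cos(phase) + i*sin(phase))."""
    mag = np.sqrt(power)
    return mag * (np.cos(phase) + 1j * np.sin(phase))

def conjugate_symmetry_loss(F):
    """Eqn. 9: mean of |F(w_i, w_j) - conj(F(-w_i, -w_j))| over the
    spectrum. It vanishes exactly when the spectrum could come from a
    real-valued focal stack slice."""
    n_c, w = F.shape
    # flip + roll maps index (i, j) to the negative-frequency index (-i, -j)
    F_neg = np.roll(np.flip(F), shift=(1, 1), axis=(0, 1))
    return np.abs(F - np.conj(F_neg)).sum() / (n_c * w)
```

The total objective then follows Eqn.8, e.g. `loss = np.abs(F - F_gt).sum() + 1.5 * conjugate_symmetry_loss(F)` with λ = 1.5 as in the paper.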
Algorithm 1: The FSS-based deep anti-aliasing algorithm
Input: Angularly undersampled 3D light field L(u, x, y) with N_u views of H × W pixels.
Output: Anti-aliasing 3D focal stack.
  for y = 1 to H do
    Get the 2D light field E(u, x).
    Obtain the focal stack slice F_a(f, x) by Eqn.2.
    Get the aliased FSS F_a(ω_f, ω_x) by Eqn.3.
    Reconstruct the non-aliased FSS F(ω_f, ω_x) with the dual-stream U-Net.
    Perform an inverse Fourier transform on F(ω_f, ω_x).
  end for
  Assemble the anti-aliasing 3D focal stack for the light field.
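Algorithm 1 can be summarized as the following self-contained, runnable sketch. The `net` argument stands in for the dual-stream U-Net (an identity placeholder here, purely hypothetical), and all function names are our own.

```python
import numpy as np

def anti_alias_focal_stack(lf, disparities, net=lambda F: F):
    """Sketch of Algorithm 1 for a 3D light field L(u, x, y) of shape
    (N_u, W, H). For each row y: build the aliased focal stack slice,
    take its FSS, reconstruct it with `net`, and invert the transform."""
    n_u, w, h = lf.shape
    u_ref = n_u // 2                 # central reference view
    x = np.arange(w)
    out = np.empty((h, len(disparities), w))
    for y in range(h):
        epi = lf[:, :, y].astype(np.float64)       # 2D light field E(u, x)
        # aliased focal stack slice F_a(f, x), Eqn. 2 (shear + integrate)
        fs = np.stack([
            np.mean([np.interp(x + f * (u - u_ref), x, epi[u])
                     for u in range(n_u)], axis=0)
            for f in disparities])
        F_a = np.fft.fft2(fs)                      # aliased FSS, Eqn. 3
        out[y] = np.fft.ifft2(net(F_a)).real       # reconstructed slice
    return out
```

With the identity placeholder the round trip reproduces the aliased stack; plugging in a trained spectrum-completion network at `net` yields the anti-aliased one.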
5. Experimental Results
In this section, we report experimental results of the proposed FSS-based anti-aliasing algorithm on both synthetic and real light fields. The robustness of the proposed method is first analyzed on light fields with different sampling rates. Then, an ablation experiment is conducted to verify the effectiveness of the conjugate-symmetric loss. Finally, both quantitative and qualitative comparisons with SOTAs are provided to demonstrate the advantages of the proposed method. Additional experiments on light fields captured by a camera array further verify the generalization of the proposed method.
To train and verify the proposed network, both synthetic and real light fields are used. For synthetic data, 6 light fields are rendered using POV-Ray [23], 4 for training and 2 for testing. For real data, the high angular resolution light field dataset [11] is used, 10 for training and 2 for testing. Note that only 121 views are used. Additionally, the Stanford [1] and Disney [15] light fields are used to verify the performance of the proposed method on unseen light fields captured by a camera array. Tab.2 shows the resolutions. Notice that the spatial resolutions of the Couch and Church light fields are resized in the experiments.
Table 2. Experiment and data parameters of different light fields.
Data           Range of d        Angular Res.   Spatial Res.
Syn. LF        [-1.00, 0.98]     121            526 ×
Real LF [11]   [-1.00, 0.98]     121            376 ×
Couch [15]     [-2.60, -0.60]    101            628 ×
Church [15]    [-1.45, -0.53]    101            670 ×
Lego [1]       [-1.00, 0.98]     17             1024 ×

Figure 7. Aliasing-removed results for different downsampling settings. From top to bottom: 5× and 15× downsampling. (a) Anti-aliasing on the focal stack. (b) Reconstructed FSS. (c) Error map of the FSS. (d) Error map of the focal stack.

Please refer to the supplementary material for the selection of light fields.

In order to verify the capability of the proposed method for large disparities, we conduct experiments with different angular sampling rates. At present, only single-direction disparity is concerned, so the 2D EPI image can be used to represent the input light field. The details of the tested light fields under 5× and 15× downsampling scales are introduced in the supplementary material. For each light field in the experiment, the focal stack F(f, x) is constructed by performing refocusing operations 199 times with Δα = 0.01. Tab.2 shows the ranges of the refocus operations (d).

The network converges after 150 epochs, where each epoch contains 30 iterations. The Adam optimizer [16] is used for iterative optimization. The learning rate is initially set to e−. The first and second moments of the gradients are set to 0.9 and 0.99 respectively to enable adaptive learning rates. The network is implemented using the TensorFlow framework with 7 GTX 1080Ti GPUs.

Spectrum Domain. In this subsection, we demonstrate the performance of our approach with respect to different sampling rates.

Table 3. Average spectral energy loss on different test scenarios.
Dataset          5×        15×
Syn. LFs         2.03%     3.58%
Real LFs [11]    2.50%     3.13%
Lego [1]         0.56% (2×)
Couch [15]       2.71% (10×)
Church [15]      0.67% (10×)

Comparing Fig.3(b) and (c) with Fig.7, we can see that, for different sampling rates, our method achieves preferable anti-aliasing rendering performance in both the spatial and frequency domains. It is important to note that the errors in the frequency domain mainly come from the direct component (DC) of the spectrum. The main reason is that the energy of the DC is much larger than that of the other components [9], and deep learning is more inclined to learn the low-frequency components of an input signal [24]. The differences in DC cause color inconsistency during the FSS reconstruction. To tackle this problem, we replace the DC of the output spectrum with that of the input spectrum.

Tab.3 shows the average spectral energy loss on different test scenarios. Comparing the last column of Tab.5 with Tab.3, we find that the average spectral energy loss plays an important role in the PSNR while having little effect on the SSIM. According to [29], human eyes are more sensitive to structural similarity than to PSNR. This is especially important for refocused images, where errors in the defocus blur are hard for people to distinguish. So a higher SSIM in Tab.5 better demonstrates the advantages of the proposed method.

Image Domain. Fig.8 shows the results of anti-aliasing in different focal images under the 5× and 15× downsampling settings respectively. In the 5× downsampling part, the proposed method removes aliasing on severely occluded objects (the trees in the top row). The middle and bottom rows show the anti-aliasing results at different focal layers for the same scene (Bicycle), where the focused depth is out of the range of the scene depth and in the range of the scene depth, respectively.
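The DC-replacement step described above amounts to a single coefficient swap on the spectrum. A minimal sketch with a hypothetical helper name:

```python
import numpy as np

def replace_dc(F_out, F_in):
    """Copy the DC (zero-frequency) coefficient of the input spectrum into
    the network output, avoiding the color shift caused by DC errors."""
    F_fixed = F_out.copy()
    F_fixed[0, 0] = F_in[0, 0]   # index (0, 0) holds the DC term in fft2 layout
    return F_fixed
```

Because the DC term carries the mean intensity of the slice, this keeps the overall brightness of the reconstructed focal stack consistent with the input.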
Although the radius of the defocus blur varies from the middle row to the bottom row, the proposed method eliminates the aliasing effects in the middle one while retaining the sharp edges in the bottom one, which verifies the PSF-continuity of the proposed method. In 15× downsampling, although the aliasing is more serious due to the large disparity, our method can still obtain good aliasing-removed results.

Fig.9 shows the quantitative results of our method across focal layers under different downsampling settings. The relative PSNR is defined as the absolute PSNR value minus the PSNR of the input focal layer image. As shown in Fig.9(b) and (d), the PSNR and SSIM fluctuate with the refocusing depth. When there is less aliasing in the refocused image, the improvement in PSNR and SSIM is relatively limited (the bottom row in Fig.8). Moreover, the varying trends of PSNR and SSIM demonstrate that our approach maintains the continuity of the PSF in the focal stack.

Figure 8. Aliasing-removed results at different focal layers under 5× downsampling and 15× downsampling. (a) Input image. (b) Ground truth. (c) Our result.

Table 4. Average PSNR/SSIM under different loss functions.
Scene                    w/o loss_s     w/ loss_s
Tree (Syn. LF)           34.97/0.880    35.92/0.912
Bicycle (Real LF) [11]   33.08/0.949    34.45/0.960
Conjugate symmetry loss. To verify the effectiveness of the conjugate symmetry loss, an ablation experiment is conducted. Tab.4 shows the average PSNR/SSIM at different focal layers under 15× downsampling with and without loss_s. Both the PSNR and the SSIM are improved after applying the conjugate symmetry loss in the U-Net.

Vertical EPI. All the above experiments are carried out in the x-u dimension. Fig.10 shows the aliasing-removed results in the y-v dimension on the Stanford light fields [1], from which we find that our method is also effective in the y-v direction of the light field.

Undersampled light field. Finally, we conduct an experiment on the HCI dataset [12]. In this set, the disparity between adjacent views is larger, which produces severe aliasing effects when refocusing. As shown in Fig.1, when refocusing on the red cloth in the background, the bees and wooden balls in the foreground are aliased. The proposed method eliminates these aliasing effects well.
In this subsection, we compare our method against two SOTAs, Kalantari et al. [14] and Xiao et al. [28]. Tab.5 shows the average PSNR/SSIM on both synthetic (abbreviated Syn.) and real light fields (Real for short) over all focus layers. The proposed method outperforms the SOTA methods. Qualitative comparisons on two test scenes are shown in Figs.11 and 12. Fig.13 shows the quantitative comparisons of Fig.11 on each focus layer.

Figure 9. Quantitative results of our method across focal layers under different downsampling settings. Top row: the Tree scene. Bottom row: the Bicycle scene. (a) Absolute PSNR. (b) Relative PSNR. (c) Absolute SSIM. (d) Relative SSIM.

Figure 10. Anti-aliasing results of Lego [1] along the y-v direction. The first row shows the refocused image at a certain depth, the second row shows the focal stack along the red line, and the last row displays the corresponding FSS. (a) Input with aliasing; the number of views is ×. (b) Anti-aliasing output by the proposed method.

Table 5. Quantitative comparisons (PSNR/SSIM) with SOTAs under different downsampling rates on synthetic and real light fields.
            Kalantari [14]   Xiao [28]   Ours
Syn. LFs    ×                ×           ×

Fig.11 compares the anti-aliasing results under 15× downsampling. It is generally accepted that aliasing results from sparse view sampling, so we regard the method of Kalantari et al. [14] as a view synthesis network for eliminating aliasing. However, Kalantari et al. [14] does not perform well for view synthesis under large parallax: the edges of image objects are distorted significantly, and extensive disparities and complex occlusions also lead to severe aliasing in the out-of-focus regions. For the second test scene, the focal plane is located on the yellow step in front, so significant aliasing appears in the background. Although the method of Xiao et al. [28] also locates the aliasing area, it cannot deal with this massive-disparity situation. Moreover, multi-scale image fusion can eliminate aliasing to a certain extent, but the whole image is consequently blurred; for example, Xiao et al. [28] increases the number of pyramid layers to remove severe aliasing, which also blurs the entire image space (as shown in the first row of Fig.11(d)). Our method can not only locate but also eliminate these aliasing effects.

Fig.12 shows the results with different focal parameters under the 15× downsampling setting. According to the analysis in Section 3.3, the focal stack slices have the symmetry property along the f-axis, and the symmetric structure is similar for scene points located at different depths (PSF-continuity). Maintaining this characteristic is one of the important indicators for evaluating anti-aliasing effects. As shown in Fig.12, compared to the other two methods, our method can not only eliminate aliasing but also maintain this PSF-continuity. Quantitative results are shown in Fig.13. The PSNR values of Xiao et al. [28] are higher than those of our method on several focal layers; however, since Xiao et al. [28] processes each focus layer image separately and cannot preserve the continuity of the PSF, the overall trends of its PSNR and SSIM curves fluctuate.

Figure 11. Anti-aliasing results by different methods under 15× downsampling (refocused images and error maps). (a) Input focal image and ground truth. The results by (b) our method, (c) Kalantari et al. [14] and (d) Xiao et al. [28]. Several local areas are zoomed in for better visualization.

Figure 12. Anti-aliasing results on the focal stack slice under 15× downsampling. (a) Ground truth. (b) Input. The results by (c) our method, (d) Kalantari et al. [14] and (e) Xiao et al. [28]. Several local areas are zoomed in for better visualization.

Results of light fields from camera array.
In order to verify the generalization performance of our algorithm, we test our model on part of the Disney datasets [15] and the light field Lego from the Stanford dataset [1]. Because the angular resolutions of the Disney and Stanford LFs differ from those of our synthetic and real LFs (see Tabs.3 and 5), we set the sampling rates for these two datasets to × and × to guarantee the same angular resolution in the downsampled LFs (9 views). Fig.14 compares the anti-aliasing results on the Church LF. When the image is refocused on the tree in the front, there are obvious aliasing effects in the non-focused area. Our method significantly removes the aliasing in the non-focused areas, such as the white building in the distance. The average PSNR and SSIM over the whole aliasing-removed focal stack are listed in the last three rows of Tab.5.

Figure 13. Quantitative comparisons on two test scenes for different focal layers. Top row: the synthetic scene. Bottom row: the real scene. The blue, red and yellow lines indicate the results by ours, Kalantari et al. [14] and Xiao et al. [28] respectively.

Figure 14. Anti-aliasing results by different methods on the Disney datasets. (a) Ground truth. (b) Input focal image under 10× downsampling. (c) Our method, (d) Kalantari et al. [14], (e) Xiao et al. [28]. Several local areas are zoomed in for better visualization.

Currently, the proposed method can only process a light field with one angular dimension. For a light field with two angular dimensions, the contents from different lines damage the linear structure in the focal stack (red box in Fig.15), which in turn damages the characteristics of the spectrum.
6. Conclusions and Future Work
In this paper, we propose an FSS-based anti-aliasing method for angularly undersampled light fields. The FSS preserves the PSF-continuity and the spectrum distribution under different angular sampling rates.
Figure 15. Focal stack and FSS of the Lego dataset [1]. The focal stack and FSS are obtained from vertical-horizontal direction (u, v) views and horizontal direction (u) views respectively. The first row shows the focal stack and the last row displays the corresponding FSS. (a) The number of views is 9 × (× downsampling). (b) The number of views is 1 × (× downsampling).

References
[1] The new Stanford light field archive. http://lightfield.stanford.edu/lfs.html.
[2] J. Bigun and G. H. Granlund. Optimal orientation detection of linear symmetry. In IEEE ICCV, pages 433–438, 1987.
[3] T. E. Bishop and P. Favaro. The light field camera: Extended depth of field, aliasing, and superresolution. TPAMI, 34(5):972–986, 2011.
[4] T. Broad and M. Grierson. Light field completion using focal stack propagation. In ACM SIGGRAPH 2016 Posters, pages 1–2, 2016.
[5] J.-X. Chai, X. Tong, S.-C. Chan, and H.-Y. Shum. Plenoptic sampling. In ACM SIGGRAPH, pages 307–318. ACM Press/Addison-Wesley Publishing Co., 2000.
[6] A.-C. Chang, T.-P. Sung, K.-T. Shih, and H. H. Chen. Anti-aliasing for light field rendering. In IEEE ICME, pages 1–6. IEEE, 2014.
[7] D. G. Dansereau, O. Pizarro, and S. B. Williams. Linear volumetric focus for light field cameras. ACM TOG, 34(2):15, 2015.
[8] T. Georgiev and A. Lumsdaine. Reducing plenoptic camera artifacts. In Computer Graphics Forum, volume 29, pages 1955–1968. Wiley Online Library, 2010.
[9] R. C. Gonzalez and R. E. Woods. Digital Image Processing, 2002.
[10] S. J. Gortler, R. Grzeszczuk, R. Szeliski, and M. F. Cohen. The lumigraph. In Proceedings of the 23rd Annual Conference on Computer Graphics and Interactive Techniques, pages 43–54, 1996.
[11] M. Guo, H. Zhu, G. Zhou, and Q. Wang. Dense light field reconstruction from sparse sampling using residual network. In Springer ACCV, pages 1–14, 2018.
[12] K. Honauer, O. Johannsen, D. Kondermann, and B. Goldluecke. 4D light field dataset. http://hci-lightfield.iwr.uni-heidelberg.de/, 2016.
[13] A. Isaksen, L. McMillan, and S. J. Gortler. Dynamically reparameterized light fields. In ACM SIGGRAPH, pages 297–306. ACM Press/Addison-Wesley Publishing Co., 2000.
[14] N. K. Kalantari, T.-C. Wang, and R. Ramamoorthi. Learning-based view synthesis for light field cameras. ACM TOG, 35(6):193:1–193:10, 2016.
[15] C. Kim, H. Zimmer, Y. Pritch, A. Sorkine-Hornung, and M. H. Gross. Scene reconstruction from high spatio-angular resolution light fields. ACM TOG, 32(4):73:1–73:12, 2013.
[16] D. P. Kingma and J. Ba. Adam: A method for stochastic optimization. In ICLR, 2015.
[17] A. Levin and F. Durand. Linear view synthesis using a dimensionality gap light field prior. In IEEE CVPR, pages 1831–1838, 2010.
[18] M. Levoy and P. Hanrahan. Light field rendering. In ACM SIGGRAPH, pages 31–42. ACM, 1996.
[19] H. Lin, C. Chen, S. B. Kang, and J. Yu. Depth recovery from light field using focal stack symmetry. In IEEE ICCV, 2015.
[20] A. Lumsdaine, T. Georgiev, et al. Full resolution lightfield rendering. Indiana University and Adobe Systems, Tech. Rep., 91:92, 2008.
[21] R. Ng. Fourier slice photography. In ACM TOG, volume 24, pages 735–744. ACM, 2005.
[22] R. Ng. Digital Light Field Photography. PhD thesis, Stanford University, 2006.
[23] POV-Ray.
[24] N. Rahaman, A. Baratin, D. Arpit, F. Draxler, M. Lin, F. Hamprecht, Y. Bengio, and A. Courville. On the spectral bias of neural networks. In ICML, pages 5301–5310. PMLR, 2019.
[25] L. Shi, H. Hassanieh, A. Davis, D. Katabi, and F. Durand. Light field reconstruction using sparsity in the continuous Fourier domain. ACM TOG, 34(1):12:1–12:13, 2014.
[26] J. Stewart, J. Yu, S. Gortler, and L. McMillan. A new reconstruction filter for undersampled light fields. In ACM International Conference Proceeding Series. Eurographics Association/Association for Computing Machinery, 2003.
[27] T.-C. Wang, J.-Y. Zhu, N. K. Kalantari, A. A. Efros, and R. Ramamoorthi. Light field video capture using a learning-based hybrid imaging system. ACM TOG, 36(4):133:1–133:13, 2017.
[28] Z. Xiao, Q. Wang, G. Zhou, and J. Yu. Aliasing detection and reduction scheme on angularly undersampled light fields. IEEE TIP, 26(5):2103–2115, 2017.
[29] Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli. Image quality assessment: from error visibility to structural similarity. IEEE TIP, 13(4):600–612, 2004.

Supplementary Material

EPIs under different view downsampling rates
Fig.16 shows the details of the tested light fields under 5× and 15× downsampling.

Figure 16. EPIs under different view downsampling rates. (a) Reference view. (b) Original EPI. (c) 5× downsampling. (d) 15× downsampling.
Fig.17 shows the reference views of our synthetic light field datasets. In the synthetic light field datasets, we use the Tree scene and the Pot-cube scene for testing because the occlusion relationships in these scenes are complex. In the real light field datasets [11], the Bicycle and the Firehydrant are selected for testing due to their large disparity ranges, which better verify the generalization of the proposed method under different scenes. In the Disney datasets [15], the Couch and Church are selected since there is no motion in these two light fields and the disparity is large. Besides, we also choose the StillLife light field [12] for testing. Different from the previous datasets, which have dense views, the sampling is inadequate in StillLife, i.e., there is aliasing in the refocused image even when using all of the views. The proposed method eliminates this intrinsic aliasing in StillLife, as shown in Fig.1 of the paper.
More results
Please refer to the attached video result.mp4 for the pipeline of the FSS reconstruction and more anti-aliasing results.