Signal reconstruction via operator guiding
Andrew Knyazev
Mitsubishi Electric Research Laboratories (MERL), 201 Broadway, 8th floor, Cambridge, MA 02139-1955. Email: [email protected]
Alexander Malyshev
University of Bergen, Department of Mathematics, Postbox 7803, 5020 Bergen, Norway. Email: [email protected]
Abstract—Signal reconstruction from a sample using an orthogonal projector onto a guiding subspace is theoretically well justified, but may be difficult to implement in practice. We propose more general guiding operators, which increase signal components in the guiding subspace relative to those in a complementary subspace, e.g., iterative low-pass edge-preserving filters for super-resolution of images. Two examples of super-resolution illustrate our technology: a no-flash RGB photo guided using a high resolution flash RGB photo, and a depth image guided using a high resolution RGB photo.
I. INTRODUCTION
Super-resolution (SR) refers to techniques that reconstruct high-resolution (HR) images from low-resolution (LR) images [1], [2], [3], [4], [5], determining high-frequency components and removing degradation caused by image acquisition in LR cameras. Single-image edge-preserving SR by interpolation, called the zooming problem in [6], [7], is not well-posed due to the ambiguity of determining the high-frequency components needed to preserve edges in the reconstructed HR image. To remove the ambiguity and make the SR problem well-posed, one can introduce a guiding subspace, determined by frames, or via the action of an orthogonal projector onto it; see, e.g., [8], [9].

Frame-less guided SR using an orthogonal projector onto the guiding subspace is theoretically well justified, e.g., in [8], [9], but may be difficult in practice, where even the guiding subspace itself may not be explicitly known. We extend the approach of [8], [9] to guiding operators, which increase signal components in the guiding subspace relative to those in a complementary subspace. As noticed in [9], traditional Tikhonov regularization may be substituted with pre- and post-smoothing of the sample-consistent reconstruction in the case of noisy samples. We adopt a similar approach and provide a mathematical justification, alternative to [9], covering general guiding operators, not necessarily orthogonal projectors.

We illustrate our sample-consistent operator-guided signal reconstruction with pre- and post-smoothing for imaging. An HR image is reconstructed from a noisy LR image using an HR image of the same scene, but in a different modality, as guidance in setting up an edge-preserving denoising filter. Several authors study a corresponding application, where the LR image is a depth image and the HR guidance image is an RGB image of the same scene; e.g., see [10], [11], [12], [13].

II. NOTATION AND PRIOR WORK
A sampled (degraded) signal vector y in signal processing is often represented by the linear model

y = Ax + n, (1)

where the vector x is the latent signal (image) and n is a noise vector, e.g., consisting of independent and identically distributed zero-mean noise. The matrix A is typically a product of down-sampling (e.g., decimation) and blurring operators.

The most frequently used restoration method is given by the Tikhonov regularization model, see, e.g., [4], [7], [14],

min_x ||Ax − y||_ω^2 + ρ R(x), (2)

where the functional R(x) is called a regularization term and ρ > 0 is a regularization parameter. The regularization term R(x) aims at bounding and smoothing the solution x of (2). The weighted norm ||·||_ω is defined by ||x||_ω^2 = x^T ω x, where ω is a positive definite matrix. If A = I is the identity matrix, then (2) describes denoising.

Signal (image) processing often uses total variation due to its remarkable ability to preserve contours/edges of signals. The total variation term R(x) approximates ||∇x||_1; see, e.g., [7], [11], [14], [15]. Total variation filtering analogs can be set up in a framework of graph-based signal processing, e.g., by setting R(x) = x^T L(g) x, where L(g) is a graph Laplacian matrix guided by a signal g, e.g., [16], that we define next.

A signal (image) is interpreted as an intensity function on N vertices V of a weighted graph G = (V, E, W) consisting of a finite set V of vertices (e.g., representing image pixels) and a finite set E ⊂ V × V of edges (i, j) with typically nonnegative (cf. [17], [18]) weights W(i, j), which measure similarity between vertices i and j in the graph. The intensity values of the signal form the vector g = [g_1, ..., g_N]^T, where the vertices V are arbitrarily numbered.
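To make the guided regularizer concrete, here is a minimal 1-D sketch (not from the paper) that builds a guided graph Laplacian L(g) = D − W, as defined in the following paragraphs, using Gaussian nearest-neighbor similarity weights, and solves the denoising case A = I of (2) with R(x) = x^T L(g) x, whose normal equation reads (I + ρL)x = y. The weight formula, σ, and ρ below are illustrative assumptions.

```python
import numpy as np

def guided_laplacian(g, sigma=0.1):
    """Graph Laplacian L(g) = D - W for a 1-D signal, with Gaussian similarity
    weights W(i, j) = exp(-(g_i - g_j)^2 / sigma^2) on nearest neighbours."""
    n = len(g)
    W = np.zeros((n, n))
    for i in range(n - 1):
        w = np.exp(-(g[i] - g[i + 1]) ** 2 / sigma ** 2)
        W[i, i + 1] = W[i + 1, i] = w      # symmetric adjacency
    D = np.diag(W.sum(axis=1))             # degree matrix
    return D - W

rng = np.random.default_rng(0)
g = np.repeat([0.0, 1.0], 50)              # clean piecewise-constant guide
y = g + 0.05 * rng.standard_normal(100)    # noisy signal sharing the same edge

# Denoising (A = I): the normal equation of (2) with R(x) = x^T L(g) x
# is (I + rho * L(g)) x = y.
L = guided_laplacian(g)
x = np.linalg.solve(np.eye(100) + 10.0 * L, y)
```

Because the weight across the edge of g is essentially zero, the two segments are smoothed almost independently, which is exactly the edge-preserving behavior the guidance is meant to provide.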
The similarity weights form a symmetric N × N graph adjacency matrix W = W(g). The row sums d = W 1_N, where 1_N is the vector of ones, define the diagonal N × N degree matrix D = diag(d) of the graph. The property d = 1_N, i.e., D = I, desirable for smoothing filters, can be ensured by scaling W with positive diagonal matrix multipliers via Sinkhorn's algorithm; e.g., [19]. Alternatively, W can be substituted by the normalized graph adjacency matrix D^{-1} W, although the latter is technically non-symmetric.

The symmetric positive semidefinite graph Laplacian matrix L = L(g) is then defined as L = D − W. In graph-based signal processing, eigenvectors of L serve as generalizations of the basis functions of the Discrete Cosine Transform. If the guidance image g, used to determine L(g), is aligned with a noisy image x and shares the same edges, edge-preserving denoising of x can be performed using filters based on L(g). For example, the multiplication L(g)x amplifies the spectral components of x corresponding to large eigenvalues of L and thus can be viewed as an approximate high-pass filter; cf. [20].

With R(x) = x^T L(g) x, the normal equations for the optimization problem (2) become

A^T ω (Ax − y) + ρ L(g) x = 0, (3)

where L(g) is the graph Laplacian operator with a guidance signal g. A self-guided choice g = x is nonlinear; e.g., [16].

Graph-based interpretations of denoising filters are common. Classical bilateral and total variation filters for image denoising are usually constructed via explicit formulas for the weights W(i, j) on a priori determined graph edges E; e.g., [12], [16]. The more recent guided filter [20] is defined directly via an explicit description of its action W(g)x on a given image x, while its edges E and formulas for the weights W(i, j) are derived for theoretical purposes only.
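The Sinkhorn-type scaling mentioned above can be sketched as follows; the symmetric fixed-point iteration e ← sqrt(e / (W e)) is one common variant and is our assumption here, not a detail taken from the paper.

```python
import numpy as np

def sinkhorn_symmetric(W, iters=500):
    """Symmetric diagonal scaling diag(e) W diag(e) so that every row sum
    becomes (approximately) one, i.e. d = 1_N and D = I."""
    e = np.ones(W.shape[0])
    for _ in range(iters):
        e = np.sqrt(e / (W @ e))     # fixed point: e_i * (W e)_i = 1
    return W * np.outer(e, e)        # diag(e) @ W @ diag(e)

rng = np.random.default_rng(0)
A = rng.uniform(0.25, 0.75, (8, 8))
W = A + A.T                          # symmetric adjacency with positive weights
Ws = sinkhorn_symmetric(W)           # scaled adjacency with unit row sums
```

The scaled matrix stays symmetric, so the resulting Laplacian is simply L = I − Ws.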
Sophisticated filters, e.g., BM3D [21], are determined exclusively by functions that implement their action, so it may be difficult to obtain explicit formulas for their weights W(i, j). Exact low-pass filters, where orthogonal projectors represent W and W 1_N = 1_N, also fit the graph-based framework, although some weights W(i, j) are negative.

System (3) with the weight matrix ω = I + βL(g), where β ≥ 0 is a parameter, is used in [19]. The resulting system

(A^T [I + βL] A + ρL) x = A^T [I + βL] y

is solved by the conjugate gradient (CG) method; see [19].

Connecting the Tikhonov regularization (2) to frame-based reconstruction, one can set ω = (AA^T)^+, where the operation + denotes the Moore-Penrose pseudoinverse, which gives A^+ = A^T (AA^T)^+ and turns (3) into

A^+ (Ax − y) + ρ L(g) x = 0. (4)

Denoting s = A^+ y and introducing the orthogonal projector S = A^+ A, we equivalently rewrite equation (4) as

Sx − s + ρ L(g) x = 0, (5)

which is the normal equation for the Tikhonov regularization

min_{x ∈ R^N} ||Sx − s||^2 + ρ x^T L x, (6)

a particular case of optimization problem (2).

The authors of [8], [9] investigate the case of problem (6) where L = I − W is an orthogonal projector. In particular, they prove that when the regularization parameter ρ vanishes, the unconstrained minimization (6) reduces to

min_x x^T L x, subject to Sx − s = 0, (7)

which can be interpreted as a graph-harmonic sample-consistent signal extension, since the quadratic form x^T L x can be viewed as the energy of the signal x, defined by the graph Laplacian L. Since S is an orthogonal projector, we have Sx = s = Ss, and the orthogonal decomposition x = s + (I − S)x helps to show that the minimizer x in (7) solves the system

(I − S) L (I − S) x = −(I − S) L s, (8)

where the matrix (I − S) L (I − S) is symmetric positive semidefinite.
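The stated properties of S = A^+ A are easy to check numerically; a small sketch with a random full-row-rank sampling matrix (our illustrative assumption):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((10, 40))   # "fat" sampling matrix with full row rank
Ap = np.linalg.pinv(A)              # Moore-Penrose pseudoinverse A^+
S = Ap @ A                          # the orthogonal projector S = A^+ A

y = rng.standard_normal(10)
s = Ap @ y                          # sample-consistent signal: A s = y, S s = s
```

S is symmetric and idempotent, and s = A^+ y is the minimum-norm signal consistent with the sample y, which is what equation (5) builds on.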
The special structure of equation (8) allows applying the CG method to the linear system

(I − S) L u = −(I − S) L s, (9)

with an initial approximation from the null-space Null(S) of S, to iteratively approximate the solution u within the subspace Null(S). The sample-consistent reconstruction x in (7) is then given by the orthogonal sum x = s + u; see [8], [9] for details.

The authors of [21] propose solving a self-guided nonlinear version of (7) via the simple iteration

x_0 = s, x_{i+1} = s + (I − S) BM3D(x_i), (10)

where BM3D(x_i) is an application of the BM3D filter to x_i.

Post-processing by αx + (1 − α)Wx with α = 1/(ρ + 1) is proved in [9] to solve the Tikhonov regularization problem (6). Since α ≈ 1 − ρ for small ρ, the post-processing is approximated by x − ρLx when ρ → 0. The corresponding arguments in [9] rely on the assumptions that L = I − W and that W is an orthogonal projector. We provide an alternative analysis, dropping these assumptions, in the next section.

III. TIKHONOV'S REGULARIZATION DEMYSTIFIED
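A minimal numerical sketch of the sample-consistent reconstruction x = s + u via CG on (9): a path-graph Laplacian stands in for L, and every 5th coordinate is treated as sampled — both illustrative assumptions, not the paper's setup.

```python
import numpy as np

def cg(op, b, m, tol=1e-12):
    """m iterations of plain conjugate gradients for op(u) = b, from u = 0."""
    u = np.zeros_like(b)
    r = b.copy()
    p = r.copy()
    rr = r @ r
    for _ in range(m):
        if np.sqrt(rr) < tol:       # stop once converged to avoid 0/0
            break
        q = op(p)
        alpha = rr / (p @ q)
        u += alpha * p
        r -= alpha * q
        rr, rr_old = r @ r, rr
        p = r + (rr / rr_old) * p
    return u

n = 50
# Path-graph Laplacian: a simple symmetric positive semidefinite L.
L = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
L[0, 0] = L[-1, -1] = 1.0

sampled = np.zeros(n, dtype=bool)
sampled[::5] = True                 # sampled coordinates

def S(v):                           # orthogonal projector onto the samples
    w = np.zeros_like(v)
    w[sampled] = v[sampled]
    return w

def Pc(v):                          # complementary projector I - S
    return v - S(v)

rng = np.random.default_rng(2)
s = S(rng.standard_normal(n))       # sample-consistent data, S s = s

# Solve (9): (I - S) L u = -(I - S) L s for u in Null(S); then x = s + u.
u = cg(lambda v: Pc(L @ v), -Pc(L @ s), m=100)
x = s + u                           # graph-harmonic sample-consistent extension
```

Since the right-hand side lies in Null(S) and the operator maps Null(S) into itself, all CG iterates stay in Null(S), so the final x keeps the samples exactly.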
Equation (5) implies (I − S) L x = 0, i.e.,

(I − S) L (I − S) x = −(I − S) L S x, (11)

which differs from (8) only in the right-hand side, since the sample consistency Sx = s is not enforced in (5).

In an orthonormal basis of R^N, where

S = [0 0; 0 I], L = [L_11 L_12; L_21 L_22], x = [x_1; x_2], s = [0; s_2],

equation (5) takes the following block form,

[ρL_11 ρL_12; ρL_21 I + ρL_22] [x_1; x_2] = [0; s_2]. (12)

Assuming that the block L_11 is invertible, we solve the equation L_11 x_1 = −L_12 x_2, equivalent to (11), for x_1 = −L_11^{-1} L_12 x_2, and then for x_2 we obtain the equation (I + ρ L/L_11) x_2 = s_2, where

L/L_11 = L_22 − L_21 L_11^{-1} L_12

is the Schur complement of the block L_11 in the matrix L.

If ρ → 0, we conclude that x_2 → s_2 and x_1 → −L_11^{-1} L_12 s_2, so that the limit vector x solves (7) and (8), where x_2 = s_2, i.e., Sx = s. For small ρ, we have x_2 ≈ s_2 − ρ (L/L_11) s_2, i.e., the solution x of Tikhonov's regularization equation (5) depends linearly on ρ. The minuend L_22 s_2 in (L/L_11) s_2 corresponds to the vector SLs. Thus, rather than solving (5) directly, for small ρ one can try sample-consistent reconstruction via solving (7) or (8), but with an a priori denoised sampled signal s − ρSLs.

IV. THE PROBLEM AND PROPOSED ALGORITHMS
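The block elimination above can be verified numerically; in this sketch a random symmetric positive definite matrix stands in for L (an assumption for illustration; a graph Laplacian is only semidefinite, and we avoid the singular corner case).

```python
import numpy as np

rng = np.random.default_rng(3)
n1, n2 = 6, 4
M = rng.standard_normal((n1 + n2, n1 + n2))
L = M @ M.T                                  # SPD stand-in for the Laplacian
L11, L12 = L[:n1, :n1], L[:n1, n1:]
L21, L22 = L[n1:, :n1], L[n1:, n1:]

S = np.zeros((n1 + n2, n1 + n2))
S[n1:, n1:] = np.eye(n2)                     # projector onto the sampled block
s2 = rng.standard_normal(n2)
s = np.concatenate([np.zeros(n1), s2])       # s in the range of S

rho = 1e-3
x = np.linalg.solve(S + rho * L, s)          # Tikhonov normal equation (5)
x1, x2 = x[:n1], x[n1:]

schur = L22 - L21 @ np.linalg.solve(L11, L12)   # Schur complement L/L11
```

Here x_2 solves (I + ρ L/L_11) x_2 = s_2 exactly, x_1 = −L_11^{-1} L_12 x_2, and for small ρ the computed x_2 is close to s_2, matching the limit analysis.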
We are given two images of the same scene, but in different modalities. The first image y has low resolution and may be noisy. The second image g is of high resolution and is noise-free. The images are registered (or aligned) by means of a linear downsampling transform, that is, there is a well-conditioned matrix A such that the images Ag and y are aligned. We want to reconstruct an image x of the same resolution as g from the image y, so that the image g serves as a guidance image in the regularization term of the Tikhonov regularization reconstruction model (4).

In contrast to the traditional approach of directly solving (4), we use our arguments above to take the sample-consistent reconstruction approach (7) from [8] as a starting point, but for a more general case where the Laplacian L is an arbitrary positive semi-definite operator, not necessarily an orthogonal projector. We notice that the reduction of (7) to (9) proposed in [8], [9] also works for this general case, so we solve (9).

Noisy LR samples y should evidently be smoothed prior to computing the sample-consistent HR reconstruction x by (9). Dealing with noise, we always pre-process y via denoising, by analogy with y − ρALA^+y proposed in Section III. We also find it helpful in some tests to post-process our sample-consistent reconstruction x by subtracting highly oscillatory contributions, similar to x − ρLx described in Section II.

Let CG_m(𝒜, v) denote a function which implements m iterations of CG to solve the equation 𝒜(u) = v with a given linear operator 𝒜(·). The main cost per iteration is the cost of an evaluation of 𝒜(u). Our reconstruction method is as follows.

Algorithm 1
Operator guided super-resolution by CG
Input: sample y, downsampling operator A, guiding operator L, number of iterations m.
Output: super-resolution reconstruction x of y.
Define the operator 𝒜(u) = (I − A^+A)Lu.
Choose an initial approximation x_0 satisfying Ax_0 = y.
Compute x = x_0 − CG_m(𝒜, 𝒜(x_0)).

The level x^T 1_N of the DC component 1_N in the reconstructed HR signal x can be adjusted during post-processing to match that in the LR sample y, if necessary.

Compared to [8], where W and L = I − W are assumed to be orthogonal projectors, our approach allows choosing more general smoothing filters W, but there are still limitations:
• Negative entries in the filter matrix W are allowed, but the resulting Laplacian L = D − W needs to be symmetric positive semidefinite, in order for the minimization of its quadratic form x^T L x in (7) to make sense, and to satisfy the assumptions for CG convergence in Algorithm 1.
• DC-invariant filters, i.e., those satisfying W 1_N = 1_N, lead to D = I and thus to the trivial construction of L = I − W, such as, e.g., the image guided filter of [20], implemented in the MATLAB Image Processing Toolbox, and the total variation filter presented in [16]. Otherwise, the filter DC-component evaluation d = W 1_N is needed to determine D = diag(d) in L = D − W.
• Iterative filters are allowed: e.g., a DC-invariant smoothing filter W, satisfying W 1_N = 1_N, generates the DC-invariant polynomial filter P_n(W), where P_n(·) is a polynomial of degree n satisfying P_n(1) = 1, for example, P_2(W) = W^2. However, the polynomial P_n(·) needs to remain fixed during the CG iterations.
• Self-guided filters lead to nonlinear operators 𝒜(·), requiring special care, such as proposed in [16].

V. NUMERICAL EXPERIMENTS

a) Super-resolution for flash-no flash images:
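A minimal 1-D sketch of Algorithm 1 (not the paper's MATLAB implementation): the downsampling A keeps every k-th sample, so A^+ = A^T, and a periodic moving average serves as the DC-invariant smoothing filter W with L = I − W; the grid size, k, and the choice of W are illustrative assumptions.

```python
import numpy as np

def cg(op, b, m, tol=1e-12):
    """CG_m(op, b): m conjugate-gradient iterations for op(u) = b, from u = 0."""
    u = np.zeros_like(b)
    r = b.copy()
    p = r.copy()
    rr = r @ r
    for _ in range(m):
        if np.sqrt(rr) < tol:         # stop once converged to avoid 0/0
            break
        q = op(p)
        alpha = rr / (p @ q)
        u += alpha * p
        r -= alpha * q
        rr, rr_old = r @ r, rr
        p = r + (rr / rr_old) * p
    return u

n, k = 64, 4
idx = np.arange(0, n, k)              # keep every k-th pixel

def A(v):                             # downsampling operator
    return v[idx]

def Apinv(w):                         # A^+ = A^T for a selection operator
    v = np.zeros(n)
    v[idx] = w
    return v

def Pcomp(v):                         # I - A^+ A: zero out the sampled entries
    w = v.copy()
    w[idx] = 0.0
    return w

def W(v):                             # periodic moving-average smoother, W 1 = 1
    return (np.roll(v, -1) + v + np.roll(v, 1)) / 3.0

Lop = lambda v: v - W(v)              # guiding Laplacian L = I - W (here D = I)

x_true = np.sin(2 * np.pi * np.arange(n) / n)
y = A(x_true)                         # noise-free LR sample

# Algorithm 1: x = x0 - CG_m(calA, calA(x0)) with calA(u) = (I - A^+ A) L u.
calA = lambda u: Pcomp(Lop(u))
x0 = Apinv(y)                         # initial approximation with A x0 = y
x = x0 - cg(calA, calA(x0), m=60)
```

The CG correction lies in Null(A^+A), so the output x reproduces the LR sample exactly while the guiding operator fills in the missing pixels.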
We have carried out numerical tests in MATLAB with two registered color images from MATLAB's own directory, toysflash.png and toysnoflash.png, taken respectively with and without flash. The first image is our HR guidance image. We downsample the second image by a factor of 4 in both dimensions, choosing every 4th pixel with the command image(1:4:end,1:4:end). The result of the downsampling is our LR image. Both images are scaled so that their intensities lie in the range [0, 1]. Figures 1 and 2 display the guidance and the ground truth images. The two images of the same scene have different modalities owing to big differences in the color representations of the objects. For example, the yellow ball looks white in the flash image.

To produce a noisy downsampled LR image, Gaussian noise with the default parameters, zero mean and variance 0.01, has been added to the LR image. The degraded image is displayed in Figure 3. We pre-smooth the noisy LR image by the image guided filter with the default parameters and then apply Algorithm 1 to the pre-smoothed image.

We use a single application of the guided filter function imguidedfilter from the MATLAB Image Processing Toolbox as the smoothing filter W(g) for color images. The relative residual is reached after m = 20 CG iterations, each requiring one application of imguidedfilter. Figure 4 shows the reconstructed image, which has the same desired colors as the ground truth image. The PSNR with respect to the ground truth HR image equals 24.32.

b) Super-resolution of a depth image: Algorithm 1 has also been evaluated on the
Art image set from the simulated Middlebury 2007 data sets, extensively used in [11]. The guidance image is the gray component of the HR RGB image Art in Figure 5. The LR noisy depth image is shown in Figure 6. The upsampling shown in Figure 7 has been computed by the code used in [11], which implements Tikhonov regularization by the generalized total variation.

To produce Figure 8, we use the guided Total Variation (TV) filter described in [16] as the guided smoothing filter in Algorithm 1, as well as during LR pre-smoothing and HR post-smoothing. The same smoothing parameter is used at all three stages, i.e., inside Algorithm 1 as well as for the pre- and post-smoothing, with 1/480/850 function evaluations, correspondingly. The number m of CG iterations is chosen to achieve the prescribed relative residual.

Fig. 1. HR guidance image toysflash.
Fig. 2. HR ground truth image toysnoflash.

The comparison Figure 7 has a smaller PSNR, but is a bit less noisy and has sharper edges than our Figure 8, as the generalized TV filter used in [11] is a more powerful smoother compared to the TV filter from [16] that we use in these tests. However, Figure 7 has noticeable artifacts, clearly coming from sharp edges in the RGB Figure 5, which are absent in the LR Figure 6. In our approach, sharp edges in the guidance image g only affect the filter weights and thus cannot possibly pollute our reconstruction, as confirmed in Figure 8.

VI. CONCLUSION
We propose algorithms for guided iterative reconstruction of signals, illustrated with the reconstruction of a higher resolution image from a lower resolution noisy sample. Guiding is given by a smoothing filter and plays the role of regularization. Least squares minimization of the data term ||Ax − y||^2 is substituted by the equality constraint Ax = y, allowing one to control sample consistency and eliminating the problem of choosing the regularization parameter in Tikhonov's regularization.

Fig. 3. Noisy LR image.
Fig. 4. Our reconstructed image, PSNR = 24.32.
Noisy lower resolution samples are pre-smoothed. The obtained sample-consistent high resolution reconstruction can be post-smoothed, if needed. Iterations are based on conjugate gradients, giving optimal performance. General smoothing guidance filters allow flexibility in designing reconstructions with desirable properties. Initial numerical experiments demonstrate the feasibility of the proposed technology for super-resolution of flash-no flash images and for depth reconstruction guided by RGB images.

Fig. 5. High resolution RGB image Art.
Fig. 6. Low resolution noisy depth image Art.
Fig. 7. Upsampled depth image Art, using [11].
Fig. 8. Upsampled depth image Art. Our result.

REFERENCES

[1] S. C. Park, M. K. Park, and M. G. Kang, "Super-resolution image reconstruction: a technical overview," IEEE Signal Processing Magazine, vol. 20, no. 3, pp. 21–36, 2003.
[2] S. Farsiu, D. Robinson, M. Elad, and P. Milanfar, "Advances and challenges in super-resolution," International J. Imaging Systems and Technology, vol. 14, no. 2, pp. 47–57, 2004.
[3] M. Protter, M. Elad, H. Takeda, and P. Milanfar, "Generalizing the non-local means to super-resolution reconstruction," IEEE Trans. Image Process., vol. 18, no. 1, pp. 36–51, 2009.
[4] P. Milanfar, Super-resolution Imaging, CRC Press, USA, 2010.
[5] Y. Romano, J. Isidoro, and P. Milanfar, "RAISR: Rapid and accurate image super resolution," IEEE Trans. Computational Imaging, vol. 3, no. 1, pp. 110–125, 2017.
[6] A. Chambolle, "An algorithm for total variation minimization and applications," J. Math. Imaging and Vision, vol. 20, pp. 89–97, 2004.
[7] A. Chambolle, V. Caselles, D. Cremers, M. Novaga, and T. Pock, "An introduction to total variation for image analysis," in Theoretical Foundations and Numerical Methods for Sparse Recovery, M. Fornasier, Ed., vol. 5, pp. 263–340. Radon Series on Comp. Appl. Math., 2010.
[8] A. Gadde, A. Knyazev, D. Tian, and H. Mansour, "Guided signal reconstruction with application to image magnification," in IEEE Global Conf. Signal and Information Processing, 2015, pp. 938–942.
[9] A. Knyazev, A. Gadde, H. Mansour, and D. Tian, "Guided signal reconstruction theory," arXiv:1702.00852, 2017.
[10] M. Y. Liu, O. Tuzel, and Y. Taguchi, "Joint geodesic upsampling of depth images," in IEEE Conference on Computer Vision and Pattern Recognition, June 2013, pp. 169–176.
[11] D. Ferstl, C. Reinbacher, R. Ranftl, M. Ruether, and H. Bischof, "Image guided depth upsampling using anisotropic total generalized variation," in IEEE Int. Conf. Computer Vision (ICCV), 2013, pp. 993–1000.
[12] Y. Wang, A. Ortega, D. Tian, and A. Vetro, "A graph-based joint bilateral approach for depth enhancement," in IEEE Int. Conf. Acoustic, Speech and Signal Processing (ICASSP), 2014, pp. 885–889.
[13] U. S. Kamilov and P. T. Boufounos, "Motion-adaptive depth super-resolution," IEEE Trans. Image Process., vol. 26, pp. 1723–1731, 2017.
[14] F. Šroubek, J. Kamenický, and P. Milanfar, "Superfast superresolution," in IEEE Int. Conf. Image Processing (ICIP), 2011, pp. 1177–1180.
[15] M. Lebrun, M. Colom, A. Buades, and J. M. Morel, "Secrets of image denoising cuisine," Acta Numerica, vol. 21, pp. 475–576, 2012.
[16] A. Knyazev and A. Malyshev, "Accelerated graph-based nonlinear denoising filters," Procedia Comp. Sci., vol. 80, pp. 607–616, 2016.
[17] A. Knyazev, "Signed Laplacian for spectral clustering revisited," arXiv:1701.01394, 2017.
[18] A. Knyazev, "Edge-enhancing filters with negative weights," in IEEE Global Conf. Signal and Information Processing, 2015, pp. 260–264.
[19] A. Kheradmand and P. Milanfar, "A general framework for regularized, similarity-based image restoration," IEEE Trans. Image Process., vol. 23, no. 12, pp. 5136–5151, 2014.
[20] K. He, J. Sun, and X. Tang, "Guided image filtering," IEEE Trans. Pattern Anal. Machine Intel., vol. 35, no. 6, pp. 1397–1409, 2013.
[21] A. Danielyan, A. Foi, V. Katkovnik, and K. Egiazarian, "Spatially adaptive filtering as regularization in inverse imaging: compressive sensing, super-resolution, and upsampling," in