Light Weight Color Image Warping with Inter-Channel Information
Chuangye Zhang, Yan Niu, Tieru Wu, Ximing Li
ABSTRACT
Image warping is a necessary step in many multimedia applications such as texture mapping, image-based rendering, panorama stitching, image resizing and optical flow computation. Traditionally, color image warping interpolation is performed in each color channel independently. In this paper, we show that the warping quality can be significantly enhanced by exploiting the cross-channel correlation. We design a warping scheme that integrates intra-channel interpolation with cross-channel variation at very low computational cost, as required for interactive multimedia applications on mobile devices. The effectiveness and efficiency of our method are validated by extensive experiments.
Index Terms — Image warping, inter-channel correlation, Laplacian filtering, image enhancement
1. INTRODUCTION
Image warping is fundamental for a variety of multimedia applications such as image resizing (e.g., [1]), texture mapping (e.g., [2]), image-based rendering (e.g., [3]), panorama stitching (e.g., [4]), stereo reconstruction (e.g., [5]) and optical flow computation (e.g., [6]), to name but a few. Image warping can be briefly described as transforming a source image I into a target image I′ under a geometric mapping H, which can be linear, affine, perspective, non-parametric etc. [7]. Particularly, assuming I′ has M rows and N columns, each pixel (i, j) in the target image lattice [1, M] × [1, N] has a correspondence image point [x, y] = H^{-1}([i, j]) in the source image, from which the intensity value I[x, y] is assigned to I′[i, j]. The central problem of image warping is to estimate I[x, y] from the pixels in the vicinity of [x, y], since [x, y] is generally not on the integer grid, and hence the true intensity I[x, y] is not immediately available.

The most popular estimation techniques for image warping in real practice are the nearest neighbour, bilinear and bicubic (spline) interpolation schemes [8], which do not adapt well to irregular image regions. Therefore, academic research effort has been mainly dedicated to edge-guided interpolation, such as anisotropic filtering (e.g., [9], [10]) and variational regularization (e.g., [11]). Particularly on the sub-topic of Single Image Super-Resolution, i.e., in the special case that H degenerates to a scaling function, mathematical models such as compressive sensing (e.g., [12]), dictionary learning (e.g., [13]) and Convolutional Neural Networks (CNN, e.g., [1]) have also been intensively investigated.

Although numerous image warping schemes have been proposed, they are mostly designed for monochrome images. This means that on warping color images, these methods must be applied to each channel independently (e.g., [14]) or to the luminance channel only (e.g., [15]), without exploiting the cross-channel correlation.
The only exception is end-to-end CNNs that take RGB images as input. In this situation, the color correlation is captured during the training process, but the learned correlation is CNN-specific rather than being general. In contrast, cross-channel correlation is thoroughly studied for Image Demosaicking, which recovers the full RGB image from subsampled color channels, with only one primary color component available at each pixel [16]. Although each channel can be recovered by merely using intra-channel interpolation, it is commonly known that inter-channel correlation can significantly improve the demosaicking accuracy.

In this paper, we integrate intra-channel image warping with cross-channel correlation, and show that our approach significantly improves the warping accuracy at very low cost. The proposed integration scheme is inspired by the Malvar-He-Cutler High Quality Linear Interpolation (HQLI) demosaicking method [17], which is probably the fastest demosaicking algorithm in the literature (except the Nearest Neighbour and Bilinear interpolation), yet its trade-off between accuracy and speed is superior to that of many sophisticated algorithms [18]. We first examine the potential of cross-channel correlation for image warping, by applying HQLI to postprocess color images upsampled with intra-channel interpolation. We then generalize this combination to formulate general color image warping. The performance of our method is assessed in the scenario of image super-resolution, as it is the most active sub-area of image warping. Despite its very low cost, the proposed algorithm achieves accuracy comparable to state-of-the-art algorithms at tens of times faster speed. Compared to CNN-based methods, our model has the flexibility of resizing the image to arbitrary size. These features are desirable for real-time interactive applications on multimedia devices that have limited computation power (e.g., mobile smart phones).
2. INTER-CHANNEL CORRELATION GAIN

Image warping is generally implemented by backward mapping. Let H be the geometric mapping from the source image I to the target image I′, and H^{-1} be its inverse mapping; H^{-1} maps the coordinate frame of I′ back to the coordinate frame of I. Image I′ is generated in such a way that, for each pixel (i, j) of I′, the intensity value I′[i, j] is copied from I[x, y], where

  [x, y]^T = H^{-1}([i, j]^T).  (1)

As x and y are generally not integers, I[x, y] has to be estimated from the surrounding pixels. Let m = ⌊x⌋ and n = ⌊y⌋ be the floor integer parts of x and y, and define s = x − m, t = y − n. For example, the bilinear warping scheme estimates I′[i, j] by

  Î[x, y] = (1 − s)(1 − t) I[m, n] + s t I[m + 1, n + 1] + s(1 − t) I[m + 1, n] + (1 − s) t I[m, n + 1].  (2)

Today, multimedia images are usually chromatic rather than grayscale. Therefore, I and I′ generally have three RGB channels. Typically, bilinear warping for color images replaces I in Eq. 2 with each of the color channels R, G and B, to estimate the target values R[i, j], G[i, j] and B[i, j], without cross-channel interaction.

The importance of cross-channel correlation is well investigated for the problem of Image Demosaicking, entailed in current consumer-level digital cameras, which commonly use Color Filter Arrays (CFA) to sample one of the RGB components at each pixel. Fig. 1 shows an example of the Bayer pattern CFA, adopted by most digital cameras.

Fig. 1. An illustration of the Bayer pattern for CFA: (a) RGGB (b) BGGR (c) GRBG (d) GBRG.

Define R, G and B to be the sets that collect the pixels where the original red, green and blue values are captured respectively.
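The backward-mapping procedure of Eqs. 1–2 can be sketched as follows, for a single channel. This is a minimal NumPy illustration, not the authors' implementation; the function names and the affine-free callable interface for H^{-1} are our own choices:

```python
import numpy as np

def bilinear_backward_warp(src, H_inv, out_shape):
    """Warp a single-channel image by backward mapping (Eqs. 1-2).

    src: 2-D float array (source image I).
    H_inv: callable mapping target coords (i, j) -> source coords (x, y).
    out_shape: (M, N) of the target image.
    """
    M, N = out_shape
    ii, jj = np.meshgrid(np.arange(M), np.arange(N), indexing="ij")
    x, y = H_inv(ii, jj)                      # Eq. 1: back-project target pixels
    m = np.clip(np.floor(x).astype(int), 0, src.shape[0] - 2)
    n = np.clip(np.floor(y).astype(int), 0, src.shape[1] - 2)
    s, t = x - m, y - n                       # fractional offsets
    # Eq. 2: bilinear blend of the four surrounding source pixels
    return ((1 - s) * (1 - t) * src[m, n] + s * t * src[m + 1, n + 1]
            + s * (1 - t) * src[m + 1, n] + (1 - s) * t * src[m, n + 1])

# Example: 2x upscaling, where H^{-1} divides the coordinates by the scale S
upscaled = bilinear_backward_warp(np.arange(16.0).reshape(4, 4),
                                  lambda i, j: (i / 2.0, j / 2.0), (8, 8))
```

For a color image, the same call would simply be repeated per channel, which is exactly the channel-independent baseline the paper improves upon.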
Malvar, He and Cutler propose estimating the missing color values by a weighted combination of the local intra-channel average and the Laplacian of another channel [17]. For example, at a pixel [m, n] ∈ R, the green value G[m, n] is estimated by

  G[m, n] ≈ Ḡ + α ΔR[m, n],  (3)

where the constant α is learned from training images, Ḡ stands for the average of G[m, n − 1], G[m, n + 1], G[m − 1, n] and G[m + 1, n], and

  ΔR[m, n] = R[m, n] − (1/4)(R[m + 2, n] + R[m − 2, n] + R[m, n − 2] + R[m, n + 2]).  (4)

The other missing color values are recovered in the same fashion. This demosaicking method, namely HQLI, adds second-order details (the Laplacian) of one channel to the average of another channel, thereby performing image enhancement.

To examine whether the inter-channel information supplements intra-channel recovery in image warping, we conduct a preliminary experiment using image super-resolution as an example application. We upsample test images by bilinear and bicubic interpolation, and then refine the upsampled image values by sequential HQLI demosaicking. In particular, we first apply the GRBG CFA pattern to filter the upsampled color image, and estimate the filtered-out color values from the remaining color values. Subsequently, we apply the RGGB CFA to the demosaicked full RGB image, and perform the second round of demosaicking. We then change the CFA pattern to BGGR, and repeat HQLI demosaicking for the final round. In this way, each color component of a pixel is refined by the intensity values in the local neighbourhood of the other two channels. Table 1 compares the reconstruction accuracy, measured by Peak Signal to Noise Ratio (PSNR), before and after demosaicking (i.e., the cross-channel information) on the benchmark datasets Set5 [19], Set14 [20], BSD100 [21] and Urban100 [22]. It can be seen that the inter-channel correlation introduced by demosaicking notably improves the intra-channel interpolation on all test datasets. Numerically, the improvement is up to 1.57 dB and 1.09 dB for bilinear and bicubic intra-channel interpolation respectively.

Note that to simulate real-world applications, here the low resolution images are obtained by Nearest Neighbour downsampling of the high resolution test images. This is different from the common SISR algorithm evaluation routine, which utilizes Bicubic downsampling.
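The correction of Eqs. 3–4 above can be sketched at a single red CFA site as follows. The function name is illustrative, and the default α = 0.5 is used purely for demonstration rather than being the paper's trained constant:

```python
import numpy as np

def hqli_green_at_red(R, G, m, n, alpha=0.5):
    """Estimate the missing green value at a red CFA site [m, n] (Eqs. 3-4).

    R: red channel values; G: green channel values at green sites.
    alpha: blending constant of Eq. 3 (0.5 here is a placeholder).
    """
    # Eq. 3: 4-neighbour intra-channel average of green
    g_bar = (G[m, n - 1] + G[m, n + 1] + G[m - 1, n] + G[m + 1, n]) / 4.0
    # Eq. 4: cross-shaped Laplacian of the red channel
    lap_r = R[m, n] - (R[m + 2, n] + R[m - 2, n]
                       + R[m, n - 2] + R[m, n + 2]) / 4.0
    return g_bar + alpha * lap_r

# On a perfectly flat patch the red Laplacian vanishes, so the estimate
# reduces to the plain 4-neighbour green average.
R = np.full((5, 5), 100.0)
G = np.full((5, 5), 80.0)
flat_estimate = hqli_green_at_red(R, G, 2, 2)
```

The Laplacian term only contributes where the red channel has second-order detail, which is what makes the scheme behave as an edge-sensitive enhancement rather than a plain blur.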
The Nearest Neighbour sampling is known to suffer severe aliasing and blurring, whereas Bicubic downsampling implicitly performs low-pass filtering before sampling, thereby largely avoiding such artifacts [16]. However, using Bicubic downsampling also implicitly assumes that the high resolution images are known, which is not the case in real applications. Therefore we apply bicubic downsampling only in the comparison experiments, for fair conditions. Due to the coarse downsampling, bicubic warping has lower accuracy than bilinear warping, which is seemingly counter-intuitive.

Table 1. The average PSNR before and after refining bilinear and bicubic upsampling by sequential HQLI demosaicking, tested on benchmark datasets.

  Dataset   | Bilinear | Bilinear+HQLI | Bicubic | Bicubic+HQLI
  Set5      |  28.86   |     30.43     |  28.64  |    29.73
  Set14     |  26.55   |     27.50     |  26.23  |    26.85
  BSD100    |  26.44   |     26.89     |  26.02  |    26.16
  Urban100  |  23.57   |     24.12     |  23.17  |    23.45
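The PSNR figures throughout can be computed with the standard definition for 8-bit images (a generic sketch, not tied to the authors' evaluation code):

```python
import numpy as np

def psnr(reference, estimate, peak=255.0):
    """Peak Signal to Noise Ratio in dB between two images."""
    diff = reference.astype(np.float64) - estimate.astype(np.float64)
    mse = np.mean(diff ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(peak ** 2 / mse)

# A uniform error of 16 grey levels gives 10*log10(255^2 / 256) ~ 24.05 dB.
a = np.zeros((8, 8))
value = psnr(a, a + 16.0)
```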
3. CROSS-CHANNEL COLOR IMAGE WARPING

3.1. General Formulation
Based on the analysis of the gain achieved by the inter-channel information, we formulate general color image warping by

  G[x, y] ≈ G̃[x, y] + ω_{g,r} ΔR[x, y] + ω_{g,b} ΔB[x, y]  (5a)
  R[x, y] ≈ R̃[x, y] + ω_{r,g} ΔG[x, y] + ω_{r,b} ΔB[x, y]  (5b)
  B[x, y] ≈ B̃[x, y] + ω_{b,g} ΔG[x, y] + ω_{b,r} ΔR[x, y],  (5c)

where G̃[x, y], R̃[x, y] and B̃[x, y] can be any intra-channel estimation of G[x, y], R[x, y] and B[x, y]; also, ΔG[x, y], ΔR[x, y] and ΔB[x, y] can be estimated by interpolating {ΔG[k, l]}, {ΔR[k, l]} and {ΔB[k, l]} for [k, l] ∈ Ω(m, n) respectively, where Ω(m, n) stands for the local neighbourhood of pixel [m, n]. The involved interpolation schemes can be either linear or non-linear, isotropic or anisotropic, depending on the particular requirements of the application at hand. The weights ω(·,·) are to be learned from training images, as described in Section 3.2.

3.2. Weight Learning

We illustrate the weight learning process using the Single Image Super-Resolution (SISR) application as an example, for which the geometric mapping H and its inverse mapping H^{-1} are

  H = [S 0; 0 S],   H^{-1} = [1/S 0; 0 1/S],  (6)

where S is a real number larger than 1. Note that S is not restricted to be an integer. We downsample the ground truth images by bicubic interpolation to form the source images I (i.e., the data), following the traditional experimental setting in the SISR literature, as the learned weights are to be used for algorithm evaluation in comparison with the state of the art. We use the ground truth images (taken from the benchmark training dataset BSD200 [21]) as the target images I′ (i.e., the labels). For each pixel [i, j] in an image I′, the RGB values of its correspondence (x, y) in I are thus known in the training process.
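The per-pixel correction of Eq. 5 can be sketched as follows, given intra-channel upsampled channels and interpolated Laplacian maps. The weight values below are placeholders for illustration, not the paper's learned constants:

```python
import numpy as np

def cross_channel_correct(Rt, Gt, Bt, dR, dG, dB, w):
    """Apply the cross-channel correction of Eq. 5.

    Rt, Gt, Bt: intra-channel estimates (e.g. bilinear-upsampled channels).
    dR, dG, dB: interpolated Laplacian maps of each channel.
    w: dict of learned weights, keyed like w["g,r"].
    """
    G = Gt + w["g,r"] * dR + w["g,b"] * dB   # Eq. 5a
    R = Rt + w["r,g"] * dG + w["r,b"] * dB   # Eq. 5b
    B = Bt + w["b,g"] * dG + w["b,r"] * dR   # Eq. 5c
    return R, G, B

# Placeholder weights, for illustration only
w = {"g,r": 0.1, "g,b": 0.1, "r,g": 0.2, "r,b": -0.05,
     "b,g": 0.2, "b,r": -0.05}
flat = np.ones((4, 4))
zero = np.zeros((4, 4))
R, G, B = cross_channel_correct(flat, flat, flat, zero, zero, zero, w)
```

When all Laplacian maps are zero (a flat region), the correction leaves the intra-channel estimates unchanged, matching the intuition that cross-channel detail only matters near edges and texture.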
G̃[x, y], R̃[x, y] and B̃[x, y], as well as ΔG[x, y], ΔR[x, y] and ΔB[x, y], are computed on the pixels of each source image by the interpolation function f. Subsequently, Eq. 5 can be rewritten as linear systems in the weights ω(·,·). For example, for Eq. 5a,

  [ ΔR[x_1, y_1]  ΔB[x_1, y_1] ]                   [ G[i_1, j_1] − G̃[x_1, y_1] ]
  [      ⋮             ⋮       ]  [ ω_{g,r} ]   =  [            ⋮               ]   (7)
  [ ΔR[x_K, y_K]  ΔB[x_K, y_K] ]  [ ω_{g,b} ]      [ G[i_K, j_K] − G̃[x_K, y_K] ]

Here the integer K = 10000 is the total number of randomly selected training pixels, indexed by the subscripts of x, y, i and j. Eq. 5b and Eq. 5c can be re-organized similarly. The Mean Square Error (MSE) or Mean Absolute Error (MAE) solutions to the linear systems are easily computed. This yields the weights ω_{g,r}, ω_{g,b}, ω_{r,g}, ω_{r,b}, ω_{b,r} and ω_{b,g}.

Seemingly, the learned weight values should be specific to the scaling factor S and the interpolation function f. However, we observe that weights learned for large S actually generalize well to small S. Therefore, we suggest conducting the training for a large scaling factor (e.g., S = 4), and applying the learned weights directly to applications whose scaling factor is smaller.

Table 2 lists the weights we learned from Eq. 7 by linear regression for S = 4, with the interpolation function f being Bilinear, Bicubic and Lanczos.

Table 2. The integration weights learned by linear regression for S = 4 and various interpolation functions f.

4. EXPERIMENTAL RESULTS

We conduct two sets of experiments to evaluate the proposed color image warping technique. The first experiment measures the SISR accuracy before and after incorporating the cross-channel terms into intra-channel warping, at various scaling factors and using different real-time interpolation functions.
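The regression of Eq. 7 is an ordinary least-squares problem over just two unknowns per color equation. A minimal sketch, using synthetic stand-in data in place of real training pixels (the weight values here are hypothetical):

```python
import numpy as np

# Synthetic stand-in for the K training pixels of Eq. 7:
# each row holds the Delta-R and Delta-B values at one training location.
rng = np.random.default_rng(0)
K = 10000
A = rng.normal(size=(K, 2))                 # [dR_k, dB_k] rows
w_true = np.array([0.25, -0.1])             # hypothetical ground-truth weights
# Right-hand side: G[i, j] - G~[x, y] residuals, with a little noise
b = A @ w_true + 0.01 * rng.normal(size=K)

# Closed-form MSE solution of the linear system (least squares)
w_fit, *_ = np.linalg.lstsq(A, b, rcond=None)
```

With K = 10000 samples the fit recovers the generating weights almost exactly; the MAE variant would instead minimize the L1 residual, e.g. by iteratively reweighted least squares.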
The second experiment compares our technique with the state-of-the-art SISR method Global Regression (GR) proposed by Timofte, De Smet and Van Gool in [23], using the released Matlab source code. To the best of our knowledge, GR is the fastest SISR method proposed in recent years, and hence it is chosen for comparison in our experiment. It should be mentioned that the model complexity of GR is significantly higher than that of the linear isotropic scheme we use for testing.

For comparison under equal conditions, the low resolution test images I are obtained by bicubicly downsampling the ground truth image I′ at sampling factor S. That is, the low resolution images have ⌊M/S⌋ rows and ⌊N/S⌋ columns. Laplacians are computed at each pixel in each channel of the low resolution images. Both the Laplacian maps and the low resolution images are upsampled by the same interpolation function f. In our experiments, S is set to 2, 3 or 4, and f is set to Bilinear, Bicubic or Lanczos.

Evaluation is carried out on the benchmark datasets Set5, Set14, BSD100 and Urban100. Besides the PSNR accuracy measure, we also report the Structural Similarity (SSIM) measure [24] and the running time (in seconds) of the SISR algorithms. All experiments are conducted on an Intel Core i7-7700 3.60GHz CPU with 8GB RAM, using Matlab.

We compare the performance of popular real-time warping methods and the proposed method. Bilinear warping probably has the best trade-off between performance and speed in the literature, as pointed out by Zitová and Flusser [8]. Hence it is recommended for video and animation applications, such as frame registration for motion estimation in OpenCV and texture mapping in OpenGL.
Bicubic warping is another frequently adopted option, especially for still image processing. For example, the image editing software packages Photoshop and GIMP employ bicubic interpolation for image resizing and perspective view rectification.

Table 3 presents the PSNR, SSIM and running time averaged over Set5 for various S and f. In this table, the plain intra-channel warping is referred to as "Independent", and the proposed inter-channel color warping as "Correlated". In all situations, the proposed color warping scheme achieves evident improvement, in terms of both PSNR and SSIM, in real-time computation. The most remarkable quantitative improvement is obtained for bilinear interpolation, whose PSNR values are increased by 1.58 dB, 0.77 dB and 0.98 dB for 2, 3 and 4 times upscaling respectively. It can be observed that the improvement decreases with the complexity of the interpolation function. However, in the case of Lanczos interpolation (the most complex function here), the accuracy of our color warping scheme is probably sufficient for many applications, and its computation takes merely several milliseconds.

To the best of our knowledge, among the state-of-the-art SISR algorithms that have Matlab source code released, GR [23] is the fastest. We compare the accuracy and speed of our method with GR. In this experiment, we fix the interpolation function f to be Lanczos. Table 4 shows the comparison, and Fig. 2 shows a visual assessment. Relative to GR, the PSNR of our method is generally slightly lower for various scaling factors. However, on BSD100 the accuracy performance is very close. This might be attributed to the integration weights being learned from BSD200, and suggests the potential for higher performance by using a more sophisticated training strategy. It should be noted that the proposed warping model has merely 6 parameters, and its computation is tens of times faster.
5. CONCLUSION
In this paper, we have proposed a light-weight color image warping technique, which integrates intra-channel interpolation with cross-channel details. Extensive experiments on four benchmark datasets have validated that the proposed technique substantially improves the most popular image warping algorithms, in a very simple form and at trivial computational cost. Our algorithm has also been shown to push the performance of the traditional Lanczos upsampling scheme to be comparable with state-of-the-art methods, while remaining real-time.

The proposed technique can be readily applied to interactive multimedia applications that require both fast image warping and realistic visual effects, such as computer gaming and image editing. As our integration of cross-channel information is light-weight, it is suitable for multimedia devices with limited computation power. It should be noted that our baseline interpolation, integration formulation and weight training are all very basic; each step can be extended for higher performance, if the computational budget allows. Moreover, our method can also be used as a high-quality initialization for many CNN-based methods.
Table 3. PSNR, SSIM and computation time of plain intra-channel warping ("Independent") and the proposed color warping ("Correlated"), averaged over Set5, for various scaling factors S and interpolation functions f.

Table 4. Comparison of PSNR and computation time of GR [23] and the proposed color warping (with the interpolation function fixed to be Lanczos), averaged over four benchmark datasets, for various scaling factors S.

Fig. 2. 'Zebra' image from Set14 with 3× upscaling: (a) Reference (b) GR / 25.51 dB (c) Ours / 25.37 dB.

6. REFERENCES

[1] Christian Ledig, Lucas Theis, Ferenc Huszár, Jose Caballero, Andrew Cunningham, Alejandro Acosta, Andrew P. Aitken, Alykhan Tejani, Johannes Totz, Zehan Wang, et al., "Photo-realistic single image super-resolution using a generative adversarial network," in IEEE Conference on Computer Vision and Pattern Recognition, 2017.

[2] Mason Woo, Jackie Neider, Tom Davis, and Dave Shreiner, OpenGL Programming Guide: The Official Guide to Learning OpenGL, Version 1.2, Addison-Wesley Longman Publishing Co., Inc., 1999.

[3] Leonard McMillan, "Image-based rendering using image warping," in Computer Graphics Proceedings, Annual Conference Series, ACM SIGGRAPH, New Orleans, Louisiana, 2009, pp. 123–125.

[4] Kaiming He, Huiwen Chang, and Jian Sun, "Rectangling panoramic images via warping," ACM Transactions on Graphics (TOG), vol. 32, no. 4, pp. 79, 2013.

[5] Michael Bleyer, Margrit Gelautz, Carsten Rother, and Christoph Rhemann, "A stereo approach that handles the matting problem via image warping," in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2009, pp. 501–508.

[6] Jean-Yves Bouguet, "Pyramidal implementation of the affine Lucas Kanade feature tracker: description of the algorithm," Intel Corporation, vol. 5, no. 1-10, pp. 4, 2001.

[7] Chris A. Glasbey and Kantilal Vardichand Mardia, "A review of image-warping methods," Journal of Applied Statistics, vol. 25, no. 2, pp. 155–171, 1998.

[8] Barbara Zitová and Jan Flusser, "Image registration methods: a survey," Image and Vision Computing, vol. 21, no. 11, pp. 977–1000, 2003.

[9] Sebastiano Battiato, Giovanni Gallo, and Filippo Stanco, "A locally adaptive zooming algorithm for digital images," Image and Vision Computing, vol. 20, no. 11, pp. 805–812, 2002.

[10] Xin Li and Michael T. Orchard, "New edge-directed interpolation," IEEE Transactions on Image Processing, vol. 10, no. 10, pp. 1521–1527, 2001.

[11] Hao Jiang and Cecilia Moloney, "A new direction adaptive scheme for image interpolation," in International Conference on Image Processing, 2002, vol. 3.

[12] Jianchao Yang, John Wright, Thomas S. Huang, and Yi Ma, "Image super-resolution via sparse representation," IEEE Transactions on Image Processing, vol. 19, no. 11, pp. 2861–2873, 2010.

[13] Jianchao Yang, Zhaowen Wang, Zhe Lin, Scott Cohen, and Thomas Huang, "Coupled dictionary training for image super-resolution," IEEE Transactions on Image Processing, vol. 21, no. 8, pp. 3467–3478, 2012.

[14] Dan Su and Philip Willis, "Image interpolation by pixel-level data-dependent triangulation," Computer Graphics Forum, vol. 23, no. 2, pp. 189–201, 2004.

[15] Chao Dong, Chen Change Loy, Kaiming He, and Xiaoou Tang, "Learning a deep convolutional network for image super-resolution," in European Conference on Computer Vision, 2014, pp. 184–199.

[16] R. Szeliski, Computer Vision: Algorithms and Applications, Springer Science & Business Media, 2010.

[17] Henrique S. Malvar, Li-wei He, and Ross Cutler, "High-quality linear interpolation for demosaicing of Bayer-patterned color images," in IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2004, vol. 3, pp. iii-485.

[18] Yan Niu, Jihong Ouyang, Wanli Zuo, and Fuxin Wang, "Low cost edge sensing for high quality demosaicking," IEEE Transactions on Image Processing, 2018, Early Access.

[19] Marco Bevilacqua, Aline Roumy, Christine Guillemot, and Marie-Line Alberi Morel, "Low-complexity single-image super-resolution based on nonnegative neighbor embedding," in British Machine Vision Conference, 2012.

[20] Roman Zeyde, Michael Elad, and Matan Protter, "On single image scale-up using sparse-representations," in International Conference on Curves and Surfaces, 2010, pp. 711–730.

[21] David Martin, Charless Fowlkes, Doron Tal, and Jitendra Malik, "A database of human segmented natural images and its application to evaluating segmentation algorithms and measuring ecological statistics," in International Conference on Computer Vision, 2001, vol. 2, pp. 416–423.

[22] Jia-Bin Huang, Abhishek Singh, and Narendra Ahuja, "Single image super-resolution from transformed self-exemplars," in IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp. 5197–5206.

[23] Radu Timofte, Vincent De Smet, and Luc Van Gool, "Anchored neighborhood regression for fast example-based super-resolution," in International Conference on Computer Vision, 2013, pp. 1920–1927.

[24] Zhou Wang, Alan C. Bovik, Hamid R. Sheikh, and Eero P. Simoncelli, "Image quality assessment: from error visibility to structural similarity," IEEE Transactions on Image Processing, vol. 13, no. 4, pp. 600–612, 2004.