360-degree Video Stitching for Dual-fisheye Lens Cameras Based On Rigid Moving Least Squares
Tuan Ho*, Ioannis D. Schizas*, K. R. Rao*, Madhukar Budagavi†

* Dept. of Electrical Engineering, University of Texas–Arlington, Arlington, TX, USA
† Samsung Research America, Richardson, TX, USA
ABSTRACT
Dual-fisheye lens cameras are becoming popular for 360-degree video capture, especially for user-generated content (UGC), since they are affordable and portable. Images generated by dual-fisheye cameras have limited overlap and hence require non-conventional stitching techniques to produce high-quality 360x180-degree panoramas. This paper introduces a novel method to align these images using interpolation grids based on rigid moving least squares. Furthermore, jitter is the critical issue that arises when image-based stitching algorithms are applied to video: it stems from the unconstrained movement of the stitching boundary from one frame to another. We therefore also propose a new algorithm that maintains the temporal coherence of the stitching boundary to provide jitter-free 360-degree videos. Results show that the method proposed in this paper produces higher quality stitched images and videos than prior work.
1. INTRODUCTION
Dual-fisheye lens cameras are becoming popular for 360-degree video capture, especially for UGC. Their portability and affordability give them an edge over traditional and professional 360-degree capture systems such as [1][2], which usually deploy 6–17+ cameras on the same rig and are very expensive to own. An example dual-fisheye lens 360-degree camera is the Samsung Gear 360, which can produce 360x180-degree panoramas and video that are viewable on 360-degree viewers such as Cardboard [3] or GearVR [4]. Other examples of dual-fisheye lens 360-degree cameras are Ricoh Theta [5] and LG 360 Cam [6], to name a few.

However, the convenience of a compact and affordable 360-degree capture system comes with a caveat. The images generated by the dual-fisheye lenses have limited overlap, and as we show in [7], conventional stitching methods such as the ones in [8][9] do not provide satisfactory stitching results.

To stitch the images generated by dual-fisheye lens cameras, [7] suggests a framework of four main stages, as shown in Figure 2(a). The first and second stages, respectively, compensate for the light fall-off of the fisheye-lens camera and transform the light-compensated fisheye images into an equirectangular format that can be viewed on 360-degree players. After the first two stages, the fisheye-unwarped images do not align with each other. Therefore, [7] proposes a two-step registration method that minimizes the discontinuity in the overlapping regions to align the images and blend them together. In this approach, the first step compensates for the geometric misalignment between the two fisheye lenses and depends on the camera parameters. The second step is a more refined alignment that adjusts any discontinuities caused by objects with varying depth in the stitching boundaries. In the first alignment, [7] solves an over-determined system for a warping matrix which is then used to align the images. This results in a least-squares approximated solution which globally transforms the images. Our observation is that the control points in the central part of the 360-degree image typically get aligned well, resulting in improved quality compared to prior techniques. However, the control points at the top and bottom of the image do not get aligned precisely, leading to stitching artifacts in those regions. Figure 1 shows an example of visible discontinuities in the stitching boundary of pictures with patterns in the background.

Fig. 1. 360x180-degree panorama stitched by [7] and the discontinuities in the overlapping regions.

This paper builds upon our previous work in [7] and improves the image alignment and stitching performance over the entire stitched 360x180-degree panorama. It uses a rigid moving least squares approach to achieve the improved alignment. This paper also extends the work to video stitching by incorporating a new temporally coherent algorithm to produce jitter-free 360-degree videos.
Fig. 2. The processing flow of [7] (a) and the approach of this paper (b).
2. THE PROPOSED ALGORITHM
Figure 2(b) shows the block diagram of the algorithm proposed in this paper. Similar to [7], the proposed image alignment also has two steps: the first one is dependent on camera parameters, and the second step works adaptively to the scene. However, instead of estimating a warping matrix in a least-squares sense to align the pictures in the first step, we generate interpolation grids that deform the image based on a rigid moving least squares (MLS) approach.
Let p and q be the control points in the overlapping regions of the original and deformed images, respectively. [10] defines three properties of an image deformation function f: interpolation (f(p_i) = q_i under deformation), smoothness (f preserves smooth transitions among pixels), and identity (q_i = p_i ⇒ f(v) = v).

For every point v in the image, we solve for a transformation matrix M that minimizes the weighted least squares:

$$\arg\min_{M} \sum_{i} w_i \left\lVert \hat{p}_i M - \hat{q}_i \right\rVert^2 \qquad (1)$$

where the weights w_i are proportional to the distance between the image point v and the control point p_i, in the sense that w_i gets smaller as v moves further away from p_i (i.e., the least-squares minimization depends on the point of evaluation, hence the name moving least squares). When v → p_i, f interpolates: f(v) = q_i. [10] defines such weights as

$$w_i = \frac{1}{\lvert p_i - v \rvert^{2\alpha}}$$

Here $\hat{p}_i = p_i - p_*$ and $\hat{q}_i = q_i - q_*$, where the weighted centroids $p_* = \sum_i w_i p_i / \sum_i w_i$ and $q_* = \sum_i w_i q_i / \sum_i w_i$ are recomputed for each point v in the image through the weights [10].

For control-point selection, we adopted the checkerboard experiment from [7] together with our own method of picking the correspondence points. In this experiment, both fisheye lenses, each with a 195-degree field of view, see the same checkerboards on their sides. The images taken by the fisheye lenses are unwarped to 360x180-degree equirectangular planes. We then arrange the unwarped images so that the right image is positioned at the center of the 360x180-degree plane, while the left image is split and placed at the sides of the plane. With this arrangement, the overlapping regions are ready for control-point selection. By choosing the same checkerboards' cross sections on the unwarped images, one can visualize the geometric misalignment between the two lenses. Figure 3 shows the selected control points {p_i} and {q_i}, which indicate the differing positions of the same points in the stitching boundaries of the two images. Our goal is to determine the function f_r that performs the transformation f_r(p_i) = q_i in the overlapping regions while keeping the other areas of the image as visually intact as possible.

While the MLS formulation is general in the matrix M in (1), we are only interested in the rigid transformation, since it generates more realistic results than affine and similarity transformations. The similarity transformations are a subset of the affine transformations that allow only translation, rotation, and uniform scaling; the similarity matrix M is defined such that M^T M = λI (e.g., a rotation matrix), where λ acts as a uniform scaling factor. In a rigid transformation, it is desirable that no uniform scaling is included. [10] proposed a theorem that relates the MLS solution for M^T M = λI (similarity transformation) to its solution for M^T M = I (rigid transformation), and derived a closed-form solution for the rigid MLS function f_r. We invite readers to consult [10] for details of the mathematical treatment used here.

We generate the rigid-MLS interpolation grids to deform the right unwarped image (i.e., to apply f_r over the image), thus aligning it with the left one. Figure 4 shows how the right unwarped fisheye image is deformed by the rigid MLS method. While the portions of the image in proximity to the stitching boundaries are transformed to match the other image, the remainder of the deformed picture shows no discernible difference compared to the original. The rigid MLS aligns the control points around the stitching boundary, thus registering the two unwarped fisheye images together.
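To make the closed-form evaluation concrete, the following is a minimal C++ sketch of f_r for a single point, following the derivation in [10]. The names (Pt, rigidMLS) are ours for illustration; in the actual pipeline, f_r is evaluated offline into interpolation grids rather than per pixel at runtime.

```cpp
#include <cmath>
#include <vector>

struct Pt { double x, y; };

// Rigid MLS [10]: f_r(v) = R (v - p*) + q*, where R is the rotation that
// minimizes sum_i w_i |R phat_i - qhat_i|^2.
Pt rigidMLS(const Pt& v, const std::vector<Pt>& p, const std::vector<Pt>& q,
            double alpha = 1.0) {
    const double eps = 1e-8;
    std::vector<double> w(p.size());
    double wsum = 0, pcx = 0, pcy = 0, qcx = 0, qcy = 0;
    for (size_t i = 0; i < p.size(); ++i) {
        double dx = p[i].x - v.x, dy = p[i].y - v.y;
        double d2 = dx * dx + dy * dy;
        if (d2 < eps) return q[i];            // interpolation: f(p_i) = q_i
        w[i] = 1.0 / std::pow(d2, alpha);     // w_i = 1 / |p_i - v|^(2*alpha)
        wsum += w[i];
        pcx += w[i] * p[i].x;  pcy += w[i] * p[i].y;
        qcx += w[i] * q[i].x;  qcy += w[i] * q[i].y;
    }
    pcx /= wsum;  pcy /= wsum;                // weighted centroid p*
    qcx /= wsum;  qcy /= wsum;                // weighted centroid q*
    // Optimal rotation: maximize sum_i w_i dot(qhat_i, R phat_i).
    double a = 0, b = 0;
    for (size_t i = 0; i < p.size(); ++i) {
        double px = p[i].x - pcx, py = p[i].y - pcy;   // phat_i
        double qx = q[i].x - qcx, qy = q[i].y - qcy;   // qhat_i
        a += w[i] * (qx * px + qy * py);      // qhat . phat
        b += w[i] * (-qx * py + qy * px);     // qhat . perp(phat)
    }
    double mu = std::sqrt(a * a + b * b);
    double vx = v.x - pcx, vy = v.y - pcy;
    if (mu < eps) return { vx + qcx, vy + qcy };  // degenerate: translate only
    double c = a / mu, s = b / mu;            // cos and sin of the rotation
    return { c * vx - s * vy + qcx, s * vx + c * vy + qcy };
}
```

Precomputing f_r on a coarse grid and interpolating inside each cell yields the interpolation grids used for the per-frame deformation.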
However, when the depth of objects in the overlapping areas changes, misalignment is introduced into the scene. Therefore, a refined alignment is necessary after the rigid MLS deformation. To this end, we adopt the same adaptive method of normalized cross-correlation matching as in [7] to further align the images.

Fig. 3. Left: the overlapping areas on the unwarped left image and {q_i} (green dots). Right: the unwarped right image and {p_i} (yellow dots).

Fig. 4. The original and the rigid-MLS-deformed images with their control points {p_i} and {q_i} overlaid.

The refined alignment performs a fast template matching and utilizes the matching displacements on both stitching boundaries to generate eight pairs of control points. These points are then used to solve for a 3x3 affine matrix to warp the deformed image. As a least-squares solution, this refined method is not sufficient for registering images with complicated misalignment patterns, but it works very well for those with minor misalignment, such as the misalignment caused by varying object depth. Figure 5 shows that the refined alignment minimizes the discontinuity when a person is sitting close to the camera's stitching boundary.
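A hypothetical OpenCV-based sketch of this refined alignment step is given below. The helper names (matchOnBoundary, refineAlign) and the surrounding plumbing are our illustrative assumptions, not the implementation of [7]; how the boundary patches and search windows are cropped is left to the caller.

```cpp
#include <opencv2/imgproc.hpp>
#include <opencv2/calib3d.hpp>
#include <vector>

// Fast template matching: slide a boundary patch from one image over a
// search window in the other and return the peak normalized cross-correlation
// score together with the matched location.
static double matchOnBoundary(const cv::Mat& searchWin, const cv::Mat& patch,
                              cv::Point& peak) {
    cv::Mat score;
    cv::matchTemplate(searchWin, patch, score, cv::TM_CCORR_NORMED);
    double maxVal;
    cv::minMaxLoc(score, nullptr, &maxVal, nullptr, &peak);
    return maxVal;
}

// Solve a least-squares affine transform from the eight matched control-point
// pairs (four per stitching boundary) and warp the MLS-deformed right image.
static void refineAlign(cv::Mat& rightImage,
                        const std::vector<cv::Point2f>& src,
                        const std::vector<cv::Point2f>& dst) {
    cv::Mat affineMat = cv::estimateAffine2D(src, dst);  // 2x3 affine
    if (affineMat.empty()) return;                       // degenerate match
    cv::Mat warped;
    cv::warpAffine(rightImage, warped, affineMat, rightImage.size());
    rightImage = warped;
}
```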
3. EXTENSION TO 360-DEGREE VIDEO
In 360-degree video stitching, it is essential to minimize jitter, the abrupt transition between stitched frames, so that the final video appears continuous and comfortable to view. Adjacent frames in the sequence that are not stitched by the same measure can generate jitter. In the work presented here, when a bad match occurs without getting filtered out in the refined alignment, it generates a false warping matrix that abruptly distorts the stitching boundary of the picture. This distorted scene causes jitter, which is the result of the sudden transition between the previous well-stitched frame and the current badly stitched one. Therefore, it is important to guarantee good matches throughout the entire sequence to maintain a smooth frame-to-frame transition, and thus minimize jitter. Algorithm 1 illustrates our method to maintain the temporal coherence of the sequence.
Fig. 5. A person sitting close to the camera and in the stitching boundary. Left: after rigid MLS deformation. Right: after rigid MLS deformation and refined alignment.
Algorithm 1: Refined Alignment (with jitter control)

    Input: leftImage, rightImage (deformed)
    (scoreLeft, scoreRight) ← TemplMatch();
    if (both matching scores are good) then
        Estimate affine warping matrix affineMat;
        Store affineMat for the next frame;
        warpEn ← 1;
    else                                   // bad scores on either boundary
        if matching scores of the previous frame are good then
            warpEn ← 1;
            affineMat ← previous affineMat;
        else
            warpEn ← 0;                    // don't warp image
        end
    end
    if (warpEn) then
        Warp rightImage by affineMat;
    end

A good score is returned at one stitching boundary if all of the following are satisfied. First, the peak normalized cross-correlation exceeds a preset threshold. Second, the returned vertical displacement is within a margin of [−10, +10] pixels. Third, the horizontal displacement of the current match must not exceed a fixed margin relative to that of the previous frame. These constraints, obtained from our empirical experiments, are set to eliminate bad matching caused by poor lighting and abrupt movements of the boundaries in the horizontal and vertical directions.
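One possible C++ rendering of Algorithm 1's gating logic is sketched below. The struct and member names are illustrative, and the threshold values are placeholders standing in for the empirically determined margins; they are not published constants from this paper.

```cpp
#include <opencv2/core.hpp>
#include <cmath>

struct BoundaryMatch { double score; cv::Point2f disp; };

struct JitterGate {
    cv::Mat prevAffine;         // affine stored from the last good frame
    bool prevGood = false;      // were the previous frame's scores good?
    double minScore = 0.5;      // peak NCC threshold (placeholder value)
    float maxVert = 10.f;       // vertical displacement margin (pixels)
    float maxHorizJump = 5.f;   // max horizontal change vs. previous frame
                                // (placeholder value)

    // A boundary score is "good" when all three constraints hold.
    bool isGood(const BoundaryMatch& cur, const BoundaryMatch& prev) const {
        return cur.score > minScore &&
               std::abs(cur.disp.y) <= maxVert &&
               std::abs(cur.disp.x - prev.disp.x) <= maxHorizJump;
    }

    // Returns the affine to apply this frame, or an empty Mat (warpEn = 0).
    cv::Mat select(bool bothGood, const cv::Mat& affineMat) {
        bool wasGood = prevGood;
        prevGood = bothGood;
        if (bothGood) {
            prevAffine = affineMat.clone();  // store for the next frame
            return affineMat;
        }
        if (wasGood) return prevAffine;      // reuse the last good warp
        return cv::Mat();                    // don't warp this frame
    }
};
```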
4. IMPLEMENTATION AND RESULTS
We have implemented the proposed algorithm in C++ and Matlab. The rigid MLS grids are precomputed, and the deformation becomes an interpolation process that can be accelerated by a GPU.
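A minimal sketch of this idea follows, assuming the rigid-MLS deformation has been baked offline into dense lookup maps: each frame then reduces to a single cv::remap call (cv::cuda::remap from opencv2/cudawarping.hpp is its GPU counterpart). The function name is ours for illustration.

```cpp
#include <opencv2/imgproc.hpp>

void applyPrecomputedGrid(const cv::Mat& src, cv::Mat& dst,
                          const cv::Mat& mapX, const cv::Mat& mapY) {
    // mapX/mapY are CV_32FC1 grids holding, for every output pixel, the
    // source coordinates to sample; remap interpolates between neighbors.
    cv::remap(src, dst, mapX, mapY, cv::INTER_LINEAR, cv::BORDER_CONSTANT);
}
```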
Fig. 6. (a) 360x180-degree panorama stitched by this proposal. (b)(c) The stitching boundaries in (a) (top row) compared to the same image stitched by [7] (bottom row).
Fig. 7. Stitching boundary in consecutive frames, stitched by the proposal (top row) and by [7] (bottom row).

Figure 6(a) illustrates an image stitched by the proposed method. In this picture, there are a fence, buildings, and a patterned background on the stitching boundaries. Figure 6(b) compares the stitching boundaries in the image stitched by the proposal and by [7] (also shown in Figure 1). While discontinuities appear in the stitching boundaries of [7] as a result of its least-squares solution, the proposed method produces a seamless 360-degree panorama thanks to the rigid MLS deformation.

For video stitching, Figure 7 shows adjacent stitched frames created by the proposal (top row, no jitter) and by [7] (bottom row). In the bottom row, the first jitter occurs when the refined alignment lets a bad match get through, resulting in an affine transformation that moves the image on the right side of the stitching boundary to the left. As a result, the car in the boundary gets distorted, leading to an abrupt transition between the frames. This paper has supplementary downloadable materials, which are the stitched videos generated by this novel method and by [7].
5. CONCLUSION
This paper has introduced a novel method for stitching the images and video sequences generated by dual-fisheye lens cameras. The proposed alignment has two steps. The first one, carried out offline, compensates for the sophisticated geometric misalignment between the two fisheye lenses on the camera based on a rigid moving least squares approach. The second step, applied online and adaptively to the scene, provides a more refined adjustment for any misalignment created by objects with varying depth on the stitching boundaries. We extend the proposed approach to 360-degree video stitching with the relevant constraints to maintain a smooth transition between frames and thereby minimize jitter. Results show that our method not only generates more accurately stitched 360x180-degree images but also jitter-free 360-degree videos.
6. REFERENCES

[1] "GoPro Odyssey," https://gopro.com/odyssey, [Retrieved August 2016].
[2] "Facebook Surround360," https://facebook360.fb.com/facebook-surround-360, [Retrieved August 2016].
[3] "Google Cardboard," https://vr.google.com/cardboard, [Retrieved August 2016].
[4] "Samsung GearVR," [Retrieved August 2016].
[5] "Ricoh Theta," https://theta360.com, [Retrieved August 2016].
[6] "LG 360 Cam," [Retrieved August 2016].
[7] T. Ho and M. Budagavi, "Dual-fisheye lens stitching for 360-degree imaging," in Proc. of the 42nd IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP'17), 2017 (accepted).
[8] M. Brown and D. G. Lowe, "Automatic panoramic image stitching using invariant features," International Journal of Computer Vision, vol. 74, pp. 59–73, August 2007.
[9] R. Szeliski, Computer Vision: Algorithms and Applications, Springer, London, UK, 1st edition, 2011.
[10] S. Schaefer, T. McPhail, and J. Warren, "Image deformation using moving least squares," in Proc. of ACM SIGGRAPH '06, 2006.