360-degree Video Stitching for Dual-fisheye Lens Cameras Based On Rigid Moving Least Squares
Tuan Ho*, Ioannis D. Schizas*, K. R. Rao*, Madhukar Budagavi†

* Dept. of Electrical Engineering, University of Texas–Arlington, Arlington, TX, USA
† Samsung Research America, Richardson, TX, USA
ABSTRACT
Dual-fisheye lens cameras are becoming popular for 360-degree video capture, especially for user-generated content (UGC), since they are affordable and portable. Images generated by dual-fisheye cameras have limited overlap and hence require non-conventional stitching techniques to produce high-quality 360x180-degree panoramas. This paper introduces a novel method to align these images using interpolation grids based on rigid moving least squares. Furthermore, jitter is the critical issue that arises when image-based stitching algorithms are applied to video: it stems from the unconstrained movement of the stitching boundary from one frame to another. We therefore also propose a new algorithm that maintains the temporal coherence of the stitching boundary to provide jitter-free 360-degree videos. Results show that the method proposed in this paper produces higher quality stitched images and videos than prior work.
1. INTRODUCTION
Dual-fisheye lens cameras are becoming popular for 360-degree video capture, especially for UGC. Their portability and affordability give them an edge over traditional and professional 360-degree capture systems such as [1][2], which usually deploy 6–17+ cameras on the same rig and are very expensive to own. An example dual-fisheye lens 360-degree camera is the Samsung Gear 360, which can produce 360x180-degree panoramas and video that are viewable on 360-degree viewers such as Cardboard [3] or GearVR [4]. Other examples of dual-fisheye lens 360-degree cameras are Ricoh Theta [5] and LG 360 Cam [6], to name a few.

However, the convenience of a compact and affordable 360-degree capture system comes with a caveat. The images generated by the dual-fisheye lenses have limited overlap, and as we show in [7], conventional stitching methods such as the ones in [8][9] do not provide satisfactory stitching results.

To stitch the images generated by dual-fisheye lens cameras, [7] suggests a framework of four main stages, as shown in Figure 2(a). The first and second stages, respectively, compensate for the light fall-off of the fisheye-lens camera and transform the light-compensated fisheye images into an equirectangular format that can be viewed on 360-degree players. After the first two stages, the fisheye-unwarped images do not align with each other. Therefore, [7] proposes a two-step registration method that minimizes the discontinuity in the overlapping regions to align the images and blend them together. In this approach, the first step compensates for the geometric misalignment between the two fisheye lenses and depends on the camera parameters. The second step is a more refined alignment that adjusts any discontinuities caused by objects with varying depth in the stitching boundaries. In the first alignment, [7] solves an over-determined system for a warping matrix which is then used to align the images. This results in a least-squares approximated solution which globally transforms the images. Our observation is that the control points in the central part of the 360-degree image typically get aligned well, resulting in improved quality compared to prior techniques. However, the control points at the top and bottom of the image do not get aligned precisely, leading to stitching artifacts in those regions. Figure 1 shows an example of visible discontinuities in the stitching boundary of pictures with patterns in the background.

Fig. 1. 360x180-degree panorama stitched by [7] and the discontinuities in the overlapping regions.

This paper builds upon our previous work in [7] and improves the image alignment and stitching performance over the entire stitched 360x180-degree panorama. It uses a rigid moving least squares approach to achieve the improved alignment. This paper also extends the work to video stitching by incorporating a new temporally coherent algorithm to produce jitter-free 360-degree videos.
Fig. 2. The processing flow of [7] (a) and the approach of this paper (b).
2. THE PROPOSED ALGORITHM
Figure 2(b) shows the block diagram of the algorithm proposed in this paper. Similar to [7], the proposed image alignment also has two steps: the first one is dependent on camera parameters, and the second step works adaptively to the scene. However, instead of estimating a warping matrix in a least-squares sense to align the pictures in the first step, we generate interpolation grids that deform the image based on a rigid moving least squares (MLS) approach.
Let p and q be the control points in the overlapping regions of the original and deformed images, respectively. [10] defines three properties of an image deformation function f: interpolation (f(p_i) = q_i under deformation), smoothness (f preserves smooth transitions among pixels), and identity (q_i = p_i ⇒ f(v) = v).

For every point v in the image, we solve for a transformation matrix M that minimizes the weighted least squares:

$$\arg\min_{M} \sum_{i} w_i \left\lVert \hat{p}_i M - \hat{q}_i \right\rVert^2 \qquad (1)$$

where the weights w_i are proportional to the distance between the image point v and the control point p_i, in the sense that w_i gets smaller as v moves further away from p_i (i.e., the least-squares minimization depends on the point of evaluation, hence the name moving least squares). When v → p_i, f interpolates: f(v) = q_i. [10] defines such weights as

$$w_i = \frac{1}{\lvert p_i - v \rvert^{2\alpha}}$$

Here $\hat{p}_i = p_i - p_*$ and $\hat{q}_i = q_i - q_*$, where the weighted centroids $p_* = \sum_i w_i p_i / \sum_i w_i$ and $q_* = \sum_i w_i q_i / \sum_i w_i$ are recomputed for each point v in the image through the weights [10].

For control-point selection, we adopted the checkerboard experiment from [7] together with our own method of picking the correspondence points. In this experiment, both fisheye lenses, each with a 195-degree field of view, see the same checkerboards on their sides. The images taken by the fisheye lenses are unwarped to 360x180-degree equirectangular planes. We then arrange the unwarped images so that the right image is positioned at the center of the 360x180-degree plane, while the left image is split and placed at the sides of the plane. With this arrangement, the overlapping regions are ready for control-point selection. By choosing the same checkerboards' cross sections on the unwarped images, one can visualize the geometric misalignment between the two lenses. Figure 3 shows the selected control points {p_i} and {q_i}, which indicate the differing positions of the same points in the stitching boundaries of the two images. Our goal is to determine the function f_r that performs the transformation f_r(p_i) = q_i in the overlapping regions while keeping the other areas of the image as visually intact as possible.

While the MLS formulation is general in the matrix M in (1), we are only interested in the rigid transformation, since it generates more realistic results than affine and similarity transformations. The similarity transformations are a subset of the affine transformations that allow only translation, rotation, and uniform scaling; the similarity matrix M is defined such that M^T M = λI (e.g., a rotation matrix), where λ acts as a uniform scaling factor. In a rigid transformation, it is desirable that no uniform scaling is included. [10] proposed a theorem that relates the MLS solution for M^T M = λI (similarity transformation) to its solution for M^T M = I (rigid transformation), and derived a closed-form solution for the rigid MLS function f_r. We invite readers to consult [10] for details of the mathematical treatment used here.

We generate the rigid-MLS interpolation grids to deform the right unwarped image (i.e., to apply f_r over the image), thus aligning it with the left one. Figure 4 shows how the right unwarped fisheye image is deformed by the rigid MLS method. While the portions of the image in proximity to the stitching boundaries are transformed to match the other image, the remainder of the deformed picture shows no discernible difference compared to the original. The rigid MLS aligns the control points around the stitching boundary, thus registering the two unwarped fisheye images together.
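To make the closed-form evaluation concrete, the following is a minimal C++ sketch of f_r for a single point, following the derivation in [10]. The names (Pt, rigidMLS) are ours for illustration; in the actual pipeline, f_r is evaluated offline into interpolation grids rather than per pixel at runtime.

```cpp
#include <cmath>
#include <vector>

struct Pt { double x, y; };

// Rigid MLS [10]: f_r(v) = R (v - p*) + q*, where R is the rotation that
// minimizes sum_i w_i |R phat_i - qhat_i|^2.
Pt rigidMLS(const Pt& v, const std::vector<Pt>& p, const std::vector<Pt>& q,
            double alpha = 1.0) {
    const double eps = 1e-8;
    std::vector<double> w(p.size());
    double wsum = 0, pcx = 0, pcy = 0, qcx = 0, qcy = 0;
    for (size_t i = 0; i < p.size(); ++i) {
        double dx = p[i].x - v.x, dy = p[i].y - v.y;
        double d2 = dx * dx + dy * dy;
        if (d2 < eps) return q[i];            // interpolation: f(p_i) = q_i
        w[i] = 1.0 / std::pow(d2, alpha);     // w_i = 1 / |p_i - v|^(2*alpha)
        wsum += w[i];
        pcx += w[i] * p[i].x;  pcy += w[i] * p[i].y;
        qcx += w[i] * q[i].x;  qcy += w[i] * q[i].y;
    }
    pcx /= wsum;  pcy /= wsum;                // weighted centroid p*
    qcx /= wsum;  qcy /= wsum;                // weighted centroid q*
    // Optimal rotation: maximize sum_i w_i dot(qhat_i, R phat_i).
    double a = 0, b = 0;
    for (size_t i = 0; i < p.size(); ++i) {
        double px = p[i].x - pcx, py = p[i].y - pcy;   // phat_i
        double qx = q[i].x - qcx, qy = q[i].y - qcy;   // qhat_i
        a += w[i] * (qx * px + qy * py);      // qhat . phat
        b += w[i] * (-qx * py + qy * px);     // qhat . perp(phat)
    }
    double mu = std::sqrt(a * a + b * b);
    double vx = v.x - pcx, vy = v.y - pcy;
    if (mu < eps) return { vx + qcx, vy + qcy };  // degenerate: translate only
    double c = a / mu, s = b / mu;            // cos and sin of the rotation
    return { c * vx - s * vy + qcx, s * vx + c * vy + qcy };
}
```

Precomputing f_r on a coarse grid and interpolating inside each cell yields the interpolation grids used for the per-frame deformation.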
However, when the depth of objects in the overlapping areas changes, misalignment is introduced into the scene. Therefore, a refined alignment is necessary after the rigid MLS deformation. To this end, we adopt the same adaptive method of normalized cross-correlation matching as in [7] to further align the images.

Fig. 3. Left: the overlapping areas on the unwarped left image and {q_i} (green dots). Right: the unwarped right image and {p_i} (yellow dots).

Fig. 4. The original and the rigid-MLS-deformed images with their control points {p_i} and {q_i} overlaid.

The refined alignment performs a fast template matching and utilizes the matching displacements on both stitching boundaries to generate eight pairs of control points. These points are then used to solve for a 3x3 affine matrix to warp the deformed image. As a least-squares solution, this refined method is not sufficient for registering images with complicated misalignment patterns, but it works very well for those with minor misalignment, such as the misalignment caused by varying object depth. Figure 5 shows that the refined alignment minimizes the discontinuity when a person is sitting close to the camera's stitching boundary.
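A hypothetical OpenCV-based sketch of this refined alignment step is given below. The helper names (matchOnBoundary, refineAlign) and the surrounding plumbing are our illustrative assumptions, not the implementation of [7]; how the boundary patches and search windows are cropped is left to the caller.

```cpp
#include <opencv2/imgproc.hpp>
#include <opencv2/calib3d.hpp>
#include <vector>

// Fast template matching: slide a boundary patch from one image over a
// search window in the other and return the peak normalized cross-correlation
// score together with the matched location.
static double matchOnBoundary(const cv::Mat& searchWin, const cv::Mat& patch,
                              cv::Point& peak) {
    cv::Mat score;
    cv::matchTemplate(searchWin, patch, score, cv::TM_CCORR_NORMED);
    double maxVal;
    cv::minMaxLoc(score, nullptr, &maxVal, nullptr, &peak);
    return maxVal;
}

// Solve a least-squares affine transform from the eight matched control-point
// pairs (four per stitching boundary) and warp the MLS-deformed right image.
static void refineAlign(cv::Mat& rightImage,
                        const std::vector<cv::Point2f>& src,
                        const std::vector<cv::Point2f>& dst) {
    cv::Mat affineMat = cv::estimateAffine2D(src, dst);  // 2x3 affine
    if (affineMat.empty()) return;                       // degenerate match
    cv::Mat warped;
    cv::warpAffine(rightImage, warped, affineMat, rightImage.size());
    rightImage = warped;
}
```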
3. EXTENSION TO 360-DEGREE VIDEO
In 360-degree video stitching, it is essential to minimize jitter, the abrupt transition between stitched frames, so that the final video appears continuous and comfortable to view. Adjacent frames in the sequence that are not stitched by the same measure can generate jitter. In the work presented here, when a bad match occurs without getting filtered out in the refined alignment, it generates a false warping matrix that abruptly distorts the stitching boundary of the picture. This distorted scene causes jitter, which is the result of the sudden transition between the previous well-stitched frame and the current badly stitched one. Therefore, it is important to guarantee good matches throughout the entire sequence to maintain a smooth frame-to-frame transition, and thus minimize jitter. Algorithm 1 illustrates our method to maintain the temporal coherence of the sequence.
Fig. 5. A person sitting close to the camera and in the stitching boundary. Left: after rigid MLS deformation. Right: after rigid MLS deformation and refined alignment.
Algorithm 1: Refined Alignment (with jitter control)

    Input: leftImage, rightImage (deformed)
    (scoreLeft, scoreRight) ← TemplMatch();
    if (both matching scores are good) then
        Estimate affine warping matrix affineMat;
        Store affineMat for the next frame;
        warpEn ← 1;
    else                                   // bad scores on either boundary
        if matching scores of the previous frame are good then
            warpEn ← 1;
            affineMat ← previous affineMat;
        else
            warpEn ← 0;                    // don't warp image
        end
    end
    if (warpEn) then
        Warp rightImage by affineMat;
    end

A good score is returned at one stitching boundary if all of the following are satisfied. First, the peak normalized cross-correlation exceeds a preset threshold. Second, the returned vertical displacement is within a margin of [−10, +10] pixels. Third, the horizontal displacement of the current match must not exceed a fixed margin relative to that of the previous frame. These constraints, obtained from our empirical experiments, are set to eliminate bad matching caused by poor lighting and abrupt movements of the boundaries in the horizontal and vertical directions.
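One possible C++ rendering of Algorithm 1's gating logic is sketched below. The struct and member names are illustrative, and the threshold values are placeholders standing in for the empirically determined margins; they are not published constants from this paper.

```cpp
#include <opencv2/core.hpp>
#include <cmath>

struct BoundaryMatch { double score; cv::Point2f disp; };

struct JitterGate {
    cv::Mat prevAffine;         // affine stored from the last good frame
    bool prevGood = false;      // were the previous frame's scores good?
    double minScore = 0.5;      // peak NCC threshold (placeholder value)
    float maxVert = 10.f;       // vertical displacement margin (pixels)
    float maxHorizJump = 5.f;   // max horizontal change vs. previous frame
                                // (placeholder value)

    // A boundary score is "good" when all three constraints hold.
    bool isGood(const BoundaryMatch& cur, const BoundaryMatch& prev) const {
        return cur.score > minScore &&
               std::abs(cur.disp.y) <= maxVert &&
               std::abs(cur.disp.x - prev.disp.x) <= maxHorizJump;
    }

    // Returns the affine to apply this frame, or an empty Mat (warpEn = 0).
    cv::Mat select(bool bothGood, const cv::Mat& affineMat) {
        bool wasGood = prevGood;
        prevGood = bothGood;
        if (bothGood) {
            prevAffine = affineMat.clone();  // store for the next frame
            return affineMat;
        }
        if (wasGood) return prevAffine;      // reuse the last good warp
        return cv::Mat();                    // don't warp this frame
    }
};
```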
4. IMPLEMENTATION AND RESULTS
We have implemented the proposed algorithm in C++ and Matlab. The rigid MLS grids are precomputed, and the deformation becomes an interpolation process that can be accelerated by a GPU.
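A minimal sketch of this idea follows, assuming the rigid-MLS deformation has been baked offline into dense lookup maps: each frame then reduces to a single cv::remap call (cv::cuda::remap from opencv2/cudawarping.hpp is its GPU counterpart). The function name is ours for illustration.

```cpp
#include <opencv2/imgproc.hpp>

void applyPrecomputedGrid(const cv::Mat& src, cv::Mat& dst,
                          const cv::Mat& mapX, const cv::Mat& mapY) {
    // mapX/mapY are CV_32FC1 grids holding, for every output pixel, the
    // source coordinates to sample; remap interpolates between neighbors.
    cv::remap(src, dst, mapX, mapY, cv::INTER_LINEAR, cv::BORDER_CONSTANT);
}
```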
Fig. 6. (a) 360x180-degree panorama stitched by this proposal. (b)(c) The stitching boundaries in (a) (top row) compared to the same image stitched by [7] (bottom row).
Fig. 7. Stitching boundary in consecutive frames, stitched by the proposal (top row) and by [7] (bottom row).

Figure 6(a) illustrates an image stitched by the proposed method. In this picture, there are a fence, buildings, and a patterned background on the stitching boundaries. Figure 6(b) compares the stitching boundaries in the image stitched by the proposal and by [7] (also shown in Figure 1). While discontinuities appear in the stitching boundaries of [7] as a result of its least-squares solution, the proposed method produces a seamless 360-degree panorama thanks to the rigid MLS deformation.

For video stitching, Figure 7 shows adjacent stitched frames created by the proposal (top row, no jitter) and by [7] (bottom row). In the bottom row, the first jitter occurs when the refined alignment lets a bad match get through, resulting in an affine transformation that moves the image on the right side of the stitching boundary to the left. As a result, the car in the boundary gets distorted, leading to an abrupt transition between the frames. This paper has supplementary downloadable materials, which are the stitched videos generated by this novel method and by [7].
5. CONCLUSION
This paper has introduced a novel method for stitching the images and video sequences generated by dual-fisheye lens cameras. The proposed alignment has two steps. The first one, carried out offline, compensates for the sophisticated geometric misalignment between the two fisheye lenses on the camera based on a rigid moving least squares approach. The second step, applied online and adaptively to the scene, provides a more refined adjustment for any misalignment created by objects with varying depth on the stitching boundaries. We extend the proposed approach to 360-degree video stitching with the relevant constraints to maintain a smooth transition between frames and thereby minimize jitter. Results show that our method not only generates more accurately stitched 360x180-degree images but also jitter-free 360-degree videos.
6. REFERENCES

[1] "GoPro Odyssey," https://gopro.com/odyssey, [Retrieved August 2016].
[2] "Facebook Surround360," https://facebook360.fb.com/facebook-surround-360, [Retrieved August 2016].
[3] "Google Cardboard," https://vr.google.com/cardboard, [Retrieved August 2016].
[4] "Samsung GearVR," [Retrieved August 2016].
[5] "Ricoh Theta," https://theta360.com, [Retrieved August 2016].
[6] "LG 360 Cam," [Retrieved August 2016].
[7] T. Ho and M. Budagavi, "Dual-fisheye lens stitching for 360-degree imaging," in Proc. of the 42nd IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP'17), 2017 (accepted).
[8] M. Brown and D. G. Lowe, "Automatic panoramic image stitching using invariant features," International Journal of Computer Vision, vol. 74, pp. 59–73, August 2007.
[9] R. Szeliski, Computer Vision: Algorithms and Applications, Springer, London, UK, 1st edition, 2011.
[10] S. Schaefer, T. McPhail, and J. Warren, "Image deformation using moving least squares," in Proc. of ACM SIGGRAPH '06, 2006.