Real-time Surface Deformation Recovery from Stereo Videos
Haoyin Zhou and Jayender Jagadeesan
Surgical Planning Laboratory, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, 02115, USA
Abstract.
Tissue deformation during surgery may significantly decrease the accuracy of surgical navigation systems. In this paper, we propose an approach to estimate the deformation of the tissue surface from stereo videos in real-time, which is capable of handling occlusion, smooth surfaces and fast deformation. We first use a stereo matching method to extract depth information from stereo video frames and generate the tissue template, and then estimate the deformation of the obtained template by minimizing ICP, ORB feature matching and as-rigid-as-possible (ARAP) costs. The main novelties are twofold: (1) Due to non-rigid deformation, feature matching outliers are difficult to remove with traditional RANSAC methods; we therefore propose a novel 1-point RANSAC and reweighting method to preselect matching inliers, which handles smooth surfaces and fast deformations. (2) We propose a novel ARAP cost function based on dense connections between the control points to achieve better smoothing performance with a limited number of iterations. The algorithms are designed and implemented for GPU parallel computing. Experiments on ex- and in vivo data showed that this approach works at an update rate of 15 Hz with an accuracy of less than 2.5 mm on an NVIDIA Titan X GPU.
Keywords:
Tissue deformation recovery · Feature matching outliers · GPU parallel computation.
Tissue visualization during surgery is typically limited to the anatomical surface exposed to the surgeon through an optical imaging modality, such as a laparoscope, endoscope or microscope. To identify the critical structures lying below the tissue surface, surgical navigation systems need to register the intraoperative data to preoperative MR/CT imaging before surgical resection. However, during surgery, tissue deformation caused by heartbeat, respiration and instrument interaction may make the initial registration results less accurate. The ability to compensate for tissue deformation is essential for improving the accuracy of surgical navigation. In this paper, we propose an approach to recover the deformation of the tissue surface from stereo optical videos in real-time.
In recent years, several groups have investigated methods to recover tissue deformation from optical videos, and most methods are based on the minimization of non-rigid matching and smoothing costs [1]. For example, Collins et al. proposed a monocular vision-based method that first generated the tissue template and then estimated the template deformation by matching the texture and boundaries with a non-rigid iterative closest points (ICP) method [2]. In this method, the non-rigid ICP-based boundary matching algorithm significantly improves the accuracy. However, during surgery, only a small area of the target tissue may be exposed and the boundaries are often invisible, which makes it difficult to match the template. Object deformation recovery in the computer vision field is also a suitable approach to recover tissue deformation. For example, Zollhöfer et al. proposed to generate the template from an RGB-D camera and then track the deformation by minimizing non-rigid ICP, color and smoothing costs [3]. Newcombe et al. developed a novel deformation recovery method that does not require an initial template and uses sparse control points to represent the deformation [4]. Guo et al. used forward and backward L0 regularization to refine the deformation recovery results [5]. To date, most deformation recovery methods [6][7] are based on non-rigid ICP alignment to obtain matching information between the template and the current input, such as monocular/stereo videos or 3D point clouds from RGB-D sensors. However, non-rigid ICP suffers from the drawback that it cannot track fast tissue deformation and camera motion, nor obtain accurate alignment in the tangential directions on smooth tissue surfaces. During surgery, the endo/laparoscope may move fast or even temporarily be taken out of the patient for cleaning, which makes it difficult for non-rigid ICP to track the tissue. In addition, smoke and blood during the surgery may cause significant occlusion and interfere with the tracking process.
Hence, the ability to match the template and the input video when non-rigid deformation exists is essential for intraoperative use of deformation recovery methods. A natural idea to obtain additional information is to match the feature points between the template and the input video. Among the many types of feature descriptors, ORB [8] has been widely used in real-time applications due to its efficiency. To handle feature matching outliers, RANSAC-based methods have proven effective in rigid scenarios but have difficulty handling non-rigid deformation [9]. Another common method to address outliers is to apply robust kernels to the cost function, which cannot handle fast motion. In this paper, we propose a novel method that combines 1-point RANSAC and reweighting to handle matching outliers in non-rigid scenarios. In addition, we propose a novel as-rigid-as-possible (ARAP) [10] method based on dense connections to achieve better smoothing performance with a limited number of iterations.

As shown in Fig. 1, we propose a GPU-based stereo matching method, which includes several efficient post-processing steps to extract 3D information from stereo videos in real-time: the stereo images (left and right) undergo initial ZNCC matching, outlier removal and hole filling, and Laplacian smoothing-based refinement. Readers may refer to Ref. [11] for more details on this stereo matching method.
Fig. 1.
The process of our stereo matching method with a pair of stereo microscopy images captured during neurosurgery.
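The initial ZNCC matching stage in Fig. 1 scores candidate disparities by zero-mean normalized cross-correlation between left and right image patches. A minimal sketch of this stage (the patch size and search range here are illustrative assumptions, not the paper's settings):

```python
import numpy as np

def zncc(a, b):
    """Zero-mean normalized cross-correlation between two equal-size patches."""
    a = a.astype(np.float64) - a.mean()
    b = b.astype(np.float64) - b.mean()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float((a * b).sum() / denom) if denom > 0 else 0.0

def best_disparity(left, right, row, col, half=3, max_disp=32):
    """Pick the disparity maximizing ZNCC along one rectified scanline."""
    patch_l = left[row - half:row + half + 1, col - half:col + half + 1]
    best, best_d = -2.0, 0
    for d in range(0, min(max_disp, col - half) + 1):
        patch_r = right[row - half:row + half + 1,
                        col - d - half:col - d + half + 1]
        score = zncc(patch_l, patch_r)
        if score > best:
            best, best_d = score, d
    return best_d, best
```

In the paper this search runs per pixel in parallel on the GPU, followed by the outlier removal, hole filling and Laplacian smoothing-based refinement steps.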
In our system, the initial template of the tissue surface is generated by the stereo matching method; we then track the deformation of the template by representing the non-rigid deformation with sparse control points on the template, and estimating the parameters of the control points to make the deformed template match the output of the GPU-based stereo matching method. The algorithms are parallelized and run on the GPU. Similar to DynamicFusion [4], we employ dual-quaternions to represent deformation: each control point $i$ is assigned a dual-quaternion $W_i^t$ representing its warp function at time $t$, and the template points are deformed according to the interpolation of neighboring control points. The deformation recovery problem is then to estimate $W_i^t$, $i = 1, ..., N$, and we use the Levenberg-Marquardt algorithm to minimize the following cost function

$$f_{\text{Total}}(W_i^t) = f_{\text{ICP}} + w_{\text{ORB}} f_{\text{ORB}} + w_{\text{ARAP}} f_{\text{ARAP}}, \quad (1)$$

where $f_{\text{ICP}}$ and $f_{\text{ORB}}$ are based on non-rigid ICP and ORB matches between the template and the current stereo matching results, respectively. The as-rigid-as-possible (ARAP) cost $f_{\text{ARAP}}$ smoothes the estimated warp functions $W_i^t$, which is especially important for the estimation of occluded areas. $w_{\text{ORB}}$ and $w_{\text{ARAP}}$ are user-defined weights; in our experiments we use $w_{\text{ORB}} = 10$. $w_{\text{ARAP}}$ is dynamically adjusted due to the varying number of valid points in $f_{\text{ICP}}$ and ORB matching inliers in $f_{\text{ORB}}$: we sum up the related weights of the ICP and ORB terms for each $W_i^t$, and scale $w_{\text{ARAP}}$ up or down accordingly. A GPU-based parallel Levenberg-Marquardt (LM) algorithm was developed to minimize the cost (1).
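The dual-quaternion warp interpolation mentioned above (blending the warps of neighboring control points, as in DynamicFusion) can be sketched as follows. This is a generic dual-quaternion linear blending implementation, not the paper's code; the interpolation weights would come from the control-point neighborhood:

```python
import numpy as np

def qmul(a, b):
    """Hamilton product of quaternions in (w, x, y, z) order."""
    w1, x1, y1, z1 = a
    w2, x2, y2, z2 = b
    return np.array([
        w1*w2 - x1*x2 - y1*y2 - z1*z2,
        w1*x2 + x1*w2 + y1*z2 - z1*y2,
        w1*y2 - x1*z2 + y1*w2 + z1*x2,
        w1*z2 + x1*y2 - y1*x2 + z1*w2,
    ])

def dq_from_rt(q, t):
    """Unit dual quaternion (q_real, q_dual) from rotation q and translation t."""
    qd = 0.5 * qmul(np.array([0.0, *t]), q)
    return q, qd

def dq_blend(dqs, weights):
    """Dual-quaternion linear blending: weighted sum renormalized by the
    real-part norm (signs of the inputs are assumed already consistent)."""
    qr = sum(w * dq[0] for dq, w in zip(dqs, weights))
    qd = sum(w * dq[1] for dq, w in zip(dqs, weights))
    n = np.linalg.norm(qr)
    return qr / n, qd / n

def dq_apply(dq, p):
    """Apply a unit dual quaternion to a 3D point."""
    qr, qd = dq
    qc = qr * np.array([1.0, -1.0, -1.0, -1.0])            # conjugate of q_real
    rotated = qmul(qmul(qr, np.array([0.0, *p])), qc)[1:]  # R(q_real) p
    t = 2.0 * qmul(qd, qc)[1:]                             # extract translation
    return rotated + t
```

Each template point would be deformed by `dq_apply(dq_blend(...), p)` using the dual-quaternions of its neighboring control points.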
We update each $W_i^t$ independently in the LM iterations. For the computation of the Jacobian matrix $J$ related to each $W_i^t$, multiple parallel GPU threads are launched to compute the rows of $J$; we then perform Cholesky decomposition to update $W_i^t$, $i = 1, ..., N$.

The non-rigid ICP term $f_{\text{ICP}}$ is determined by the distances between the deformed template and the stereo matching results. Tukey's penalty function is employed to handle outliers. We have developed a rasterization process that re-projects the template points to the imaging plane to build correspondences between template points and the stereo matching results; it is parallelized over the template points and runs on the GPU. This rasterization step is faster than kd-tree-based closest point search in 3D space. Only the distance component in the normal direction is considered, which avoids the problem that non-rigid ICP is inaccurate in the tangential directions when aligning smooth surfaces.
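The normal-direction ICP residual and the Tukey robust weighting described above can be sketched as follows; the threshold value `c` is an illustrative assumption:

```python
import numpy as np

def normal_direction_residuals(deformed_pts, matched_pts, normals):
    """f_ICP residuals: signed distance along the target normal only,
    so tangential sliding on smooth surfaces is not penalized."""
    return np.einsum('ij,ij->i', deformed_pts - matched_pts, normals)

def tukey_weights(residuals, c=5.0):
    """Tukey biweight: influence falls smoothly to zero for |r| > c,
    so gross outliers (e.g. occluded correspondences) are ignored."""
    w = np.zeros_like(residuals)
    inlier = np.abs(residuals) <= c
    w[inlier] = (1.0 - (residuals[inlier] / c) ** 2) ** 2
    return w
```

A point offset purely tangentially to the surface yields a zero residual, which is the property exploited when aligning smooth surfaces.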
As shown in Fig. 2(a)-(b), standard ORB feature detection concentrates on rich-texture areas, which may lead to a lack of matching information in low-texture areas. Hence, we first develop a method to detect uniformly distributed ORB features to improve the accuracy of deformation recovery, which uses the GPU to detect FAST corners and, in parallel, suppresses a corner if a neighboring pixel has a larger corner response. Then, the ORB features of the initial template are matched to the live video frames. Two corresponding 3D point clouds are obtained, which may include incorrect matches.

Since at least three matches are needed to determine the rigid relative pose between two 3D point clouds, traditional RANSAC methods only work when the three sampled matches are all inliers and have similar deformation [9]. Another common method to handle outliers is to apply robust kernels to the cost function, which is effective but cannot handle fast camera motion or tissue deformation. Under the reasonable assumption that local deformations of small areas of the tissue surface are approximately rigid transforms, we propose a novel 1-point RANSAC and reweighting method to pre-select potential matching inliers, following the idea of Ref. [12], as shown in Fig. 2(c). Denoting the two sets of corresponding 3D ORB features as $o_k^1$ and $o_k^2$, $k = 1, ..., N$, a random match $k_1$ is selected as the reference, and the coordinates are rectified with respect to $k_1$ by

$$S_l^{k_1} = \left[ o_1^l - o_{k_1}^l, \cdots, o_N^l - o_{k_1}^l \right]_{3 \times N}, \quad l = 1, 2. \quad (2)$$

For a reference $k_1$, we denote the local rigid transform as $o_{k_1}^2 = R o_{k_1}^1 + T$, where $R \in SO(3)$ is the rotation matrix and $T$ is the translation vector. The rigid transform for a neighboring matching inlier $k_2$ should satisfy

$$S_2^{k_1}(k_2) \approx R S_1^{k_1}(k_2), \quad (3)$$

where $S_l^{k_1}(k_2)$ is the $k_2$-th column of $S_l^{k_1}$, and $R$ can be obtained from matches that satisfy (3).
We propose a reweighting method to eliminate the impact of the other matches:

$$d_{k_2} = \left\| S_2^{k_1}(k_2) - R\, S_1^{k_1}(k_2) \right\|, \quad w_{k_2} = \min(H / d_{k_2}, 1), \quad (4)$$

where $d_{k_2}$ is the distance related to the $k_2$-th match and $w_{k_2}$ is the weight of the $k_2$-th ORB match; if the $k_2$-th match is either an outlier, or an inlier that does not satisfy (3), $w_{k_2}$ is small. $H$ is a predefined threshold. With a selected reference $k_1$, we alternately update $R$ from the weighted $S_1^{k_1}$ and $S_2^{k_1}$, and update $w_{k_2}$ according to (4). In our experiments we perform 10 iterations for each $k_1$. A small sum of $w_{k_2}$ suggests that few matches satisfy (3) and $k_1$ may be an outlier, in which case we omit the results with reference $k_1$. In our experiments, we randomly select 30 different matches as the reference $k_1$.

Fig. 2. (a)-(b) ORB feature detection results on laparoscopy images captured during a lung surgery using (a) OpenCV and (b) our method. (c) Matching inlier pre-selection results with a deforming phantom; the blue lines are selected inliers and the black lines are identified outliers. (d) Dense connections between control points on a silicone heart phantom.
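Equations (2)-(4) amount to, for each reference match, alternating a weighted rigid-rotation fit with the reweighting step. A compact sketch for one reference (the weighted Kabsch/SVD solver stands in for however $R$ is actually estimated, and the value of $H$ here is arbitrary):

```python
import numpy as np

def fit_rotation(S1, S2, w):
    """Weighted Kabsch: R minimizing sum_k w_k ||S2[:,k] - R @ S1[:,k]||^2."""
    M = (S2 * w) @ S1.T
    U, _, Vt = np.linalg.svd(M)
    d = np.sign(np.linalg.det(U @ Vt))
    return U @ np.diag([1.0, 1.0, d]) @ Vt

def reweight_for_reference(o1, o2, k1, H=0.05, iters=10):
    """Eqs. (2)-(4): rectify both 3xN point sets about reference match k1,
    then alternate rotation fitting and reweighting (10 iterations)."""
    S1 = o1 - o1[:, [k1]]          # Eq. (2), l = 1
    S2 = o2 - o2[:, [k1]]          # Eq. (2), l = 2
    w = np.ones(o1.shape[1])
    for _ in range(iters):
        R = fit_rotation(S1, S2, w)
        d = np.linalg.norm(S2 - R @ S1, axis=0)        # distances, Eq. (4)
        w = np.minimum(H / np.maximum(d, 1e-12), 1.0)  # weights, Eq. (4)
    return w, R
```

In the full method this is repeated for 30 random references $k_1$, and a reference whose weight sum is small is discarded as a likely outlier.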
We first apply this 1-point RANSAC and reweighting method to assign weights to the ORB matches, and the results are used in the subsequent LM algorithm to minimize the term $f_{\text{ORB}}$ in (1). It should be clarified that we are not implying that this 1-point RANSAC and reweighting method is able to find all inliers. To take all inliers into account, in the LM algorithm we assign the pre-selected matches the same weights as $w_{k_2}$, and assign the other ORB matches weights according to $w_k = -\frac{1}{5H} d_k + 1$, $w_k \in [0, 1]$.

Traditional ARAP methods are based on sparse connections, such as triangular meshes. This type of connection is too sparse to propagate the smoothing effect fast enough, and in practice we found that it cannot perform well within the limited number of iterations of the LM algorithm. Hence, we propose to use dense connections, as shown in Fig. 2(d). The weights of the connections in traditional ARAP methods are sensitive and need to be specifically designed based on the angles of the triangular mesh [10]; hence the ARAP cost function has to be redesigned to handle the dense connections as follows:

$$f_{\text{ARAP}} = \sum_{i_1, i_2} w_{i_1, i_2} \left( f_{\text{length}, i_1 i_2} + w_{\text{angle}} f_{\text{angle}, i_1 i_2} + w_{\text{rotation}} f_{\text{rotation}, i_1 i_2} \right) \quad (5)$$

where $i_1$ and $i_2$ are two control points, and $w_{i_1, i_2}$ is the weight of the connection between $i_1$ and $i_2$; a smaller distance between points $i_1$ and $i_2$ at time 0 implies a larger $w_{i_1, i_2}$. We use $w_{\text{angle}} = 20$ and $w_{\text{rotation}} = 100$.
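The dense connections and distance-based weights of Eq. (5) can be sketched as follows. The Gaussian decay used for $w_{i_1,i_2}$ is our illustrative assumption (the paper only specifies that pairs closer in the rest pose get larger weights), and only the length term of the cost is shown:

```python
import numpy as np

def dense_connection_weights(p0, sigma=1.0):
    """All-pairs connection weights from rest-pose control points p0 (N x 3).
    Closer pairs at time 0 get larger weights (Gaussian decay is assumed)."""
    diff = p0[:, None, :] - p0[None, :, :]
    dist = np.linalg.norm(diff, axis=2)
    w = np.exp(-(dist / sigma) ** 2)
    np.fill_diagonal(w, 0.0)  # no self-connection
    return w

def length_cost(p0, pt, w):
    """Sum of the weighted f_length terms from Eq. (6): squared change in
    pairwise distance between the rest pose p0 and the deformed pose pt."""
    d0 = np.linalg.norm(p0[:, None, :] - p0[None, :, :], axis=2)
    dt = np.linalg.norm(pt[:, None, :] - pt[None, :, :], axis=2)
    return float((w * (dt - d0) ** 2).sum())
```

A rigid motion of the control points leaves all pairwise distances unchanged and therefore incurs zero length cost, which is exactly the as-rigid-as-possible behavior.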
For control points $i_1$ and $i_2$,

$$f_{\text{length}, i_1 i_2} = \left( \| p_{i_1}^t - p_{i_2}^t \| - \| p_{i_1}^0 - p_{i_2}^0 \| \right)^2$$
$$f_{\text{angle}, i_1 i_2} = \operatorname{acos} \left\langle W_{i_2}^t(p_{i_1}^0) - p_{i_2}^t, \; p_{i_1}^t - p_{i_2}^t \right\rangle$$
$$f_{\text{rotation}, i_1 i_2} = \left\| W_{i_1}^t(1,2,3,4) - W_{i_2}^t(1,2,3,4) \right\|^2 \quad (6)$$

where $p_i^t$ is the coordinate of point $i$ at time $t$. $f_{\text{angle}, i_1 i_2}$ equals the angle between the normalized vectors $W_{i_2}^t(p_{i_1}^0) - p_{i_2}^t$ and $p_{i_1}^t - p_{i_2}^t$, where $W_{i_2}^t(p_{i_1}^0)$ denotes applying $W_{i_2}^t$ to $p_{i_1}^0$. $f_{\text{rotation}, i_1 i_2}$ is introduced because $W_i^t$ has 6 DoFs, and is determined by the differences between the first four components of the dual-quaternions $W_{i_1}^t$ and $W_{i_2}^t$.

The algorithms were implemented in CUDA C++ running on a desktop with an Intel Xeon 3.0 GHz CPU and an NVIDIA Titan X GPU. We first conducted qualitative experiments on ex- and in vivo data. As shown in Fig. 3(a), we deformed a smooth phantom with lung surface texture and captured 960×540 stereo videos with a KARL STORZ stereo laparoscope. We removed intermediate video frames between the two frames in Fig. 3(a) to simulate fast deformation, and our method is capable of tracking the large deformation. The second experiment was conducted with an ex vivo porcine liver, as shown in Fig. 3(b). The deformation was caused by instrument interaction, and our method is able to handle instrument occlusion. For the in vivo experiments shown in Fig. 3(c)-(e), we used both the Hamlyn data [13] and our own data, in which the videos contain camera motion and tissue deformation. We generated the tissue template before instrument interaction and then tracked the deformation of the template. The algorithm detected key inlier ORB features on the reconstructed surface and tracked these template features robustly in spite of respiratory and pulsatile motions and instrument occlusions. These results highlight the robustness of tracking in spite of physiological motions and varying illumination.

We conducted two quantitative experiments. The first was conducted on the Hamlyn data, as shown in Fig. 4(a), which consists of stereo video images of a silicone phantom simulating heartbeat motion, with corresponding ground truth obtained from a CT scan. The template was generated from the first video frame. Results show an RMSE of less than 1 mm and an average runtime of 32.7 ms per frame. In the second experiment, we used an EM tracking system (medSAFE, Ascension Technologies Inc.) as the ground truth, as shown in Fig. 4(b)-(d). A porcine liver was placed in an abdominal phantom (The Chamberlain Group) and a medSAFE EM sensor was attached to the liver surface. We deformed the liver manually, recorded the EM sensor measurements and compared them with the estimates of our method. Deformation estimation results on 420 video frames (Fig. 4(c)-(d)) show a mean error of 1.06 mm and a standard deviation of 0.56 mm. As shown in Fig. 4(c), the maximum distance between the trajectory points is 15.7 mm.
The average runtime was 63.0 ms per frame.
Fig. 3.
Qualitative experiments. First row: input video frames. Second row: the deformed template and the control points (green dots). (a) Phantom. (b) Ex vivo porcine liver. (c) Hamlyn in vivo data with deformation caused by instrument interaction. (d) Hamlyn in vivo data with respiration and heartbeat. (e) In vivo kidney data with deformation caused by respiration.
We have proposed a novel deformation recovery method that integrates ORB features and is able to handle fast motion, smooth surfaces and occlusion. The limitation of this work is that it relies strongly on ORB feature matching, which may fail when the deformation is extremely large, or when changing light reflection makes it difficult to obtain a sufficient number of ORB matching inliers.
Fig. 4. Quantitative experiments. (a) Hamlyn heart phantom data. First row: colored models are the deformed templates, white points are the ground truth. Second row: distance maps (RMSE = 0.68, 0.74, 0.71, 0.63 mm). Average runtime: stereo matching 3.8 ms, ORB feature detection and matching 10.6 ms, inlier pre-selection 4.1 ms, LM 14.2 ms. (b)-(d) Experiment with the EM tracking system. (b) Hardware setup: stereo laparoscope, EM tracking system and porcine liver. (c) 3D trajectories. (d) Errors. Average runtime: stereo matching 17.6 ms, ORB feature detection and matching 11.6 ms, inlier pre-selection 3.1 ms, LM 30.7 ms.

This project was supported by the National Institute of Biomedical Imaging and Bioengineering of the National Institutes of Health through Grant Numbers P41EB015898 and R01EB025964. Unrelated to this publication, Jayender Jagadeesan owns equity in Navigation Sciences, Inc. He is a co-inventor of a navigation device to assist surgeons in tumor excision that is licensed to Navigation Sciences. Dr. Jagadeesan's interests were reviewed and are managed by BWH and Partners HealthCare in accordance with their conflict of interest policies.