Performance Evaluation of 3D Correspondence Grouping Algorithms
Jiaqi Yang, Ke Xian, Yang Xiao and Zhiguo Cao
School of Automation, Huazhong University of Science and Technology, Wuhan, P. R. China
{jqyang, kexian, Yang Xiao, zgcao}@hust.edu.cn

Abstract
This paper presents a thorough evaluation of several widely-used 3D correspondence grouping algorithms, motivated by their significance in vision tasks that rely on correct feature correspondences. A good correspondence grouping algorithm is expected to retrieve as many inliers as possible from the initial feature matches, improving both precision and recall. Following this criterion, we conduct experiments on three benchmarks that respectively address shape retrieval, 3D object recognition and point cloud registration scenarios. The variety of application contexts brings a rich set of nuisances, including noise, varying point densities, clutter, occlusion and partial overlaps, and also yields different inlier ratios and correspondence distributions for a comprehensive evaluation. Based on the quantitative outcomes, we summarize the merits and demerits of the evaluated algorithms from both performance and efficiency perspectives.
1. Introduction
Establishing correct matching relationships between 3D shapes, also known as the correspondence problem, is a cornerstone of 3D computer vision. One critical reason is the popularity of the local shape feature-based matching paradigm in applications such as 3D object recognition [13], point cloud registration [25], shape retrieval [3] and 3D object categorization [28]. Local feature-based matching (Fig. 1) starts by detecting a set of distinctive keypoints on the surface and representing the local shape geometry with feature descriptors, and subsequently generates raw initial matches to capture the similarities between two shapes. However, one must expect a high number of false matches, for two main reasons. One is the residual error propagated from earlier modules, e.g., keypoint localization errors and mismatches of feature descriptors in repetitive structures. The other concerns nuisances including noise, varying point densities, clutter, occlusion and partial overlaps. To ensure the accuracy of the subsequent transformation estimation or hypothesis generation, inliers need to be filtered from the raw feature matches, which highlights the importance of correspondence grouping.

Figure 1. Illustration of the local feature-based matching paradigm (source/target shape, 3D keypoint detection, feature description, feature matching, correspondence grouping, and transformation/hypothesis estimation), where the objective of correspondence grouping is to search for inliers among the initial matches between two shapes.

A good correspondence grouping algorithm should find as many inliers as possible from the initial feature matches, increasing both precision and recall [5]. Similar to the trend in the 2D image domain [7, 10, 8], a notable scientific interest has recently characterized the field of 3D correspondence grouping, driven by related higher-level vision tasks such as 3D object recognition and 3D reconstruction. In addition to the re-exploration of popular 2D correspondence grouping techniques in the 3D domain, such as similarity score (SS) [20, 36], nearest neighbor similarity ratio (NNSR) [17], random sample consensus (RANSAC) [11] and the spectral technique (ST) [16], we can also find many recent 3D-targeted algorithms such as geometric consistency (GC) [6], clustering [19], game theory [24], 3D Hough voting (3DHV) [31], and search of inliers (SI) [5]. Despite this wealth of 3D correspondence grouping algorithms, their effectiveness is usually assessed on datasets of a particular application with a limited number of nuisances and comparisons. It is therefore difficult for developers to choose a proper algorithm for a specific application.

To this end, we present a comprehensive evaluation of seven state-of-the-art 3D correspondence grouping algorithms, i.e., SS, NNSR, RANSAC, ST, GC, 3DHV and SI. To the best of our knowledge, this is the first comprehensive evaluation study of 3D correspondence grouping algorithms that considers both classical and recent methods, with assessment on benchmarks addressing a variety of applications and nuisances.
The terms precision and recall are used to measure quantitative performance, ensuring a balanced examination of both the accuracy of the grouped correspondences and the number of inliers retrieved from the raw feature matches. We also take the application context into consideration. Specifically, different applications result in various inlier ratios and spatial distributions of the initial feature matches, mainly due to different categories and degrees of nuisances. To cover these concerns, we conduct our experiments on the Bologna 3D retrieval (B3R) [33], UWA 3D object recognition (U3OR) [20, 19] and UWA 3D modeling (U3M) [21] datasets. The B3R dataset tests the robustness of the evaluated algorithms with respect to noise and varying point densities, the U3OR dataset concerns clutter and occlusion, and the U3M dataset contains partially overlapping data. All these nuisances have been quantized for a detailed comparison. In a nutshell, the contributions of this paper are mainly twofold:

• We give a review and a quantitative evaluation of seven state-of-the-art 3D correspondence grouping algorithms on three benchmarks with various nuisances including noise, point density variation, clutter, occlusion and partial overlaps. Time efficiency with respect to different numbers of initial matches is also tested.

• Instructive summaries including the traits, advantages and limitations of the different algorithms are presented.

The paper is organized as follows. Sect. 2 reviews seven state-of-the-art algorithms, identifying the core computational steps of each proposal. Sect. 3 presents the evaluation methodology, which consists of the datasets, the performance measures and the implementation details of the evaluated algorithms. The experimental results are reported in Sect. 4, while the conclusions are drawn in Sect. 5.
2. 3D Correspondence Grouping Algorithms
This section briefly reviews several state-of-the-art 3D correspondence grouping algorithms. The correspondence grouping problem can be formulated as follows: given a source shape $S$ and a target shape $S'$, where an initial correspondence set $\mathcal{C}$ is generated by matching the feature sets $F$ and $F'$ respectively extracted on $S$ and $S'$, the aim is to find a consistent subset of $\mathcal{C}$ that identifies the correct matching relationship between $S$ and $S'$, namely the inlier set $\mathcal{C}_{inlier}$. An element of $\mathcal{C}$ can be parametrized as $c = \{p, p', s_F(f, f')\}$, with $p \in S$, $p' \in S'$, $f \in F$, $f' \in F'$, and $s_F(f, f')$ being the feature similarity score assigned to $c$. With these notations, we describe the key ideas and computational steps of each algorithm in the following.

Similarity Score.
Splitting the initial correspondence set based on the similarity score $s_F(f, f')$ is a straightforward solution [20, 36]. It rests on the assumption that correspondences with relatively high similarity scores are more likely to be correct. Although a number of distinctive 3D local features [32, 13] have been proposed, disturbances such as noise, missing regions and repetitive patterns can easily cause false judgments. This algorithm serves as a baseline in our evaluation; it judges a correspondence as correct if
$$s_F(f, f') = 1 - \|f - f'\|_{L_2} \ge t_{ss}. \quad (1)$$
The popular $L_2$ distance is used to compute $s_F(f, f')$ in this paper.

Nearest Neighbor Similarity Ratio.
Another baseline algorithm evaluated in this paper is Lowe's ratio rule [17]. It penalizes correspondences by the ratio of the nearest and second-nearest distances in feature space, so that distinctive regions export high ranking scores. Similar to SS's thresholding strategy, the NNSR algorithm accepts a correspondence as an inlier if
$$1 - \frac{\|f - f'_{1st}\|_{L_2}}{\|f - f'_{2nd}\|_{L_2}} \ge t_{nnsr}, \quad (2)$$
with $t_{nnsr} \in [0, 1]$, where $f'_{1st}$ and $f'_{2nd}$ denote the nearest and second-nearest neighbors of $f$ in $F'$.

Random Sample Consensus.
RANSAC [11] is an iterative method that judges the correctness of the current sample through the returned number of inliers, and is broadly adopted in both the 2D [4] and 3D domains [26]. Despite its many variants [34, 9], we focus on the prototype, whose main steps are as follows. Given $N_{ransac}$ iterations, at each iteration the algorithm first randomly samples three correspondences from $\mathcal{C}$. Second, the sampled correspondences are used to compute a transformation $T_i$. To judge the correctness of $T_i$, all source keypoints in $\mathcal{C}$ (i.e., the points shared by $S$ and $\mathcal{C}$) are transformed using $T_i$. The confidence of $T_i$ is positively correlated with the number of transformed source keypoints whose Euclidean distances to their corresponding points in $S'$ are smaller than a threshold $d_{ransac}$. Finally, the transformation yielding the maximum inlier count is taken as the optimal $T^*$, and the correspondences in $\mathcal{C}$ that agree with $T^*$ are grouped as inliers.

Spectral Technique.
Spectral methods are commonly used to search for the main cluster of a graph [29, 18]. Based on the observation that the inliers in $\mathcal{C}$ should form a consistent cluster, Leordeanu and Hebert [16] used a spectral technique (ST) to group correspondences. The basic idea is to find the level of association of each correspondence with the main cluster existing in the initial correspondence set $\mathcal{C}$. In detail, the algorithm operates as follows. First, a non-negative matrix $M$ comprising all pairwise terms between correspondences in $\mathcal{C}$ is built. Second, the principal eigenvector $v$ of $M$ is calculated, and the location of the maximum value of $v$, e.g., $v_i$, indicates that $c_i$ is an inlier. Third, all potential correspondences in conflict with $c_i$ are removed from $\mathcal{C}$. By repeating steps 2 and 3 until $v_i = 0$ or $\mathcal{C}$ is empty, the candidates selected in step 2 constitute the final inlier set. ST is general for both 2D and 3D correspondence problems, depending on how the pairwise term is defined. Here, we use the popular rigidity constraint [15, 5] in the 3D domain as the pairwise term of $c_1$ and $c_2$, defined as
$$r(c_1, c_2) = \min\left(\frac{\|p_1 - p_2\|_{L_2}}{\|p'_1 - p'_2\|_{L_2}}, \frac{\|p'_1 - p'_2\|_{L_2}}{\|p_1 - p_2\|_{L_2}}\right). \quad (3)$$
By thresholding $r(c_1, c_2)$ with $t_{st}$, one can judge whether $c_1$ and $c_2$ are compatible.

Geometric Consistency.
The GC algorithm [6] is independent of the feature space and applies constraints on the compatibility of the spatial locations of corresponding points. The compatibility test for two given correspondences $c_1$ and $c_2$ is
$$|d(p_1, p_2) - d(p'_1, p'_2)| < t_{gc}, \quad (4)$$
with $d(p_1, p_2) = \|p_1 - p_2\|_{L_2}$, and $t_{gc}$ being a threshold that judges whether $c_1$ and $c_2$ satisfy the geometric constraint. With the above rule, the algorithm associates a consistent cluster with each correspondence. In particular, given a correspondence $c$, its compatibility with all other correspondences in $\mathcal{C}$ is computed using Eq. 4. All the correspondences with accepted compatibility scores form a cluster for $c$, and the size of the cluster determines the confidence of that cluster being the inlier cluster. By repeating the procedure for all correspondences, the biggest cluster is output as the final grouped inlier set.
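As an illustration, the GC clustering rule above can be sketched in a few lines. The function below is our own minimal sketch (plain arrays of 3D points, brute-force pairwise tests with a hypothetical interface), not the PCL implementation used in this evaluation.

```python
import numpy as np

def gc_group(corrs, t_gc):
    """Geometric consistency grouping (sketch of Eq. 4).

    corrs: list of (p, p_prime) pairs, each a 3D numpy array.
    Returns the indices of the largest cluster of correspondences
    compatible with a seed correspondence."""
    n = len(corrs)
    best = []
    for i in range(n):
        p_i, q_i = corrs[i]
        cluster = [i]
        for j in range(n):
            if j == i:
                continue
            p_j, q_j = corrs[j]
            # Eq. 4: source-side and target-side point distances must agree
            if abs(np.linalg.norm(p_i - p_j) - np.linalg.norm(q_i - q_j)) < t_gc:
                cluster.append(j)
        if len(cluster) > len(best):
            best = cluster
    return sorted(best)
```

For $n$ correspondences this brute-force loop is $O(n^2)$, which is consistent with the later observation that GC's main cost lies in computing the distance constraints of all correspondence pairs.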
3D Hough Voting.
The Hough transform [35] is a popular computer vision technique originally proposed to detect lines in images. Tombari and Di Stefano [31] introduced a 3D extension named 3D Hough voting (3DHV) for object recognition. This method has also been employed to group correspondences for partial shape matching [23]. In 3DHV, each correspondence casts a vote in a 3D Hough space based on the following steps. For the $i$-th correspondence in $\mathcal{C}$, denoted by $c_i = \{p_i, p'_i\}$, the vector between $p_i \in \mathbb{R}^3$ and the centroid $C^S \in \mathbb{R}^3$ of the source shape $S$ is first computed as
$$V^S_{i,G} = C^S - p_i, \quad (5)$$
which is then expressed in the coordinates given by the local reference frame (LRF) of $p_i$ as
$$V^S_{i,L} = R^S_i \cdot V^S_{i,G}, \quad (6)$$
where $R^S_i$ is the rotation matrix whose rows are the unit vectors of the LRF of $p_i$. Note that an LRF is an independent coordinate system established on the local surface around a keypoint, and many 3D feature descriptors [32, 13] provide an LRF for feature representation. This step endows the vector of $p_i$ with invariance to rigid transformations. Analogously, we can obtain a vector $V^{S'}_{i,L}$ for $p'_i$; if $p_i$ and $p'_i$ correspond correctly, $V^{S'}_{i,L}$ should coincide with $V^S_{i,L}$. Based on this assumption, the vector $V^{S'}_{i,L}$ is finally transformed back into the global coordinates of $S'$ as
$$V^{S'}_{i,G} = (R^{S'}_i)^{\mathsf{T}} \cdot V^{S'}_{i,L} + p'_i, \quad (7)$$
where $(R^{S'}_i)^{\mathsf{T}}$ inverts the LRF rotation of $p'_i$. With these transformations, each correspondence can vote in a 3D Hough space by means of the vector $V^{S'}_{i,G}$. The peak in the Hough space indicates the cluster constituted by the inliers.

Search of Inliers.
The search of inliers (SI) [5] algorithm is a recent proposal targeting the 3D correspondence problem. The core idea is a combination of local and global constraints to determine whether a vote should be cast. We summarize this algorithm in three main steps, i.e., initialization, local voting and global voting. During initialization, a subset $\mathcal{C}_{Ratio}$ of the initial correspondence set $\mathcal{C}$ is extracted using Lowe's ratio rule (c.f. Eq. 2). At the local voting stage, the correspondences shared by $\mathcal{C}_{Ratio}$ and the $\kappa$ nearest correspondence neighbors of $c$ are defined as the local voters for $c$, denoted by $\mathcal{C}_L(c)$. The elements of $\mathcal{C}_L(c)$ that satisfy the rigidity constraint (c.f. Eq. 3) are defined as the positive local votes $\Upsilon_L(c)$:
$$\Upsilon_L(c) = \{c_L \in \mathcal{C}_L(c) : r(c, c_L) > \varsigma\}, \quad (8)$$
where $\varsigma$ is a free parameter of the local voting stage. The local score of $c$ is then defined as $s_L(c) = |\Upsilon_L(c)| / |\mathcal{C}_L(c)|$.

Figure 2. Sample views (visualized as meshes) from the (a) B3R, (b) U3OR and (c) U3M datasets.

At the global voting stage, the global voters $\mathcal{C}_G$ are selected as the top $\kappa$ correspondences ranked by their Lowe's ratio scores in decreasing order.
To judge the affinity between two correspondences $c_1$ and $c_2$, the following test is used:
$$v_G(c_1, c_2) = d(T(c_1) \cdot p_2, p'_2), \quad (9)$$
with $T(c_1)$ defined as $R(p'_1)^{-1} \cdot R(p_1)$, where $R(p)$ represents the LRF of $p$. The global votes are then found by applying both the local and global constraints:
$$\Upsilon_G(c) = \{c_G \in \mathcal{C}_G : r(c, c_G) > \varsigma \wedge v_G(c, c_G) < \delta\}, \quad (10)$$
where $\delta$ is a Euclidean distance tolerance. The eventual vote score for $c$ is defined as
$$s(c) = \frac{|\Upsilon_L(c)| + |\Upsilon_G(c)|}{|\mathcal{C}_L(c)| + |\mathcal{C}_G(c)|}. \quad (11)$$
By thresholding $s(c)$ with Otsu's adaptive method [22], the remaining correspondences are the SI-judged inliers.
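The rigidity term of Eq. 3 and the local voting score of Eq. 8 can be sketched as follows. This is a simplified illustration with hypothetical names; it covers only the local score $s_L(c)$, not the full SI pipeline with its LRF-based global voting and Otsu thresholding.

```python
import numpy as np

def rigidity(c1, c2):
    """Eq. 3: pairwise rigidity term between two correspondences,
    each given as a (p, p_prime) pair of 3D points. Close to 1 when
    the source-side and target-side distances agree."""
    d_src = np.linalg.norm(c1[0] - c2[0])
    d_tgt = np.linalg.norm(c1[1] - c2[1])
    if d_src == 0 or d_tgt == 0:
        return 0.0
    return min(d_src / d_tgt, d_tgt / d_src)

def local_score(c, local_voters, sigma):
    """Eq. 8: the local score s_L(c) is the fraction of local voters
    whose rigidity term with c exceeds the threshold sigma."""
    if not local_voters:
        return 0.0
    votes = [v for v in local_voters if rigidity(c, v) > sigma]
    return len(votes) / len(local_voters)
```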
3. Evaluation Methodology
All the algorithms presented in Sect. 2 have been evaluated on three chosen benchmarks containing different levels of noise, point density variation, clutter, occlusion and partial overlap. The inliers computed by each algorithm are measured using the precision and recall criteria. This section also presents the implementation details of each algorithm.
B3R Dataset.
The Bologna 3D Retrieval (B3R) dataset [33], with 8 models and 18 scenes, is used to test the algorithms' robustness with respect to various levels of noise and varying point densities. The models are taken from the Stanford Repository, while the scenes are rotated copies of these models. Specifically, we inject the scenes with Gaussian noise along the x, y and z axes. The standard deviation of the noise increases from 0.05 pr to 0.45 pr in incremental steps of 0.05 pr, where pr hereinafter denotes the point cloud resolution, i.e., the average shortest distance among neighboring points in the point cloud. Further, we downsample the scenes from 0.9 to 0.1 of the original data resolution with an interval of 0.1. The enriched B3R dataset thus incorporates 324 scenes with quantized levels of noise and point density variation.

U3OR Dataset.
The UWA 3D object recognition (U3OR) dataset [20, 19] is a popular benchmark for 3D object recognition [13, 12], comprising 5 models and 50 scenes. Each scene contains four or five models with approximately 65%-95% clutter and 60%-90% occlusion. A total of 188 valid matching instance pairs can be found in this dataset; the objective is to test the resilience of the correspondence grouping algorithms to clutter and occlusion.
U3M Dataset.
The UWA 3D modeling (U3M) dataset [21] addresses the point cloud (2.5D view) registration scenario. There are 22, 16, 16 and 21 2.5D views respectively captured from the Chef, Chicken, T-rex and Parasaurolophus models. We obtain the ground-truth transformation of each considered data pair via manual alignment followed by iterative closest point (ICP) [2] refinement. We finally screen out 340 valid pairs from this dataset, with 30%-90% degrees of overlap.
Let $T_{GT} = \{R_{GT}, t_{GT}\}$ denote the ground-truth transformation between $S$ and $S'$, where $T_{GT} \in SE(3)$, $R_{GT} \in SO(3)$ and $t_{GT} \in \mathbb{R}^3$. A correspondence $c = (p, p')$ is accepted as correct only if
$$\|p \cdot R_{GT} + t_{GT} - p'\|_{L_2} \le \epsilon, \quad (12)$$
where $\epsilon$ is a judging threshold. Let $\mathcal{C}_{inlier}$, $\mathcal{C}^{correct}_{inlier}$ and $\mathcal{C}^{GT}_{inlier}$ respectively represent the grouped inlier set, the correctly judged inliers in the grouped set, and the ground-truth inlier set within the initial correspondence set $\mathcal{C}$. We measure the quality of an algorithm using precision and recall, defined as
$$\text{Precision} = \frac{|\mathcal{C}^{correct}_{inlier}|}{|\mathcal{C}_{inlier}|}, \quad (13)$$
$$\text{Recall} = \frac{|\mathcal{C}^{correct}_{inlier}|}{|\mathcal{C}^{GT}_{inlier}|}, \quad (14)$$
where $|\cdot|$ denotes the cardinality of a set.

The input for the algorithms evaluated in this paper, i.e., the initial correspondence set $\mathcal{C}$, is generated via Harris 3D [30] keypoint detection, SHOT [32] feature description and $L_2$ distance-based feature matching [14, 36]. By default, we set the non-maximum-suppression radius of the Harris 3D detector to 3 pr, generating around 1000 keypoints for a point cloud containing a hundred thousand points. The support radius of SHOT is 15 pr, as suggested in [13], while the judging threshold $\epsilon$ equals 4 pr.

Table 1. Parameters used throughout the evaluation. SS: $t_{ss}$ (adaptive [22]); NNSR: $t_{nnsr}$; RANSAC: $N_{ransac}$, $d_{ransac}$ (in pr); ST: $t_{st}$; GC: $t_{gc}$ (in pr); SI: $\kappa$, $\varsigma$, $\delta$ (in pr).

Regarding the parameters of each algorithm, we list them in Table 1. Notably, we make $t_{ss}$ adaptive using [22], because a fixed value can hardly be tuned for feature matches of different qualities. The thresholds of the NNSR and SI algorithms are kept consistent with those in their original papers. For the ST and GC algorithms, the thresholds are determined via tuning experiments.
For RANSAC, considering the magnitude of the initial correspondence set, 10000 iterations are used to strike a balance between effectiveness and efficiency. All the algorithms are implemented in C++ with the help of the Point Cloud Library (PCL) [27], on a PC equipped with a 3.4 GHz processor and 24 GiB of memory.
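As a sketch of the evaluation protocol (our own minimal code with hypothetical names, not part of the actual C++/PCL framework), the correctness test of Eq. 12 and the precision/recall measures of Eqs. 13 and 14 could be computed as:

```python
import numpy as np

def is_correct(p, p_prime, R_gt, t_gt, eps):
    """Eq. 12: a correspondence (p, p_prime) is correct if the
    ground-truth transform maps p to within eps of p_prime.
    Uses the row-vector convention p . R + t of the paper."""
    return np.linalg.norm(p @ R_gt + t_gt - p_prime) <= eps

def precision_recall(grouped, gt_inliers):
    """Eqs. 13-14 on sets of correspondence indices:
    grouped    - indices returned by a grouping algorithm,
    gt_inliers - ground-truth inlier indices in the initial set."""
    grouped, gt_inliers = set(grouped), set(gt_inliers)
    correct = grouped & gt_inliers
    precision = len(correct) / len(grouped) if grouped else 0.0
    recall = len(correct) / len(gt_inliers) if gt_inliers else 0.0
    return precision, recall
```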
4. Experimental Results
This section provides the outcomes of each evaluated algorithm (Sect. 2) on the experimental datasets using the protocols of Sect. 3. The assessed terms include: robustness to noise, varying point densities, clutter, occlusion, partial overlap, the varying threshold $\epsilon$ (c.f. Eq. 12), varying sizes of the initial correspondence set, and computational cost.
Robustness to noise.
Noise is expected to impact the discriminative power of the feature descriptor, thus creating a certain amount of false matches. In the retrieval context, however, the ratio of inliers is generally high, because the models in the B3R dataset are rich in geometric information. The result in this context is shown in Fig. 3(a). As witnessed by the figure, RANSAC and SI appear to be the best two among all evaluated proposals, considering their overall precision and recall performance. An interesting finding is that NNSR even surpasses GC and 3DHV in precision under extreme noise. That is because NNSR prefers to select distinctive correspondences, which is particularly effective on this dataset as the models possess wealthy distinctive structures; the recall of NNSR, however, stays lower than that of the other algorithms except SS. The SI algorithm, with pleasing performance when the standard deviation of the Gaussian noise is less than 0.15 pr, meets a significant deterioration when the noise turns more severe, indicating its sensitivity to high levels of Gaussian noise.

Robustness to point density variation.
Similar to noise, this term also affects a descriptor's distinctiveness. We present the results under varying point densities in Fig. 3(b). We can observe that the behaviors of these algorithms under point density variation are analogous to those under noise. For instance, RANSAC and ST again give the best overall performance, followed by NNSR, 3DHV and GC. A difference, however, is that SS even outperforms SI in precision when the downsampling ratio reaches 0.3, while the recall performance of both NNSR and SI drops dramatically in the low-resolution cases. That is because SHOT is sensitive to varying point densities [32], making the feature weakly distinctive (which undermines, e.g., NNSR's principle) for data with high ratios of resolution decimation; for SI, the reason is that SHOT's LRF (used in SI's global voting stage) is less repeatable under data resolution variation [13].
Robustness to clutter.
The degree of clutter, as defined in [20], is the percentage of non-model surface area in the scene. Surface patches in the cluttered area with geometric properties similar to patches on the model cause outliers during feature matching. The results for quantized levels of clutter are shown in Fig. 3(c). A clear degradation in performance can be observed for all algorithms, as the 3D object recognition scenario is more challenging than retrieval [12]. When the degree of clutter is less than 75%, RANSAC achieves the best precision. As the degree of clutter further increases, 3DHV gives the best performance. Notably, the ST algorithm, with top-ranked performance on the B3R dataset, performs quite poorly on the U3OR dataset. That is because ST tries to find large isometry-maintaining clusters, which rarely exist in scenes with high percentages of clutter. In terms of recall, SS, SI and GC perform better than the others. Weighing up both precision and recall, 3DHV and GC are the two most superior algorithms under the effect of clutter.
Robustness to occlusion.
Occlusion results in incomplete shape patches, imposing great challenges for accurate feature description. The degree of occlusion is given as the ratio of the occluded model surface area to the total model surface area [20, 19]. As shown in Fig. 3(d), when the degree of occlusion is smaller than 70%, RANSAC is the best regarding precision. As the occlusion degree increases to 75%, GC outperforms RANSAC. GC, SI and 3DHV eventually surpass the other algorithms when the occlusion degree exceeds 75%. As for recall, SI outperforms all other algorithms at all levels of occlusion, especially in highly occluded scenes, while SS and ST remain the two most poorly performing algorithms in this test. We can infer that consistency-based algorithms, such as RANSAC and GC, are more suitable for scenes with occlusion, whereas algorithms relying on the initial feature matching score, e.g., SS and SI, are at high risk when dealing with false matches, as the feature matching scores measured on occluded scene patches can be unreliable.

Figure 3. (a)-(g): Precision and recall performance of the seven correspondence grouping algorithms with respect to different nuisances on the experimental datasets, i.e., (a) noise, (b) point density variation, (c) clutter, (d) occlusion, (e) partial overlap, (f) the threshold $\epsilon$, and (g) the number of initial correspondences. (h): Time efficiency for different sizes of the initial correspondence set, where the y-axis is logarithmic for better readability.

Robustness to partial overlap.
The U3M dataset provides matching pairs with various degrees of overlap. The degree of overlap is measured as the ratio of the number of corresponding vertices to the minimum number of vertices of the two shapes [21]. The results with respect to different overlap degrees are presented in Fig. 3(e). Common to all algorithms is that their performance generally degrades as the degree of overlap drops. This is owing to the fact that the ratio of outliers in the initial correspondence set is closely correlated with the ratio of overlapping regions. As for precision, RANSAC generally exceeds the others at all levels of overlap, by a large margin in the 60% to 80% overlap range. GC and ST behave comparably, followed by 3DHV, NNSR, SI and SS. Regarding recall, SI is superior to the others, especially when the degree of overlap is smaller than 70%.

Varying the judging threshold $\epsilon$.
As defined in Eq. 12, the threshold $\epsilon$ determines to what extent we judge a correspondence as an inlier. We hereby vary this threshold (the default setting is 4 pr) to examine the performance variation of the evaluated algorithms. Specifically, we conduct this experiment on the whole U3OR dataset, with the outcomes presented in Fig. 3(f). As expected, all algorithms attain higher precision with a looser $\epsilon$. In particular, GC and RANSAC respectively reach the highest precision when the threshold $\epsilon$ is in the range [2 pr, 5 pr] and [6 pr, 10 pr]. The precision of SS improves only faintly, indicating that the majority of its judged inliers deviate considerably from the ground-truth inliers. In terms of recalled inliers, SI and SS present an increasing trend as the threshold grows, while the performance of the other algorithms remains almost unchanged.

Varying sizes of the initial correspondence set.

Different numbers of initial correspondences are desired for different applications, such as dense matching for shape morphing [1] and sparse matching for crude scan alignment [26]. Towards this end, we test the performance of these algorithms with respect to different numbers of initial correspondences on the U3OR dataset, as shown in Fig. 3(g). The figure suggests that different algorithms respond differently when the number of initial feature matches varies. The performance of some algorithms, e.g., GC, RANSAC and 3DHV, fluctuates as the number of initial matches grows. Meanwhile, the size of the initial correspondence set has a relatively strong impact on the SI and ST algorithms. To be more specific, when the number of initial correspondences is smaller than 1000, these two algorithms produce low precision. However, as the initial feature matches become dense, i.e., more than 1000 correspondences, the precision of SI and ST climbs quickly. Note that SI even reaches the second-best precision with about 3000 initial correspondences. This is because dense initial correspondences provide more reliable components for SI's local consolidation voting set. Still, SI achieves the best recall under all tested sizes of the initial correspondence set, surpassing all others by a large gap.

Figure 4. Exemplar visual results of the evaluated correspondence grouping algorithms, i.e., (a) SS, (b) NNSR, (c) ST, (d) RANSAC, (e) GC, (f) 3DHV and (g) SI, respectively from the B3R, U3OR and U3M datasets (from left to right).
In addition to the above precision and recall results, we test the computational efficiency of each evaluated algorithm with respect to different sizes of the input feature matches. The deployment of this experiment is as follows. First, the NMS radius of the Harris 3D keypoint detector is varied to obtain different quantities of initial feature matches. Second, these initial correspondence sets are fed to the evaluated algorithms and their computational costs are recorded. Finally, we repeat the former stage 10 times to eliminate randomness, and the averaged timing results are collected, as shown in Fig. 3(h). One can make several observations from the results. First, ST and RANSAC are the two most time-consuming algorithms, especially for large initial correspondence sets. The reason for ST is that the computational cost of solving for the principal eigenvector of an $n \times n$ matrix increases dramatically as the order $n$ (i.e., the size of the initial correspondence set) grows. The explanation for RANSAC is that it requires a huge number of iterations to guarantee an acceptable result, while each iteration takes the whole correspondence set into consideration to count the current inliers. Second, NNSR and 3DHV are the two most efficient. We remark that since SS employs an adaptive thresholding strategy [22] in our implementation, its time cost is higher than that of NNSR. The core operation of 3DHV is coordinate transformation, so it requires very little run time even for thousands of initial correspondences. Third, GC and SI are middle-ranked among the evaluated algorithms in terms of run time. The main timing cost of GC is dedicated to computing the distance constraints of all correspondence pairs, while SI needs both local and global consolidation to judge the correctness of a correspondence. Finally, we provide some visual results of the evaluated correspondence grouping algorithms in Fig. 4.
From the figure, we can perceive some visual differences among these outcomes. For instance, the number of outliers in the results of the two baseline algorithms, i.e., SS and NNSR, is relatively large except on the B3R dataset. This verifies that algorithms relying on the feature matching score are very sensitive to nuisances that directly affect a feature's discriminative ability, e.g., clutter, occlusion and holes. Another observation is that, owing to their different grouping principles, the number as well as the spatial locations of the results of these algorithms generally differ from each other.
5. Conclusions
This paper has presented a thorough evaluation of 3D correspondence grouping algorithms on a variety of datasets. The evaluated terms include precision and recall under various levels of noise, point density variation, clutter, occlusion, partial overlap, the inlier judging threshold and the size of the initial feature matches, as well as computational efficiency. In light of these evaluation outcomes, we summarize the findings of this paper as follows.

• SS and NNSR, two baselines relying on feature matching similarity, are very sensitive to disturbances including clutter, occlusion and partial overlap. Given high quality shapes with rich geometric structures, NNSR can be an effective option that also affords real-time performance.

• The ST algorithm is effective for correspondence sets with a very high proportion of inliers, while its performance degrades dramatically under challenging circumstances, e.g., 3D object recognition and 2.5D view matching. Also, ST is shown to be very time-consuming, especially for large-scale correspondence problems.

• RANSAC shows superior precision under a variety of nuisances, at the expense of relatively long execution times. Hence, RANSAC is suitable for off-line applications relying on sparse matching, such as scan registration and 3D modeling.

• For applications requiring dense feature correspondences, SI would be the best choice. A core shortcoming of SI is its limited precision under the nuisances of clutter and partial overlap; GC, in this context, can be an alternative that shows overall higher precision.

It is noteworthy that although existing algorithms work well in the retrieval context even with severe noise and data resolution decimation, their performance is quite limited under clutter, occlusion and partial overlap. We believe research should move towards the development of robust correspondence grouping algorithms for 3D object recognition and point cloud registration applications.
Acknowledgment.
The authors would like to acknowledge the Stanford 3D Scanning Repository, the University of Western Australia, and the University of Bologna for providing their datasets. We also thank Dr. Buch for sharing the code with us. This work is jointly supported by the National High Technology Research and Development Program of China (863 Program) under Grant 2015AA015904 and the 2015 annual foundation of China Academy of Space Technology (CAST).

References

[1] M. Alexa. Recent advances in mesh morphing. In Computer Graphics Forum, volume 21, pages 173–198. Wiley Online Library, 2002.
[2] P. J. Besl and N. D. McKay. Method for registration of 3-d shapes. IEEE Transactions on Pattern Analysis and Machine Intelligence, 14(2):239–256, 1992.
[3] E. Boyer, A. Bronstein, M. Bronstein, B. Bustos, T. Darom, R. Horaud, I. Hotz, Y. Keller, J. Keustermans, A. Kovnatsky, et al. SHREC 2011: Robust feature detection and description benchmark. In Proceedings of the Eurographics Workshop on 3D Object Retrieval, 2011.
[4] M. Brown and D. G. Lowe. Automatic panoramic image stitching using invariant features. International Journal of Computer Vision, 74(1):59–73, 2007.
[5] A. G. Buch, Y. Yang, N. Krüger, and H. G. Petersen. In search of inliers: 3d correspondence by local and global voting. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 2075–2082. IEEE, 2014.
[6] H. Chen and B. Bhanu. 3d free-form object recognition in range images using local surface patches. Pattern Recognition Letters, 28(10):1252–1262, 2007.
[7] M. Cho, J. Lee, and K. M. Lee. Feature correspondence and deformable object matching via agglomerative correspondence clustering. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 1280–1287. IEEE, 2009.
[8] M. Cho, J. Sun, O. Duchenne, and J. Ponce. Finding matches in a haystack: A max-pooling strategy for graph matching in the presence of outliers. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 2083–2090, 2014.
[9] O. Chum and J. Matas. Matching with PROSAC – progressive sample consensus. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, volume 1, pages 220–226. IEEE, 2005.
[10] O. Enqvist, K. Josephson, and F. Kahl. Optimal correspondences from pairwise constraints. In Proceedings of the IEEE International Conference on Computer Vision, pages 1295–1302. IEEE, 2009.
[11] M. A. Fischler and R. C. Bolles. Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography. Communications of the ACM, 24(6):381–395, 1981.
[12] Y. Guo, M. Bennamoun, F. Sohel, M. Lu, J. Wan, and N. M. Kwok. A comprehensive performance evaluation of 3d local feature descriptors. International Journal of Computer Vision, 116(1):66–89, 2016.
[13] Y. Guo, F. Sohel, M. Bennamoun, M. Lu, and J. Wan. Rotational projection statistics for 3d local surface description and object recognition. International Journal of Computer Vision, 105(1):63–86, 2013.
[14] Y. Guo, F. Sohel, M. Bennamoun, J. Wan, and M. Lu. An accurate and robust range image registration algorithm for 3d object modeling. IEEE Transactions on Multimedia, 16(5):1377–1390, 2014.
[15] A. E. Johnson and M. Hebert. Surface matching for object recognition in complex three-dimensional scenes. Image and Vision Computing, 16(9):635–651, 1998.
[16] M. Leordeanu and M. Hebert. A spectral technique for correspondence problems using pairwise constraints. In Proceedings of the International Conference on Computer Vision, volume 2, pages 1482–1489. IEEE, 2005.
[17] D. G. Lowe. Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision, 60(2):91–110, 2004.
[18] S. Mahamud, L. R. Williams, K. K. Thornber, and K. Xu. Segmentation of multiple salient closed contours from real images. IEEE Transactions on Pattern Analysis and Machine Intelligence, 25(4):433–444, 2003.
[19] A. Mian, M. Bennamoun, and R. Owens. On the repeatability and quality of keypoints for local feature-based 3d object retrieval from cluttered scenes. International Journal of Computer Vision, 89(2-3):348–361, 2010.
[20] A. S. Mian, M. Bennamoun, and R. Owens. Three-dimensional model-based object recognition and segmentation in cluttered scenes. IEEE Transactions on Pattern Analysis and Machine Intelligence, 28(10):1584–1601, 2006.
[21] A. S. Mian, M. Bennamoun, and R. A. Owens. A novel representation and feature matching algorithm for automatic pairwise registration of range images. International Journal of Computer Vision, 66(1):19–40, 2006.
[22] N. Otsu. A threshold selection method from gray-level histograms. Automatica, 11(285-296):23–27, 1975.
[23] A. Petrelli and L. Di Stefano. Pairwise registration by local orientation cues. In Computer Graphics Forum. Wiley Online Library, 2015.
[24] E. Rodolà, A. Albarelli, F. Bergamasco, and A. Torsello. A scale independent selection process for 3d object recognition in cluttered scenes. International Journal of Computer Vision, 102(1-3):129–145, 2013.
[25] R. B. Rusu, N. Blodow, and M. Beetz. Fast point feature histograms (FPFH) for 3d registration. In Proceedings of the IEEE International Conference on Robotics and Automation, pages 3212–3217, 2009.
[26] R. B. Rusu, N. Blodow, Z. C. Marton, and M. Beetz. Aligning point cloud views using persistent feature histograms. In Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems, pages 3384–3391, 2008.
[27] R. B. Rusu and S. Cousins. 3d is here: Point cloud library (PCL). In Proceedings of the IEEE International Conference on Robotics and Automation, pages 1–4, 2011.
[28] S. Salti, F. Tombari, and L. Di Stefano. On the use of implicit shape models for recognition of object categories in 3d data. In Proceedings of the Asian Conference on Computer Vision, pages 653–666. Springer, 2010.
[29] J. Shi and J. Malik. Normalized cuts and image segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(8):888–905, 2000.
[30] I. Sipiran and B. Bustos. Harris 3d: a robust extension of the Harris operator for interest point detection on 3d meshes. The Visual Computer, 27(11):963–976, 2011.
[31] F. Tombari and L. Di Stefano. Object recognition in 3d scenes with occlusions and clutter by hough voting. In Proceedings of the Fourth Pacific-Rim Symposium on Image and Video Technology, pages 349–355. IEEE, 2010.
[32] F. Tombari, S. Salti, and L. Di Stefano. Unique signatures of histograms for local surface description. In Proceedings of the European Conference on Computer Vision, pages 356–369. 2010.
[33] F. Tombari, S. Salti, and L. Di Stefano. Performance evaluation of 3d keypoint detectors. International Journal of Computer Vision, 102(1-3):198–220, 2013.
[34] P. H. Torr and A. Zisserman. MLESAC: A new robust estimator with application to estimating image geometry. Computer Vision and Image Understanding, 78(1):138–156, 2000.
[35] P. V. C. Hough. Method and means for recognizing complex patterns, 1962. US Patent 3,069,654.
[36] J. Yang, Z. Cao, and Q. Zhang. A fast and robust local descriptor for 3d point cloud registration.