Non-Rigid Point Set Registration Networks


Lingjing Wang, Jianchun Chen, Xiang Li and Yi Fang

Abstract: Point set registration is defined as a process to determine the spatial transformation from the source point set to the target one. Existing methods often iteratively search for the optimal geometric transformation to register a given pair of point sets, driven by minimizing a predefined alignment loss function. In contrast, the proposed point registration neural network (PR-Net) actively learns the registration pattern as a parametric function from a training dataset, and consequently predicts the desired geometric transformation to align a pair of point sets. PR-Net can transfer the learned knowledge (i.e. registration pattern) from registering training pairs to testing ones without additional iterative optimization. Specifically, in this paper, we develop novel techniques to learn shape descriptors from point sets that help formulate a clear correlation between source and target point sets. With the defined correlation, PR-Net tends to predict the transformation so that the source and target point sets can be statistically aligned, which in turn leads to an optimal spatial geometric registration. PR-Net achieves robust and superior performance for non-rigid registration of point sets, even in the presence of Gaussian noise, outliers, and missing points, while requiring much less time to register a large number of pairs. More importantly, for a new pair of point sets, PR-Net is able to directly predict the desired transformation using the learned model without a repetitive iterative optimization routine. Our code is available at https://github.com/Lingjing324/PR-Net.

I. INTRODUCTION

A. Background

Over the past decades, point set matching and registration has been one of the most important computer vision tasks [1]–[8, 28], serving a wide range of applications such as stereo matching, medical image registration, large-scale 3D reconstruction, 3D point cloud matching, semantic segmentation and so on [9]–[14]. Point set registration is mathematically defined as a process to determine the spatial geometric transformations (i.e. rigid and non-rigid transformation) that can optimally register the source point set to the target one. The desired registration algorithm can find both rigid (i.e. rotation, reflection, and shifting) and non-rigid (i.e. dilation and stretching) transformations, while being robust to outliers, Gaussian point drift, data incompleteness and so on.

To formulate the problem of point set registration, existing methods [5, 7] often iteratively search for the optimal geometric transformation to register two sets of points, driven by minimizing a predefined alignment loss function. The alignment loss is usually predefined as a certain type of distance metric (e.g. Euclidean distance loss) between the transformed source point set and the target one. Previous efforts [5, 7, 15] have achieved great success in point set registration through the development of a variety of optimization algorithms and distance metrics, as summarized in [15]. However, these methods are often not designed to handle real-time point set registration or to deal with a large-volume dataset. This limitation is mainly due to the fact that, for each given pair of point sets, the iterative method needs to start a new iterative optimization process, even for trivially similar cases. This observation suggests that existing efforts have mainly concentrated on the stand-alone development of optimization strategies rather than on techniques for transferring the registration pattern acquired from aligning one pair to another. This motivates our proposed PR-Net, which actively learns the registration pattern from a set of training data and then adaptively uses that knowledge to directly predict the geometric transformation for a new pair of unseen point sets. As a result, PR-Net is capable of handling real-time point set registration or large-volume datasets with a similar pattern. To better understand point set registration, we briefly review related works as follows.

L. Wang is with the MMVC Lab, Department of Mathematics, New York University, New York, NY, 30332 USA. J. Chen is with the MMVC Lab, New York University, New York, NY, 30332 USA. X. Li is with the MMVC Lab, New York University, New York, NY, 30332 USA. Y. Fang is with the MMVC Lab, Dept. of ECE, NYU Abu Dhabi, UAE, and Dept. of ECE, NYU Tandon School of Engineering, USA.

B. Related Works

Iterative registration methods.

Current mainstream point set registration methods focus on the development of optimization algorithms to estimate the rigid or non-rigid geometric transformations in an iterative routine. With the assumption that a pair of point sets are related by a rigid transformation, a registration approach estimates the best translation and rotation parameters in an iterative search routine that aims to minimize a distance metric between the two sets of points. One of the most popular methods for rigid registration, the Iterative Closest Point (ICP) algorithm [11], was proposed to handle point set registration with least-squares estimation of transformation parameters. ICP starts with an initial estimate of the rigid transformation and then iteratively refines it by alternately choosing corresponding points from the point sets and estimating the transformation parameters. The ICP algorithm is reported to be vulnerable to the selection of corresponding points for the initial transformation estimation, and it is also incapable of dealing with non-rigid transformation.

To accommodate the deformation (e.g. morphing, articulation) between a pair of point sets, many efforts were spent on the development of algorithms that address the challenges of non-rigid transformation. Chui and Rangarajan [16] proposed a robust method to model non-rigid transformation based on the thin-plate spline [17]. They proposed the TPS-RPM algorithm with penalization on second-order derivatives to optimize the parameters of the desired transformation. Ma et al. [18] introduced an $L_2E$ estimator for non-rigid registration that handles significant scale changes and rotations. In addition, Myronenko et al. [28] proposed the non-parametric coherent point drift (CPD) algorithm, which leverages a Gaussian mixture likelihood and penalizes derivatives of all orders of the velocity field to enforce velocity coherence, so that the centroids of the source point set move coherently to the target point set. They reported that their algorithm can be easily extended to N-dimensional space compared to the TPS-RPM algorithm. Ma et al. [7] proposed a non-parametric vector field consensus algorithm to establish robust correspondence between two sets of points. Their experimental results demonstrated that the proposed method is quite robust to outliers. In [1], the authors emphasized the importance of preserving local and global structures for non-rigid point set registration. Wang et al. [19] proposed a path following strategy for graph matching in order to improve computational efficiency. Zhou et al. [20] proposed a fast alternating minimization algorithm for multi-image matching.

Existing methods have achieved great success for both rigid and non-rigid point set registration over the past decades. However, they mainly concentrate on the stand-alone development of optimization strategies for point set registration rather than on techniques to learn the registration process as a pattern. In this paper, this deficiency of current algorithms drives us to develop a learning-based registration paradigm that is able to actively learn how to register two point sets and then adaptively use that knowledge to directly predict the geometric transformation, without the need to start a new iterative search process for each similar case.

Learning-based registration methods.

The recent success of deep learning in various computer vision fields [21]–[27] has motivated researchers to start modeling the registration problem using deep neural networks [24]–[27, 29, 30]. Earlier attempts in this direction mainly concentrated on the development of learning-based registration methods for pairwise image registration. For example, Rocco et al. [29] developed a CNN architecture to predict both rigid and non-rigid transformation for 2D image matching. Balakrishnan et al. [30] proposed a deep learning method to predict the non-rigid deformation field with application in deformable medical image registration. Both works share the common use of deep learning for visual feature learning from images to formulate pairwise image correlations. The method presented in [29] predicts the parameters of a TPS-based transformation function for pairwise image registration, while the authors of [30] aim to predict a smooth registration field to approximate non-rigid transformation. Though it is not a direct registration model, Zeng et al. [27] proposed a volumetric 3D-CNN to learn a local shape descriptor for geometric patch matching. The aforementioned learning-based registration methods, despite not working on point set registration, encourage us to take a further step in this paper and investigate the possibility of learning point set registration using deep neural networks. We briefly describe our proposed PR-Net in the subsection below, and the technical details are discussed in the approach section.

C. Our Solution: Point Set Registration Neural Network (PR-Net)

Different from image data with a regular grid, point cloud data is often recorded in an irregular and disordered format. Learning point set registration requires the deep neural networks to be applicable to irregular point cloud data. In addition, unlike an image, which contains rich texture and color information, a point cloud is solely represented with geometric information (i.e. coordinates, curvature, normal). This suggests that a learning-based solution for point set registration needs to address two main technical challenges: 1) robust learning of both local and global geometric features from point clouds and 2) robust learning of the transformation from a well-defined correlation measure between pairwise geometric feature sets. Therefore, the proposed PR-Net investigates two major research problems: 1) the design of techniques for point cloud learning by introducing a novel reference operator that enables formulating the correlation measure on arbitrary-structured data, and 2) the development of a learning paradigm for geometric transformation learning from pairwise feature sets.

Figure 1 illustrates the pipeline of the proposed PR-Net, which is composed of three main components. The first component is "learning shape descriptor tensor". In this component, the proposed grid-reference structure is developed to enable feature learning and to formulate the correlation relationship on arbitrary-structured data. The second component is "learning shape correlation tensor". In this component, the shape correlation tensor is developed as a metric to further evaluate the correlation between the two shape descriptor tensors of the point sets to be registered. The shape correlation tensor is formulated as an "all-to-all" point-wise computation from the pair of shape descriptor tensors evaluated in the first component. The third component is "learning of the parameters of transformation". In this component, we exploit the function mapping between the space of the "shape correlation tensor" and "the parameters of transformation" to determine the best geometric transformation that statistically aligns the source point set and the target one. In this paper, PR-Net uses a CNN as the functional regression model to approximate this mapping function for learning the parameters of the desired transformation. Accordingly, the main components of the pipeline indicate the main contributions of our proposed PR-Net as follows:

• We propose a novel technique to learn the global and local shape aware "shape descriptor tensor" directly from a point cloud with irregular and disordered format. The shape descriptor tensor is shown to be effective and efficient in extracting geometric shape features, even for point clouds in the presence of missing points, noise, and outliers.
• We propose a novel shape correlation tensor to comprehensively evaluate the correlation between the two point sets to be registered.
• We propose a novel statistical alignment loss function that drives our structure to determine the optimal geometric transformation that statistically aligns the source point set and the target one.
• In all, we propose a novel learning-based point set registration paradigm which learns registration patterns from training data and then adaptively uses that knowledge to directly predict the geometric transformation for aligning a new pair of point sets, without the need to start a new iterative search process.

In conclusion, given a large amount of training data, PR-Net demonstrates a stable generalization ability to directly predict the desired non-rigid transformation for unseen point cloud data, even in the presence of a high level of noise, missing points, and outliers.

Fig. 1. PR-Net pipeline. The proposed PR-Net includes three parts: learning shape descriptor tensor (SCT), learning correlation tensor, and shape transformation prediction. For a pair of source point set $S_i$ and target point set $G_j$, we first generate two reference grids and map the points of the source and target point sets onto them as two shape descriptor tensors $F_S$ and $F_G$. We define the shape correlation tensor $C$ between the source and target shape descriptor tensors. By leveraging a 2D-CNN, we learn the desired parameters $\theta$ of the transformation $T_\theta$ based on the shape correlation tensor. The learned optimal model transforms the source point set to be statistically aligned with the target point set.

II. APPROACH

We introduce our approach in the following sections. In Section 2 A, we state our learning-based registration problem. From Section 2 B to 2 D, three successive parts explain each module of our method in detail. Section 2 B illustrates our structure for learning the shape descriptor tensor for point sets. In Section 2 C, we introduce the shape correlation tensor based on the learned shape descriptors. The non-rigid shape transformation prediction is introduced in Section 2 D. The definition of the loss function is discussed in Section 2 E, and the settings of the training and model configuration are explained in Section 2 F.

A. Problem statement

Prior to the discussion of our approach, we first define the point set registration task. Let the training data set be $D = \{(S_i, G_j)\}$, where $S_i, G_j \subset \mathbb{R}^N$. We denote by $S_i$ the source point set and by $G_j$ the target point set. In this paper, we mainly discuss the cases $N = 2$ and $N = 3$. We assume that $\forall (S_i, G_j) \in D$, $\exists\, \theta_i$ and $T_{\theta_i}: \mathbb{R}^N \to \mathbb{R}^N$ such that $T_{\theta_i}: x_i \mapsto x_i'$, where $x_i \in S_i$ and $x_i' \in G_j$. $T_{\theta_i}$ can be a rigid or non-rigid transformation with parameters $\theta_i$. For previous methods, $\theta_i$ is optimized in an iterative search process to optimally align a given pair of source and target point sets. For our method, we assume the existence of a neural network structure $g$ with the set of all its weights $\gamma$, such that $g_\gamma(S_i, G_j) = \theta_i$. Our optimization task becomes:

$$\gamma^{\mathrm{optimal}} = \operatorname*{argmin}_{\gamma} \; \mathbb{E}_{(S_i, G_j) \sim D}\Big[ L\big(T_{g_\gamma(S_i, G_j)}(S_i),\, G_j\big) \Big] \tag{1}$$

Therefore, for a given training set $D$, our task is to optimize the parameters $\gamma$ instead of $\theta$ / $T_\theta$. The desired $\theta$ / $T_\theta$ is our model's output. $L(\cdot)$ represents a similarity measure.

B. Learning shape descriptor tensor

The first part of our structure learns the shape descriptor for point sets. To address the irregular format of point sets, we introduce two point grids $M_{S_i}$ and $M_{G_j}$ as reference point sets, which are overlaid on the source point set $S_i$ and the target point set $G_j$ respectively. For each point in the reference point sets, we learn a shape descriptor tensor $F^i_S$ or $F^j_G$ by mapping the local and global information of the irregular source or target point set onto it. Specifically, as shown in Figure 2, taking $S_i$ as an example, $\forall x_i \in M_{S_i}$, $x_i$ is a 2D/3D geometric coordinate and we define the single-layer mapping $U: (x_i, S_i) \to \mathbb{R}^d$ as follows:

$$U(x_i, S_i) = \operatorname{Maxpool}\big\{ \operatorname{ReLU}\big(u_m [x_i, y_i] + c_m\big) \big\}_{y_i \in S_i} \tag{2}$$

where the parameters are $u_m \in \mathbb{R}^{m \times 4}$ (2D) or $\mathbb{R}^{m \times 6}$ (3D) and $c_m \in \mathbb{R}^{m \times 1}$, and $[\ast, \ast]$ means concatenation. For a multi-layer structure, we repeat the linear combination and Leaky-ReLU [31] activation parts before applying the Max-pool layer. This MLP-based structure was first introduced in PointNet [26] for directly learning geometric features from point clouds. Please refer to PointNet [26] for more details. The single-layer MLP-based function $U(\ast)$ can be regarded as a mapping that extracts features from an irregular point set and is driven by the loss function. In our case, we use a three-layer MLP. In this way, we transfer the information of the source and target point sets to two shape descriptor tensors on the reference grids. We define the shape descriptor tensors $F^i_S$ and $F^j_G$, with $F^i_S, F^j_G \in \mathbb{R}^{n \times m \times d}$, where

$$F^i_S = \begin{bmatrix} U(x_{11}, S_i) & U(x_{12}, S_i) & \dots & U(x_{1m}, S_i) \\ U(x_{21}, S_i) & U(x_{22}, S_i) & \dots & U(x_{2m}, S_i) \\ \vdots & \vdots & & \vdots \\ U(x_{n1}, S_i) & U(x_{n2}, S_i) & \dots & U(x_{nm}, S_i) \end{bmatrix}$$

and $x_{nm} \in M_{S_i}$. Similarly, we have the shape descriptor tensor $F^j_G$ for $M_{G_j}$.

Fig. 2. The schema of the process of learning the shape descriptor tensor.
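As an illustration of Eq. (2), the following is a minimal NumPy sketch of the forward pass only, not the authors' implementation. The helper names (`make_grid`, `shape_descriptor_tensor`), the 4 × 4 grid over $[-1, 1]^2$, the Leaky-ReLU slope of 0.2 and the random, untrained weights are all assumptions made for the example. The layer widths (16, 32, 64, 128) follow the model settings reported later in the paper.

```python
import numpy as np

def leaky_relu(z, slope=0.2):
    # Slope value is an assumption; the paper only cites Leaky-ReLU [31].
    return np.where(z > 0, z, slope * z)

def make_grid(n, m, low=-1.0, high=1.0):
    """Regular n x m reference grid of 2D points, shape (n, m, 2)."""
    gx, gy = np.meshgrid(np.linspace(low, high, n),
                         np.linspace(low, high, m), indexing="ij")
    return np.stack([gx, gy], axis=-1)

def shape_descriptor_tensor(grid, points, weights, biases):
    """Map a point set onto a reference grid.

    grid:   (n, m, D) reference points x
    points: (P, D) input point set (source or target)
    Returns a tensor of shape (n, m, d), d being the last layer width.
    """
    n, m, dim = grid.shape
    p = points.shape[0]
    # Pair every grid point x with every input point y and concatenate [x, y].
    x = np.broadcast_to(grid[:, :, None, :], (n, m, p, dim))
    y = np.broadcast_to(points[None, None, :, :], (n, m, p, dim))
    h = np.concatenate([x, y], axis=-1)            # (n, m, P, 2 * D)
    # Shared MLP applied to every pair.
    for w, c in zip(weights, biases):
        h = leaky_relu(h @ w.T + c)
    # Max-pool over the input points, so the result ignores point ordering.
    return h.max(axis=2)

rng = np.random.default_rng(0)
dims = [4, 16, 32, 64, 128]                        # 4 = concatenated 2D pair
weights = [rng.normal(scale=0.1, size=(o, i)) for i, o in zip(dims[:-1], dims[1:])]
biases = [np.zeros(o) for o in dims[1:]]
S = rng.uniform(-1.0, 1.0, size=(91, 2))           # stand-in source point set
F_S = shape_descriptor_tensor(make_grid(4, 4), S, weights, biases)
print(F_S.shape)
```

Because of the max-pooling over $y_i \in S_i$, shuffling the rows of `S` leaves `F_S` unchanged, which is the property that lets the grid representation absorb the disordered input format.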

C. Shape correlation tensor

As shown in Figure 3, for the source and target grid points $M_{S_i}$ and $M_{G_j}$ with shape descriptor tensors $F^i_S = [f_{ij} = U(x_{ij}, S_i)]$ and $F^j_G = [g_{ij} = U(x_{ij}, G_j)]$, our next step is to define the shape correlation tensor between the source and target shape descriptor tensors. Let $M$ be a similarity metric such that $M: \mathbb{R}^d \times \mathbb{R}^d \to \mathbb{R}$. In this paper, we simply let $M$ be the inner product. $\forall f_{ij} \in F^i_S$, we collect its point-wise correlations with all elements of $F^j_G$ into $C_{ij} \in \mathbb{R}^t$ with $t = nm$, where

$$C_{ij} = \big[ M(f_{ij}, g_{11}),\, M(f_{ij}, g_{12}),\, \dots,\, M(f_{ij}, g_{nm}) \big] \tag{3}$$

We define $C = [C_{ij}] \in \mathbb{R}^{n \times m \times t}$ as the shape correlation tensor. It has a $t$-dimensional channel that stores the correlation information between each point in $M_{S_i}$ and all the points in $M_{G_j}$. We normalize each channel of the element $C_{ij}$ in the shape correlation tensor.

Fig. 3. The schema of the process of formulating the correlation tensor.
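Eq. (3) amounts to an all-to-all inner product between the two descriptor tensors. The sketch below is a hedged NumPy illustration: the function name `shape_correlation_tensor` is hypothetical, the descriptor tensors are random stand-ins, and the normalization is assumed to be an L2 normalization of each vector $C_{ij}$, since the text does not specify its exact form.

```python
import numpy as np

def shape_correlation_tensor(F_S, F_G, eps=1e-8):
    """All-to-all inner-product correlation between two descriptor tensors.

    F_S, F_G: arrays of shape (n, m, d).
    Returns C of shape (n, m, t) with t = n * m, where C[i, j, k] is the
    inner product of f_ij with the k-th (row-major) descriptor of F_G.
    """
    d = F_S.shape[-1]
    g_flat = F_G.reshape(-1, d)                    # (t, d)
    C = np.einsum("ijd,td->ijt", F_S, g_flat)      # M(f_ij, g_k) for all k
    # Assumption: each vector C_ij is scaled to unit L2 norm.
    return C / (np.linalg.norm(C, axis=-1, keepdims=True) + eps)

rng = np.random.default_rng(0)
F_S = rng.normal(size=(4, 4, 128))                 # stand-in descriptors
F_G = rng.normal(size=(4, 4, 128))
C = shape_correlation_tensor(F_S, F_G)
print(C.shape)
```

The resulting tensor keeps the regular $n \times m$ grid layout, which is what allows a standard 2D-CNN to consume it in the next stage.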

D. Shape transformation prediction

Before we discuss shape transformation prediction, we first review two classical parametric functions for rigid and non-rigid transformations. Consider first the affine transformation, which includes translation, scaling, rotation and shear. Let $\theta_{rigid} = \{\alpha, r_1, r_2, r_3, r_4, s_1, s_2\}$ and we have

$$T_\theta = \begin{bmatrix} r_1 \cos\alpha & r_2 \sin\alpha & s_1 \\ r_3 \sin\alpha & r_4 \cos\alpha & s_2 \end{bmatrix}$$

Even though we do not discuss the rigid case in this paper, our model can be easily adjusted for rigid registration.

For non-rigid transformation, let $\theta_{nonrigid}$ be the controlling points in Thin Plate Spline (TPS). In this paper, we choose 9/27 controlling points distributed as a $3 \times 3$ / $3 \times 3 \times 3$ grid for 2D/3D data. For a pair of 2D source and target point sets, our target $\theta_{nonrigid} = \{(\theta_1^x, \theta_1^y), \dots, (\theta_9^x, \theta_9^y)\}$ is the set of coordinates of the nine controlling points in TPS [17]. Let the original controlling points in TPS be $\theta_0$, with $\theta_0^{2D} = [(0, 0), (-1, 0), \dots, (1, -1)]$. For a pair of 3D source and target point sets, our target $\theta_{nonrigid} = \{(\theta_1^x, \theta_1^y, \theta_1^z), \dots, (\theta_{27}^x, \theta_{27}^y, \theta_{27}^z)\}$ is the set of coordinates of the 27 controlling points in TPS [17], and the original controlling points $\theta_0^{3D}$ form the corresponding $3 \times 3 \times 3$ grid. After obtaining the new positions of the controlling points $\theta_{nonrigid}$, together with $\theta_0$, we can solve for the non-rigid transformation $T_\theta$ according to TPS. In this case, we have 18/81 parameters to be optimized for defining the non-rigid transformation that aligns the 2D/3D source and target point sets. For a given pair of source point set $S_i$ and target point set $G_j$ as inputs, based on their shape correlation tensor $C$ from the previous step, we further use a 2D-CNN/3D-CNN followed by successive fully connected layers to predict the desired parameters $\theta$ of the transformation $T_\theta$.

E. From statistical alignment to loss functions

The last step is to define the loss function between the transformed source point set $T_\theta(S_i)$ and the target point set $G_j$. Due to the disordered nature of point clouds, there is no direct correspondence between these two point sets. Therefore, a distance metric between two point sets is needed instead of the point/pixel-wise loss used in image registration. In addition, a suitable metric should be differentiable and efficient to compute. For 3D point set generation, Fan et al. [33] first proposed the Chamfer Distance loss, which is widely used in practice. The registration problem can also be treated as statistical alignment between the two distributions of the source and target point sets. We treat the target point set as the centroids of a Gaussian Mixture Model (GMM) and fit the transformed source point set as data to this GMM, so that we can maximize the likelihood of the GMM.

Chamfer Distance (C.D.).

Chamfer loss is a simple and effective metric defined on two non-corresponding point sets. It does not require the same number of points, and in many tasks it provides high-quality results in practice. We define the Chamfer loss on our transformed source point set $T_\theta(S)$ and target point set $G$ as:

$$L_{\mathrm{Chamfer}}(T_\theta(S), G \mid \gamma) = \sum_{x \in T_\theta(S)} \min_{y \in G} \|x - y\|_2^2 + \sum_{y \in G} \min_{x \in T_\theta(S)} \|x - y\|_2^2 \tag{4}$$

where $\gamma$ represents all the parameters in the MLP layers and 2D-CNN layers from Sections 2 B, 2 C and 2 D. In this paper, we use the Chamfer Distance (C.D.) as the evaluation metric.

Gaussian Mixture Model (GMM) loss.

Let our source point set be $S = (x_1, x_2, \dots, x_N)$ and the transformed source point set be $T_\theta(S) = (T_\theta(x_1), T_\theta(x_2), \dots, T_\theta(x_N))$. The target point set is $G = (y_1, y_2, \dots, y_M)$, where $x_i, y_i \in \mathbb{R}^2$ or $\mathbb{R}^3$ in our paper. We consider the Gaussian mixture model $p(T_\theta(x_i)) = \sum_{m=1}^{M} \frac{1}{M}\, p(T_\theta(x_i) \mid m)$ with $x \mid m \sim \mathcal{N}(y_m, \sigma^2 I)$, where our target point set acts as the 2/3-dimensional centroids of an equally weighted Gaussian mixture model. In general, we want our predicted point set to maximally satisfy the Gaussian mixture model. Therefore, we define the loss function (GMM loss) as:

$$L_{\mathrm{GMM}}(T_\theta(S), G \mid \gamma) = - \sum_{x \in S} \log \sum_{y \in G} \exp\left( -\frac{1}{2} \left\| \frac{T_\theta(x) - y}{\sigma} \right\|_2^2 \right) \tag{5}$$

where $\gamma$ represents all the parameters in the MLP layers and 2D-CNN layers from Sections 2 B, 2 C, and 2 D, and $\sigma$ is the standard deviation in the GMM. We set $\sigma$ to be identical for each Gaussian distribution in the GMM. $\sigma$ is a hyper-parameter to choose in practice. Even though it is a constant for each input, we use a more sophisticated strategy for choosing it in practice, as discussed in Section 2 F. We use the GMM loss as our loss function in this paper.

F. Model settings
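The GMM loss of Eq. (5) is used together with the annealed $\sigma$ described in this subsection. The following NumPy sketch shows one possible reading of both pieces. The function names are hypothetical, the factor $\frac{1}{2}$ and the squared norm in the exponent are assumptions taken from the Gaussian density $\mathcal{N}(y_m, \sigma^2 I)$, and the schedule $\max(\sqrt{1/n},\, 0.1)$ is our interpretation of "reduce it until a margin value of 0.1".

```python
import numpy as np

def gmm_loss(transformed_source, target, sigma):
    """Negative log-likelihood of the transformed source points under an
    equally weighted GMM centred on the target points (constants dropped).

    transformed_source: (N, D), target: (M, D), sigma: positive float.
    """
    diff = transformed_source[:, None, :] - target[None, :, :]     # (N, M, D)
    # Assumed exponent: -0.5 * ||(T(x) - y) / sigma||^2
    expo = -0.5 * np.sum(diff ** 2, axis=-1) / sigma ** 2          # (N, M)
    # Numerically stable log-sum-exp over the target points.
    peak = expo.max(axis=1, keepdims=True)
    log_sum = peak[:, 0] + np.log(np.exp(expo - peak).sum(axis=1))
    return -log_sum.sum()

def annealed_sigma(step, sigma0=1.0, floor=0.1):
    """Deterministic annealing: sigma0 * sqrt(1 / step), clipped at `floor`.

    `step` starts at 1, so the first value equals sigma0.
    """
    return max(sigma0 / np.sqrt(step), floor)

rng = np.random.default_rng(0)
G = rng.uniform(-1.0, 1.0, size=(50, 2))           # stand-in target point set
for step in (1, 4, 100, 400):
    sigma = annealed_sigma(step)
    print(step, sigma, gmm_loss(G + 0.05, G, sigma))
```

A large $\sigma$ early in training makes every target point contribute to the likelihood of every source point (a coarse match), while a small $\sigma$ later on lets only the nearest targets matter (a fine match).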

We train our network using batch data from the training data set, $\{(S_i, G_i) \mid (S_i, G_i) \in D\}_{i = 1, 2, \dots, b}$, where $b$ is the batch size and is set to 16. For learning the shape descriptor tensor in Section 2 B, the input is an $N \times 2$ / $N \times 3$ matrix, and we use 4 MLP layers with dimensions (16, 32, 64, 128) and a Maxpool layer to convert it to a 128-dimensional descriptor for each point in the reference grid. For the shape correlation tensor $C$ discussed in Sections 2 C and 2 D, we use three 2D-CNN/3D-CNN layers with kernel sizes (3,3), (4,4), (5,5) and dimensions (128, 256, 512), followed by two successive fully connected layers with dimensions (64, 18)/(512, 81). The learning rate is set to 0.0001 with 0.995 exponential decay, using the Adam optimizer. We use the leaky-ReLU [31] activation function and apply batch normalization [34] to every layer except the output layer. We use deterministic annealing for the standard deviation $\sigma$, which is initially set to 1; at each step $n$ we reduce it to $\sqrt{1/n}$ until it reaches a margin value of 0.1. Gradually reducing $\sigma$ leads to a coarse-to-fine match. For the outlier and missing-point cases, we slightly increase the margin value to 0.12.

III. EXPERIMENTS

In this section, we carry out a set of experiments to validate the performance of our proposed PR-Net for non-rigid point set registration from different aspects (i.e. accuracy and time). In Section 3 A, we discuss how we prepare the experimental dataset. In Section 3 B, we compare PR-Net with a non-learning based non-rigid point set registration method. In Section 3 C, we validate the robustness of PR-Net against different levels of geometric deformation. In Section 3 D, we validate the robustness of PR-Net against different types of noise. In Section 3 E, we further verify that PR-Net can handle registration tasks for various types of datasets.

A. Dataset preparation

The point cloud data is often featured with geometric structural variations in the presence of a variety of noise (e.g. outliers, missing points), which poses challenges for point set registration. An effective registration solution should be robust to the presence of such noise to provide the desired geometric transformation. Therefore, in order to assess PR-Net's performance, we add the commonly recognized types of noise to the raw point sets to prepare the experimental data. To prepare the geometric structural variation, we randomly choose a certain number of samples from the point set and use them as the controlling points of a thin plate spline (TPS) transformation. A zero-mean Gaussian is superposed on each controlling point to simulate a random drift from its original position. The TPS is then applied to synthesize the deformed point set with different levels of structural variation. A fixed fraction of the standard deviation of the above-mentioned Gaussian is used to measure the deformation level. To prepare the position drift (P.D.) noise, we apply a zero-mean Gaussian to each sample from the point set. The level of P.D. noise is defined as the standard deviation of the Gaussian. To prepare the data incompleteness (D.I.) noise, we randomly remove a certain number of points from the entire point set. The level of D.I. noise is defined as the ratio of the eliminated points to the entire set. To prepare the data outlier (D.O.) noise, we randomly add a certain number of points generated by a zero-mean Gaussian to the point set. The level of D.O. noise is defined as the ratio of the added points to the entire point set. For all tests, we use the Chamfer Distance (C.D.) between a pair of point sets to provide a quantitative score to evaluate the registration performance.

TABLE I. Performance comparison with CPD for registering …k pairs of point sets at deformation level ….

Methods             C.D.           Time
CPD (Train) [28]    … ± …          ∼12 hours
PR-Net (Train)      0.0037 ± …     ∼13 minutes
CPD (Test) [28]     0.0038 ± …     ∼12 hours
PR-Net (Test)       0.0044 ± …     ∼…

TABLE II. Quantitative testing performance for fish shape point set registration at different deformation levels (Deform. Level).

Deform. Level       Chamfer Distance
0.2                 0.0013 ± …
…                   … ± …

B. Comparison to Non-learning based Approach

Different from previous efforts, the proposed PR-Net is a learning-based non-rigid point set registration method, which can learn the registration pattern to directly predict the non-parametric geometric transformation for aligning point sets. Because PR-Net is a learning-based approach to predicting the non-rigid registration, a direct comparison between PR-Net and existing non-rigid iterative registration methods is not fully applicable. To compare our method with a non-learning based iterative method (i.e. Coherent Point Drift (CPD) [28]), we design the experiment as follows to assess both time and accuracy performance.
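To make the two criteria concrete, the sketch below scores an arbitrary registration routine by the mean and standard deviation of the Chamfer distance and by wall-clock time over a set of pairs. It is only an illustration of the evaluation protocol: `evaluate` and `centroid_align` are hypothetical names, `centroid_align` is a trivial placeholder that stands in for neither CPD nor PR-Net, and the Chamfer distance follows Eq. (4) under the assumption of summed squared Euclidean distances.

```python
import time
import numpy as np

def chamfer_distance(a, b):
    """Chamfer distance between point sets a (N, D) and b (M, D):
    summed squared distance to the nearest neighbour, in both directions."""
    d2 = np.sum((a[:, None, :] - b[None, :, :]) ** 2, axis=-1)     # (N, M)
    return d2.min(axis=1).sum() + d2.min(axis=0).sum()

def evaluate(register_fn, pairs):
    """Run register_fn(source, target) on every pair and report
    (mean C.D., std C.D., elapsed seconds)."""
    scores = []
    start = time.perf_counter()
    for source, target in pairs:
        scores.append(chamfer_distance(register_fn(source, target), target))
    elapsed = time.perf_counter() - start
    return float(np.mean(scores)), float(np.std(scores)), elapsed

def centroid_align(source, target):
    # Placeholder registration: translate the source onto the target centroid.
    return source - source.mean(axis=0) + target.mean(axis=0)

rng = np.random.default_rng(0)
pairs = []
for _ in range(20):
    src = rng.uniform(-1.0, 1.0, size=(91, 2))
    pairs.append((src, src + rng.normal(scale=0.3, size=2)))       # shifted copy
mean_cd, std_cd, seconds = evaluate(centroid_align, pairs)
print(f"C.D. {mean_cd:.4f} +/- {std_cd:.4f}, {seconds:.3f} s")
```

Timing the whole loop rather than a single pair mirrors the comparison in the text: an iterative method pays its optimization cost once per pair, while a trained network pays only a forward pass.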

Experimental Setup:

We conduct tests to compare PR-Net with the non-learning based approach. Coherent Point Drift (CPD) [28] is a highly recognized non-rigid point set registration method. In this test, we synthesize 2D deformed fish data at a deformation level of … to prepare a training dataset of …k pairs and a testing dataset of …k pairs. Our PR-Net is first trained before being applied to the testing dataset. CPD is directly applied to the testing dataset.

Result:

We list the experimental results in Table I. The second column shows the mean and standard deviation of the C.D. over all pairs after registration. The third column shows the time used for registering the pairs of point sets. As expected, after training PR-Net can perform real-time non-rigid point set registration. The time used to register the …k pairs of point sets is around … seconds, which is orders of magnitude less than the time (about 12 hours, see Table I) consumed by CPD for registering the entire dataset. This is because CPD needs to start a new iterative process for each new pair of point sets. PR-Net clearly gains an advantage over the non-learning based method by providing a faster solution to non-rigid point registration. We also note that it takes around 13 minutes to train our PR-Net on the training dataset with comparable performance, which is also significantly less than the roughly 12 hours used by CPD.

In addition to efficiency (registration speed), we are also interested in effectiveness, which indicates how well PR-Net can generalize from training data to directly predict the desired geometric transformation for non-rigid point set registration. The comparative training and testing C.D. results are listed in the second column. The small difference between the training and testing C.D. indicates a comparatively small performance degradation from training to testing. Furthermore, we notice that the C.D. of PR-Net has a smaller standard deviation than that of CPD, which suggests that PR-Net can provide a more stable registration, as it obtains the generalization ability to adapt properly to previously unseen data. In contrast, CPD treats every new pair of point sets independently and has to register each of them from the start.

C. Robust to Geometric Deformation

In this experiment, we take a detailed look at how well PR-Net performs point set registration for 2D shapes at different deformation levels. This experiment provides a basic assessment of our model’s performance and capacity for registering unseen, highly deformed testing shapes.

Fig. 4. Testing results for 2D fish shape point set registration at different deformation levels. The deformation level increases from 0.3 to 1.5 from left to right. The presented shapes are randomly selected from the same testing batch. The blue shapes are source point sets and the red shapes are target point sets. Please zoom in for better visualization.
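The Chamfer Distance (C.D.) used to score registration quality in these experiments can be sketched as follows. This is a minimal sketch, assuming the common symmetric form that averages squared nearest-neighbor distances in both directions; the exact normalization used for the reported numbers may differ, and the function names `chamfer_distance` and `summarize` are illustrative rather than taken from the PR-Net code.

```python
import numpy as np


def chamfer_distance(source: np.ndarray, target: np.ndarray) -> float:
    """Symmetric Chamfer Distance between point sets of shape (N, D) and (M, D).

    Assumed form: mean squared nearest-neighbor distance, summed over both directions.
    """
    # Pairwise squared Euclidean distances, shape (N, M).
    diff = source[:, None, :] - target[None, :, :]
    sq_dist = np.sum(diff ** 2, axis=-1)
    # Nearest target for every source point, and nearest source for every target point.
    src_to_tgt = sq_dist.min(axis=1).mean()
    tgt_to_src = sq_dist.min(axis=0).mean()
    return float(src_to_tgt + tgt_to_src)


def summarize(pairs):
    """Mean and standard deviation of C.D. over (registered source, target) pairs."""
    scores = np.array([chamfer_distance(s, t) for s, t in pairs])
    return scores.mean(), scores.std()
```

Calling `summarize` on the registered testing pairs gives a "mean ± standard deviation" entry of the kind reported in the tables. The dense pairwise matrix is fine for small 2D shapes; larger point sets would call for a spatial index instead.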

Experimental Setup:

We conduct tests to verify how well PR-Net performs on data with different levels of geometric deformation. In this test, we synthesize 2D deformed fish data with deformation levels from 0.3 to 1.5 to cover a good range of structural variation in shape. The deformed 2D fish shapes are shown in Figure 4. For each level of deformation, we simulate […]k point sets as target point sets for training and an additional […]k point sets for testing. The quantitative results are shown in Table II.

Result:

After training, PR-Net is applied to register testing datasets with different deformation levels. The quantitative experimental results are listed in Table II. The second column lists the C.D. scores for registered pairs of source and target point sets at different deformation levels. As the evaluation shows, PR-Net achieves impressive performance on non-rigid point set registration when the deformation level is less than […], and the Chamfer Distance remains as low as […], as shown in Table II. However, when the deformation level reaches […], there is a large jump in C.D. from […] to […]. This indicates that our model’s registration capacity does have a clear upper bound. Once the deformation level reaches or exceeds this upper bound, the performance of PR-Net can degrade dramatically. We further examine the qualitative results to better understand PR-Net’s performance.

The corresponding qualitative results are shown in Figure 4, which illustrates the pairs of point sets before and after registration. From Figure 4, we can clearly see that the transformed source point set (in blue) aligns well structurally with the target point set (in red), which verifies PR-Net’s registration capacity. In particular, when the deformation level is equal to or less than […], as shown in Figure 4, PR-Net almost perfectly aligns the source and target point sets. As mentioned before, when the deformation level reaches […], the quantitative result degrades dramatically. As displayed in Figure 4, at this deformation level of […], the geometric structure of the 2D fish is significantly deteriorated, which makes it much more challenging to determine the desired geometric transformation. Even for human beings, it is hard to tell the geometric meaning of the target point sets (red shapes in Figure 4). This also indicates that TPS, as a parametric geometric transformation model, might be limited in modeling the large structural variation in our test. We investigate more complex geometric transformation models and model-free geometric transformations in separate research reports.

TABLE III. Quantitative testing performance for fish shape point set registration at deformation level […] in the presence of various noise: Point Drift (P.D.) noise, Data Outlier (D.O.) noise, and Data Incompleteness (D.I.) noise. Columns: P.D. Level, C.D.; D.O. Level, C.D.; D.I. Level, C.D. First entry: P.D. level 0.05, C.D. 0.0052 ± […].

Fig. 5. Testing results for 2D fish shape point set registration at deformation level […] in the presence of various noise. (A) Performance in the presence of Data Incompleteness (D.I.) noise. (B) Performance in the presence of Point Drift (P.D.) noise. (C) Performance in the presence of Data Outlier (D.O.) noise. Blue shapes are source point sets and red ones are target point sets. Please zoom in for better visualization.

D. Robust to Data Noise

When using sensors such as LIDAR sensors and laser scanners, it is unavoidable that the data is acquired with various types of noise. An effective non-rigid registration method should be robust to such noise in addition to the structural variations discussed in the previous section. Therefore, in this section, we focus on testing how well PR-Net can predict the non-rigid registration from noisy datasets.
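The three noise types examined in this section, Point Drift (P.D.), Data Outlier (D.O.), and Data Incompleteness (D.I.), can be simulated along the following lines. This is only a sketch under stated assumptions: the noise levels are defined in the data preparation section, so the interpretations used here (P.D. level as the Gaussian standard deviation, D.O. level as the ratio of added outliers, D.I. level as the fraction of points removed in one contiguous region) and all function names are hypothetical.

```python
import numpy as np


def add_point_drift(points, level, rng):
    """P.D. noise (assumed): jitter each point with zero-mean Gaussian noise of std `level`."""
    return points + rng.normal(0.0, level, size=points.shape)


def add_outliers(points, level, rng):
    """D.O. noise (assumed): append round(level * N) points drawn uniformly from the bounding box."""
    n_out = int(round(level * len(points)))
    lo, hi = points.min(axis=0), points.max(axis=0)
    outliers = rng.uniform(lo, hi, size=(n_out, points.shape[1]))
    return np.vstack([points, outliers])


def remove_points(points, level, rng):
    """D.I. noise (assumed): drop the round(level * N) points closest to a random seed point."""
    n_drop = int(round(level * len(points)))
    seed = points[rng.integers(len(points))]
    order = np.argsort(np.linalg.norm(points - seed, axis=1))
    # Keep the points farthest from the seed, in their original order.
    return points[np.sort(order[n_drop:])]
```

Each function takes a NumPy `Generator` (for example `np.random.default_rng(0)`) so that a noisy target set can be regenerated reproducibly. Removing a contiguous region rather than random points is a design choice made here to mimic a missing part of the shape.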

Experimental Setup:

In this experiment, we carry out a set of tests to validate PR-Net’s performance against different types of data noise, including P.D. noise, D.I. noise, and D.O. noise. We simulate the noisy data by introducing three types of noise at five different levels to the target point set at a deformation level of […]. The noise levels are defined in the data preparation section. Figure 5 illustrates the noisy target point set (in red) in contrast to the source point set (in blue). The quantitative results are shown in Table III.

Result:

Figure 4 demonstrates PR-Net’s performance on clean data for comparison. Given the source point set (in blue) and target point set (in red), PR-Net succeeds in transforming the source point set to align with the target one for the clean data.

To investigate PR-Net’s performance on noisy data, in Figure 5 (A) we apply D.I. noise to the target point set by removing an increasing number of point samples, shown from left to right in a row. The registration results show that PR-Net is capable of robustly aligning the source point set (blue) with the target (red) in this condition. Even when the D.I. noise level is […] and the majority of the target shape is missing, PR-Net can still align the remaining parts, such as the top and tail of the target fish. An interesting observation is that for the missing parts, even without any target information, the transformed source point sets appear natural and preserve the original geometric meaning. For example, when the D.I. level reaches […], the transformed source point sets not only match the targets, but the overall shape still retains its geometric meaning in the missing parts and can be easily recognized as a “fish” shape. As shown in Table III, the C.D. increases linearly when the D.I. level increases from […] to […], which indicates PR-Net’s high resistance to D.I. noise.

In Figure 5 (B), we apply P.D. noise to the target point set by adding an increasing amount of Gaussian noise, shown from left to right in a row. As shown in Figure 5, though the positions of the target points are dramatically drifted by Gaussian noise, PR-Net still effectively predicts the desired geometric transformation. In particular, when the P.D. noise level is higher than […], even though the boundary of the fish shape is dramatically drifted, the transformed source shapes have smooth boundaries and acceptable alignment with the target ones. From the quantitative results in Table III, the C.D. of registered pairs is less than […] when the P.D. noise level is under […], which indicates almost perfect alignment.

D.O. noise is added to the target point set in Figure 5 (C), shown from left to right in a row. The registration results demonstrate that the alignment between the source and target shapes is not significantly affected by outlier points in the target set when the D.O. noise level is less than […]. The quantitative results show that the C.D. of registered pairs remains as low as […] when the D.O. noise level is […]. However, when the D.O. noise level reaches as high as […], the C.D. of registered pairs jumps from […] to […], which indicates that PR-Net starts to suffer dramatic performance degradation caused by the large number of added outliers.

TABLE IV. Quantitative testing performance for skull, hand, and skeleton shapes at different deformation levels from 0.3 to 1.0. Columns: Deform. Level 0.3, 0.5, 0.8, 1.0. First entry: Hand, 0.0013 ± […].

E. Results on Data Variety

In this experiment, we take a further step to investigate how well PR-Net performs point set registration for other 2D/3D shapes at different deformation levels. We are especially interested in point set registration of non-contour-based 2D shapes, as well as 3D shapes, since 3D data have been gaining great attention in the community with recent advancements in 3D acquisition and computation resources.

Fig. 7. Testing registration performance for 3D face and cat point sets. The blue shapes are source shapes and the red shapes are target ones. We plot the mesh of the shapes for better visualization.
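For 3D shapes, the inputs are point sets subsampled from a 3D model, and the controlling points used for learning the descriptor tensor lie on a regular 3D grid. The sketch below illustrates both steps. It is an illustration under assumptions, not the authors' implementation: the function names, the sample size, and the grid resolution are hypothetical, and placing the grid on the bounding box of the points is a choice made here for simplicity.

```python
import numpy as np


def subsample_points(points, n_samples, rng):
    """Randomly pick n_samples distinct points from an (N, 3) array."""
    idx = rng.choice(len(points), size=n_samples, replace=False)
    return points[idx]


def regular_grid(points, resolution):
    """Regular grid of resolution**D controlling points spanning the bounding box of `points`."""
    lo, hi = points.min(axis=0), points.max(axis=0)
    axes = [np.linspace(l, h, resolution) for l, h in zip(lo, hi)]
    mesh = np.meshgrid(*axes, indexing="ij")
    # Flatten each coordinate volume and stack into (resolution**D, D).
    return np.stack([m.ravel() for m in mesh], axis=1)
```

Both the number of sampled points and the grid resolution drive memory and compute cost, with the grid growing cubically in 3D, so they are the natural knobs for trading registration detail against efficiency.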

Experimental Setup:

We further conduct tests to verify how well PR-Net performs on datasets of various shapes and patterns, such as skeleton, skull, hand, face (3D shape), and cat (3D shape). For each type of dataset, with different levels of geometric deformation, we simulate […]k point sets as target point sets for training and an additional […]k point sets for testing. For 3D shapes, we randomly sample points from the mesh dataset. When training PR-Net on 3D shapes, we sample only […] out of […]K points from an input 3D model and use […] × […] × […] controlling points for learning the descriptor tensor and correlation tensor, which already provides reasonable registration. Due to the computational complexity, there is a clear trade-off between performance and computational efficiency. We randomly select a few samples at a deformation level of […] to visualize the alignment results in Figure 6 and Figure 7. We present the quantitative evaluation of registration in Table IV, which reports the mean and standard deviation of C.D. between registered pairs.

Result:

PR-Net demonstrates robust registration performance for various categories of 2D shapes (e.g. skull, skeleton, and hand). Selected examples from the testing dataset are shown in Figure 6, and the corresponding quantitative testing results for these three shapes are shown in Table IV. The decrease in C.D. values from pre- to post-registration suggests that PR-Net can successfully align deformable pairs of various shapes. As shown in Figure 6, at the current deformation level of […], PR-Net shows robust performance across different shapes, with no obvious difference among their registration results. On closer inspection, some misregistered parts can still be noticed, such as the upper line of the skull in row 2. As shown in Table IV, the result on the skull shape is slightly worse than on the other two shapes when the deformation level is low. At higher deformation levels, however, the performance on the skull shape becomes comparable to the other two. This validates the robust performance of PR-Net for non-rigid point set registration over a variety of shapes at different geometric deformation levels.

In Figure 7, we demonstrate that PR-Net is applicable to 3D point set registration. As shown in Figure 7, for the overall structure of the target shape, our model can correctly predict the registration transformation to align the pair. For aligning the subtler parts of the source and target point sets, there is still room to improve PR-Net’s performance. A straightforward way to improve performance is to increase the number of points sampled from the surface, as well as the number of controlling points used for learning the shape descriptor tensor, within an acceptable computational cost. The comparison across different categories of shapes indicates the consistent performance of PR-Net.

IV. CONCLUSION & DISCUSSION

This paper introduces a novel learning-based approach to non-rigid point set registration. In contrast to non-learning-based methods (e.g. Coherent Point Drift), learning-based approaches to point set registration are rarely studied; to the best of our knowledge, PR-Net might be the first work that can actually generalize from training to predict geometric transformations for non-rigid point set registration. Possible reasons are that 1) the irregular format of point cloud data makes it challenging for standard learnable operators (e.g. discrete convolutions) to operate over non-grid-structured data for point feature learning, and 2) it is not obvious how to define an appropriate geometric transformation to transform the source point set to the target one. PR-Net provides the shape descriptor tensor and correlation tensor as a solution for feature learning, and uses the thin plate spline (TPS) to model the geometric transformation.

Though PR-Net is capable of learning point registration, some challenges remain to be addressed. First, our current PR-Net indirectly uses regular grids to assist with shape feature learning. A continuous operator that can be applied directly to points for feature learning would be better suited to point registration. Second, PR-Net uses TPS to model the geometric transformation. Though it achieves impressive registration performance for shapes with moderate deformation, its unsatisfactory performance for shapes with large deformation motivates us to study model-free geometric transformations (e.g. the displacement field). It would also be of great interest to extend PR-Net to other data modalities such as 2D images and 3D volumetric data. We will report those research outcomes in separate papers.