Meta-PU: An Arbitrary-Scale Upsampling Network for Point Cloud
Shuquan Ye, Dongdong Chen, Songfang Han, Ziyu Wan, Jing Liao
JOURNAL OF LATEX CLASS FILES, VOL. 14, NO. 8, AUGUST 2015
Abstract—Point cloud upsampling is vital for the quality of the mesh in three-dimensional reconstruction. Recent research on point cloud upsampling has achieved great success due to the development of deep learning. However, the existing methods regard point cloud upsampling of different scale factors as independent tasks. Thus, these methods need to train a specific model for each scale factor, which is both inefficient and impractical for storage and computation in real applications. To address this limitation, in this work, we propose a novel method called "Meta-PU" that is the first to support point cloud upsampling of arbitrary scale factors with a single model. In Meta-PU, besides the backbone network consisting of residual graph convolution (RGC) blocks, a meta-subnetwork is learned to adjust the weights of the RGC blocks dynamically, and a farthest sampling block is adopted to sample different numbers of points. Together, these two components enable Meta-PU to continuously upsample a point cloud with arbitrary scale factors using only a single model. In addition, the experiments reveal that training on multiple scales simultaneously is mutually beneficial. Thus, Meta-PU even outperforms the existing methods trained for a specific scale factor only.
Index Terms—Point cloud, upsampling, meta-learning, deep learning.
1 INTRODUCTION

Point clouds are the most fundamental and popular representation for three-dimensional (3D) environment modeling. When reconstructing the 3D model of an object from the real world, a common technique is to obtain the point cloud and then recover the mesh from it. However, a raw point cloud generated from depth cameras or reconstruction algorithms is usually sparse and noisy due to the restrictions of hardware devices or the limitations of algorithms, which leads to a low-quality mesh. To solve this problem, it is common to apply point cloud upsampling prior to meshing, which takes a set of sparse points as input and generates a denser set of points that better reflects the underlying surface.

Conventional point cloud upsampling methods [1], [2], [3] are optimization-based, with various shape priors as constraints, such as the local smoothness of the surface and the normal. These works perform well for simple objects but cannot handle complex and delicate structures. Due to the success of deep learning, some data-driven methods [4], [5], [6], [7] have emerged recently and achieved state-of-the-art performance by employing powerful deep neural networks to learn the upsampling process in an end-to-end way.

However, all existing point cloud upsampling networks only consider certain integer scale factors (e.g., 2x). They regard the upsampling of different scale factors as independent tasks.
Thus, a specific model has to be trained for each scale factor, limiting the use of these methods in real-world scenarios where different scale factors are needed to fit different densities of raw point clouds. Some works [6], [7] suggest achieving larger scales through

• Shuquan Ye, Ziyu Wan and Jing Liao are with the Department of Computer Science, City University of Hong Kong, HK, China. E-mails: [email protected], [email protected], [email protected]
• Dongdong Chen is with Microsoft Research. E-mail: [email protected]
• Songfang Han is with University of California, San Diego, USA. E-mail: [email protected]
• Jing Liao is the corresponding author.
Fig. 1: Arbitrary-scale model Meta-PU vs. the single-scale models over the example scale R = 2.5. The existing single-scale models first need to scale to a larger integer scale (e.g., 4x), then use a downsample algorithm to achieve the noninteger scale of 2.5x.

iterative upsampling (e.g., upsampling 4x by running the 2x model twice). However, this repeated computation is time-consuming, and upsampling of non-integer factors still cannot be achieved (e.g., 2.5x with single-scale models), as depicted in Fig. 1.

In real-world scenarios, it is very common and necessary to upsample raw point clouds into various user-customized densities for mesh reconstruction, point cloud processing, or other needs. Thus, an efficient method for upsampling with arbitrary scale factors is desired to solve the aforementioned drawbacks of the existing methods. However, this is not easy for vanilla neural networks. Their behavior is fixed once trained because of the deterministic learned weights, so it is not straightforward to let the network handle an arbitrary scale factor on the fly.

Motivated by the development of meta-learning [8], [9] and the latest image super-resolution method [10], we propose an efficient and novel network called "Meta-PU" for point upsampling with arbitrary scale factors. By incorporating one extra cheap meta-subnetwork as the controller, Meta-PU can dynamically change its behavior during runtime depending on the desired scale factor. Compared with storing the weights for individual scale factors, storing the meta-subnetwork is more convenient and flexible.

Specifically, the backbone of Meta-PU is based on a graph convolutional network (GCN), consisting of several residual graph convolutional (RGC) blocks to extract the feature representation of each point as well as its relationships to its nearest neighbors. The meta-subnetwork is trained to generate weights for the meta-RGC block given the input of a scale factor. Then, the meta-convolution uses these weights to extract features that are adaptively tailored to the scale factor. Following several RGC blocks, a farthest sampling block is further added to output an arbitrary number of points. In this way, different scale factors can be trained simultaneously with a single model. At the inference stage, when users specify an upsampling scale factor, the meta-subnetwork dynamically changes the behavior of the meta-RGC block by adapting its weights and outputs the corresponding upsampling results.

To demonstrate the effectiveness and flexibility of our method, we compare it with several strong baseline methods. The comparison shows that our method can even achieve state-of-the-art (SOTA) performance for specific single scale factors while supporting arbitrary-scale upsampling for the first time. In other words, our approach is both stronger and more flexible than the SOTA approaches. To better understand the underlying working principle and broader applications, we further provide a comprehensive analysis from different perspectives.

In summary, our contribution is three-fold:
• We propose the first point cloud upsampling network that supports arbitrary scale factors (including noninteger factors), via a meta-learning approach.
• We show that jointly training multiple scale factors with one model improves performance.
Our arbitrary-scale model even achieves better results at each specific scale than the single-scale counterpart.
• We evaluate our method on multiple benchmark datasets and demonstrate that Meta-PU advances state-of-the-art performance.
2 RELATED WORK
Optimization-based upsampling.
Point cloud upsampling is formulated as an optimization problem in early work. A pioneering solution proposed by Alexa et al. [1] constructs a Voronoi diagram on the surface and then inserts points at its vertices. Lipman et al. [11] designed a novel locally optimal projection operator for point resampling and surface reconstruction based on the L1 median, which is robust to noise and outliers. Later, Huang et al. [2] improved the locally optimal projection operator to enable edge-aware point set upsampling. Wu et al. [3] employed a joint optimization method for the inner points and surface points defined in their new point set representation. However, most of these methods rely on strong a priori assumptions (e.g., a reasonable normal estimation or a smooth surface in the local geometry). Thus, they may easily suffer from complex and massive point cloud data.

Deep-learning-based upsampling.
Recently, deep learning has become a powerful tool for extracting features directly from point cloud data in a data-driven way. Qi et al. first proposed PointNet [12] and PointNet++ [13] for extracting multi-level features from point sets. Based on these flexible feature extractors, deep neural networks have been applied to many point cloud tasks, such as those in [14], [15], [16]. As for point cloud upsampling, Yu et al. [4] presented a point cloud upsampling neural network operating on the patch level, making it possible to directly input a high-resolution point cloud. Then, Yu et al. developed EC-Net [17] to improve the quality of the upsampled point cloud using an edge-aware joint learning strategy. Wang et al. proposed a progressive point set upsampling network [5] to further suppress noise and preserve the details of the upsampled point cloud. Moreover, different frameworks, such as the generative adversarial network (GAN) [18] and the graph convolutional network (GCN) [19], have attracted researchers' attention for handling point cloud upsampling. Li et al. proposed PU-GAN [6], formulating a GAN framework to obtain more uniformly distributed point cloud results. Wu et al. proposed AR-GCN [7], making the first attempt to model point cloud upsampling with a GCN. However, these networks are only designed for upsampling with a fixed scale factor. When different upsampling scales are required in practical applications, multiple models have to be retrained. Unlike these methods, Meta-PU supports upsampling point clouds for arbitrary scale factors, by employing meta-learning to predict the weights of the network and dynamically change its behavior for each scale factor.
Meta-learning.
Meta-learning, or learning to learn, refers to learning by observing the performance of different machine learning methods on various learning tasks. It is normally a two-level model: a meta-level model performed across tasks, and a base-level model acting within each task. Early meta-learning approaches were primarily used in few-shot/zero-shot learning and transfer learning [20], [21]. Recent works have also applied meta-learning to various tasks and achieved state-of-the-art results in object detection [22], instance segmentation [23], image super-resolution [10], image smoothing [8], [9], [24], network pruning [25], etc. A more comprehensive survey of meta-learning can be found in [26]. Among these works, Meta-SR [10], which learns the weights of the network for arbitrary-scale image super-resolution, is the most closely related to ours. However, it cannot be applied to our task. The main reason is that the target of Meta-SR is images with a regular grid structure, whereas our target is the much more challenging irregular and orderless point cloud. For regular grid-based images, since the correspondence between each output pixel and the corresponding input pixel is pre-determined, Meta-SR can directly use relative offset information to regress the local upsampling weights. However, no such correspondence exists for point clouds. Therefore, we resort to using the meta-subnetwork together with the farthest sampling block. The meta-subnetwork is responsible for adaptively tailoring the point features to a specific scale factor by dynamically adjusting the weights of the RGC block, while the sampling block is responsible for sampling a particular number of points.
3 METHOD
In this section, we define the task of arbitrary-scale point cloud upsampling. Then we introduce the proposed Meta-PU in detail.
Given a sparse and unordered point set X = {p_i}_{i=1}^n of n points and a scale factor R, the task of arbitrary-scale point cloud upsampling is to generate a dense point set Y = {p_i}_{i=1}^N of N = ⌊R × n⌋ points. It is worth noting that R is not necessarily an integer, and theoretically, N can be any positive integer greater than n. The output Y does not necessarily include points in X. In addition, X may not be uniformly distributed. In order to meet the needs of practical applications, we need the upsampled point cloud to satisfy the following two constraints. Firstly, each point of Y lies on the underlying geometry surface described by X. Secondly, the distribution of output points should be smooth and uniform, for any scale factor R or input point number n.

Fig. 2: Overview of Meta-PU. Given a sparse input point cloud X of n points and a scale factor R, Meta-PU generates a denser point cloud Y' with ⌊R × n⌋ points. A compound loss function is employed to encourage Y' to lie uniformly on the underlying surface of the target Y. The pink box is the core part of Meta-PU. The meta-subnetwork takes the scale factor R as input and outputs the weight tensor for the convolutional kernels in the meta-RGC block, to adapt the feature extraction to different upscales.

Overview.
The backbone upsampling network contains five basic modules, distinguished by different colors in Fig. 2. The input point cloud first goes through a point convolutional neural network (CNN) and several RGC blocks to extract features for each centroid and its neighbors. Among these RGC blocks, the meta-RGC block is special: its weights are dynamically generated by a meta-subnetwork given the input of R. Thus the features extracted by this meta-RGC block are tailored to the given scale factor. After the RGC blocks, an unpooling layer outputs ⌊R_max × n⌋ points, where R_max denotes the maximum scale factor supported by our network, and R_max = 16 by default. Afterward, the farthest sampling block is adopted to sample N points from the ⌊R_max × n⌋ points as the final output, which is constrained by a compound loss function. In the following sections, we elaborate on the detailed structure of each block in Meta-PU, and the training loss.

Point CNN.
The point CNN is a simple structure in the spatial domain to extract features from the input point cloud X. In detail, for each point p ∈ X (a 1 × 3 vector), we first group its k nearest neighbors into a k × 3 tensor, and then feed them into a series of point-wise convolutions followed by a max-pooling layer to obtain a 1 × c feature, where c is the channel number of the point cloud feature. Thus, the output feature F_out is a tensor of shape n × c. Recursively applied convolution reaches a wider receptive field representing more information, whereas the max-pooling layer aggregates information from all points in the previous layer. In our implementation, we set k = 8 and c = 128.

RGC Block.
As shown in Fig. 3a, the RGC block contains several graph convolutional layers and residual skip-connections, inspired by [7]. It takes the feature tensor F_in as input and outputs F_out of the same shape n × c as F_in.

The graph convolution in the RGC block is defined on a graph G = (V, ε), where V denotes the node set and ε denotes the corresponding adjacency matrix. The graph convolution is formulated as follows:

f_out^p = ω_1 ∗ f_in^p + ω_2 ∗ Σ_{q ∈ N(p)} f_in^q, ∀p ∈ V    (1)

where f_in^p denotes the input feature of vertex p, f_out^p represents the output feature of vertex p after graph convolution, ω_1 and ω_2 are the learnable parameters, and ∗ denotes the point-wise convolutional operation.

The core idea of the RGC block is to operate the convolution separately on the central-point feature and the neighbor features, as illustrated in Fig. 3b. The neighbor features are grouped with the k nearest neighbors of the input point cloud and then go through a 1 × 1 graph convolution. The central-point features are convolved separately from those of the neighbors and are then concatenated with the neighbors' features. Moreover, residual skip-connections are introduced to address the vanishing gradient and slow convergence problems. In our implementation, we set k = 8 and c = 128, and a total of 22 RGC blocks are used. Among them, the second one is a special meta-RGC block, which is described in detail next.

Meta-RGC Block and Meta-subnetwork.
To solve the arbitrary-scale point cloud upsampling problem with a single model, we propose a meta-RGC block, which is the core part of Meta-PU. The meta-RGC block is similar to a normal RGC block, but the
Fig. 3: Structure of the RGC block (a), meta-RGC block (b) and unpooling block (c). Both the RGC and meta-RGC blocks convolve centroid features and neighborhood features, respectively. In the meta-RGC block, the weights of its convolutional layers are dynamically predicted based on the scale factor R. The unpooling block follows the last RGC block.

graph convolutional weights are dynamically predicted, depending on the given scale factor R. Instead of feeding R directly into the meta-RGC block, we create a scale vector R̃ = {max(0, R − i), R}_{i=1...⌈R⌉} as the input and fill the rest with {−1, −1} to achieve the size 2 × R_max. The philosophy behind this design is inspired by Meta-SR [10]. More specifically, because each input point is essentially transformed into a group of output points, {max(0, R − i), R} can serve as a location identifier to guide the point processing network to differentiate the i-th new point from the other points generated by the same input seed point.

The meta convolution is formulated as follows:

f_out^p = ϕ(R̃; θ_1) ∗ f_in^p + ϕ(R̃; θ_2) ∗ Σ_{q ∈ N(p)} f_in^q, ∀p ∈ V    (2)

where the convolution weights are predicted by the meta-subnetwork ϕ(·) taking the scale vector R̃ as input. Please note that we have two branches of meta-convolution, as shown in Fig. 3b. One branch is for the feature of the center point p, and the other is for the features of the neighbors defined by the adjacency matrix ε. Since there is no pre-defined adjacency matrix ε for point clouds, we define it as N(p), the k nearest neighbors of p. The convolution weights of these two branches are generated by two meta-subnetworks with parameters θ_1 and θ_2, respectively.

Each meta-subnetwork for meta-convolution comprises five fully-connected (FC) layers [27] and several activation layers, as shown in Fig. 4.
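The scale-vector construction can be sketched as follows. This is a minimal numpy version reflecting our reading of the text: the padding value −1 and the fixed length 2 × R_max are assumptions recovered from the garbled description, and the function name is ours.

```python
import math

import numpy as np

def scale_vector(R, R_max=16):
    """Pairs (max(0, R - i), R) for i = 1..ceil(R), padded with -1
    entries up to a fixed length of 2 * R_max (assumed padding value)."""
    v = []
    for i in range(1, math.ceil(R) + 1):
        v += [max(0.0, R - i), R]
    v += [-1.0] * (2 * R_max - len(v))
    return np.array(v)

v = scale_vector(2.5)
print(v.shape)  # (32,) for R_max = 16
print(v[:6])    # first ceil(2.5) = 3 pairs: 1.5, 2.5, 0.5, 2.5, 0.0, 2.5
```

Note how a non-integer R such as 2.5 yields a distinct vector from R = 2 or R = 3, which is what lets the meta-subnetwork produce scale-specific weights.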
In the forward pass, the first FC layer takes the scale vector created from R as input and produces a vector of c_hidden entries. After the activation function, the second FC layer produces an output of the same size as its input. Following the activation function, the input of the third FC layer is the c_hidden-entry encoding, and its output has length c_in × c_out × l × l. Next, the fourth FC layer outputs a vector w_0 with the same shape as its input. Unlike the previous four concatenated layers, the last FC layer serves as a skip-connection that obtains an output w_skip of shape c_in × c_out × l × l directly from the 2 × R_max-dimensional scale vector. The two outputs w_0 and w_skip are added and then reshaped to (c_in, c_out, l, l) as the weight matrix w for the meta-convolution. We set c_out = c_in = 128 and c_hidden = 128. Here l represents the kernel size of the convolution, which is fixed in our implementation. In the backward pass, instead of directly updating the weight matrix of the convolution, we calculate the gradients with respect to the weights of the FC layers of the meta-subnetwork. The gradient of the meta-subnetwork can be naturally calculated by the chain rule, so it can be trained end-to-end.

The meta-RGC block with dynamic weights predicted by the meta-subnetwork is necessary for the arbitrary-scale upsampling task, because the iR-th to (i+1)R-th upsampled points are generated directly from the features of the i-th input point and its nearest neighbors extracted via the RGC blocks. The point locations in the output have to differ across scale factors, to ensure that the upsampled points uniformly cover the underlying surface. Therefore, the embedding features must be adaptively adjusted according to the scale factor.
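The overall weight-prediction-then-convolution flow of Eq. (2) can be sketched with toy stand-ins. Here each "meta-subnetwork" is collapsed into a single linear map from the scale vector to a (c, c) weight matrix (our simplification of the five-FC-layer design with its skip connection); all names are ours, and the predicted weights are then used exactly like ω_1, ω_2 in the plain graph convolution of Eq. (1).

```python
import numpy as np

def meta_conv(F_in, knn, predict_w1, predict_w2, R_vec):
    """Eq. (2) sketch: two predictors map the scale vector to point-wise
    convolution weights for the center branch and the neighbor branch.

    F_in: (n, c) point features; knn: (n, k) neighbor indices.
    """
    w1 = predict_w1(R_vec)                    # (c, c) center-branch weights
    w2 = predict_w2(R_vec)                    # (c, c) neighbor-branch weights
    neighbor_sum = F_in[knn].sum(axis=1)      # aggregate features over N(p)
    return F_in @ w1 + neighbor_sum @ w2

# Toy single-linear-layer "meta-subnetworks" (NOT the paper's architecture).
rng = np.random.default_rng(0)
n, k, c, d = 16, 8, 32, 32                    # d: scale-vector length (2 * R_max)
A1, A2 = rng.standard_normal((2, d, c * c)) * 0.01
predict_w1 = lambda v: (v @ A1).reshape(c, c)
predict_w2 = lambda v: (v @ A2).reshape(c, c)

F_in = rng.standard_normal((n, c))
knn = rng.integers(0, n, size=(n, k))
R_vec = rng.standard_normal(d)                # in Meta-PU this is the scale vector
F_out = meta_conv(F_in, knn, predict_w1, predict_w2, R_vec)
print(F_out.shape)  # (16, 32)
```

Because the weights are a function of R_vec rather than fixed parameters, changing the scale factor changes the extracted features without retraining, which is the core mechanism of the meta-RGC block.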
This adjustment is much better than merely upsampling to R_max times and then downsampling; the experiments in Section 4.5 are designed to prove this.

The unpooling block takes the point cloud X and the corresponding features F_in as input. It is an RGC-based structure, while the output channels of the convolutional layers are set to R_max × 3. Specifically, the feature F_in of shape n × c is transformed to a tensor of size n × (R_max × 3), subsequently reshaped to n × R_max × 3, denoted as T_out. As a residual block, similar to the residual connection between the input and output features in the RGC block, we introduce a skip connection between points. Thus, the tensor T_out is point-wisely added to X to produce the output Y'_max of shape n × R_max × 3. Note that the "add" operation naturally expands X to R_max copies in a broadcast manner.

The farthest sampling block performs a farthest sampling strategy to retain Y' with ⌊R × n⌋ points from Y'_max with ⌊R_max × n⌋ points. The advantages are two-fold. First, farthest sampling can sample an arbitrary number of points from the input point set, which helps obtain the required number of output points. Second, since farthest sampling iteratively constructs a point set with the farthest point-wise Euclidean distance from a global perspective, this step further enhances the uniformity of the point set distribution.

For end-to-end training of Meta-PU, we adopt a compound loss with both a reconstruction term L_rec and uniformity terms L_uni, L_rep:

L = λ_rec L_rec + λ_uni L_uni + λ_rep L_rep    (3)

The latter two terms aim at encouraging the uniformity of the generated point cloud and improving the visual quality.
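The farthest sampling strategy used by the sampling block can be sketched as follows; this is a toy numpy version (function name is ours, and a production implementation would run batched on GPU).

```python
import numpy as np

def farthest_point_sampling(P, m):
    """Iteratively pick m points from P (shape (n, 3)), each time taking
    the point with the largest Euclidean distance to the set chosen so far."""
    chosen = [0]                                   # start from an arbitrary point
    d = ((P - P[0]) ** 2).sum(-1)                  # squared distance to chosen set
    for _ in range(m - 1):
        nxt = int(np.argmax(d))                    # farthest remaining point
        chosen.append(nxt)
        d = np.minimum(d, ((P - P[nxt]) ** 2).sum(-1))
    return P[chosen]

rng = np.random.default_rng(0)
P = rng.standard_normal((100, 3))                  # stands in for Y'_max
Q = farthest_point_sampling(P, 25)                 # keep floor(R * n) points
print(Q.shape)  # (25, 3)
```

The greedy max-min rule is what gives the sampled subset its spatial uniformity, matching the second advantage described above.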
Fig. 4: Structure of the meta-subnetwork. The meta-subnetwork inside the pink box predicts weights for the convolutional layers in the meta-RGC block.
The repulsion loss [4] L_rep is represented as follows:

L_rep = Σ_{i=0}^{N} Σ_{i' ∈ K(i)} η(‖p_i' − p_i‖) w(‖p_i' − p_i‖)    (4)

where N is the number of output points, K(i) is the index set of the k nearest neighbors of point p_i in the output point cloud Y', η(r) = −r is the repulsion term, and w(r) = e^(−r²/h²) is a fast-decaying weight.

Uniform loss.
The term [6] L_uni comprises two parts: U_imbalance, accounting for global uniformity, and U_clutter, accounting for local uniformity:

L_uni = Σ_{j=1}^{M} U_imbalance(S_j) · U_clutter(S_j)    (5)

where S_j, j = 1..M, refers to the ball-queried point subsets with radius r_d, centered at M seed points farthest-sampled from Y'. The imbalance term is

U_imbalance(S_j) = (|S_j| − n̂)² / n̂

where n̂ is the expected number of points in S_j. Note that the imbalance term is not differentiable; it acts as a weight for the following clutter term:

U_clutter(S_j) = Σ_{k=1}^{|S_j|} (d_{j,k} − d̂)² / d̂

where d_{j,k} is the point-to-neighbor distance of the k-th point in S_j, while d̂ = sqrt(2π r_d² / (√3 |S_j|)) denotes the expected distance.

Unbiased Sinkhorn divergences [28] are adopted as the reconstruction loss, to encourage the distribution of generated points to lie on the underlying mesh surface. The Sinkhorn divergence interpolates between the Wasserstein distance and the kernel distance. The Sinkhorn divergence between the output Y' and the ground truth Y can be formulated as follows:

L_rec = S_ε(Y, Y') = OT_ε(Y, Y') − (1/2) OT_ε(Y, Y) − (1/2) OT_ε(Y', Y')    (6)

where ε is the regularization parameter and

OT_ε(Y, Y') := min_{π_1 = Y, π_2 = Y'} ∫ C dπ + ε KL(π | Y ⊗ Y')

with the cost function on the feature space X ⊂ R^D of dimension D:

C(x, y) = (1/2) ‖x − y‖²    (7)

where the optimization is performed over coupling measures π ∈ M_1^+(X × X), and (π_1, π_2) denote the two marginals of π.

In the training process of most existing single-scale point cloud upsampling methods, each model is trained with one scale factor. However, because the scale factor varies in our arbitrary-scale upsampling task, we design a variable-scale training scheme to train all factors jointly. We first sample scale factors densely from the range (1, R_max] with a fixed stride and put them in a set S_R. For each epoch, a scale factor R is randomly sampled from S_R, and this factor is shared within a batch. To avoid overfitting, we also perform a series of data augmentations: rotation, random scaling, shifting, jittering, and perturbation with low probability.

4 EXPERIMENTS
Dataset.
For training and testing, we utilize the same dataset adopted by PU-Net [4] and AR-GCN [7]. This dataset contains models from the Visionair repository; following the protocol in the above two works, one subset of the models is used for training and the rest are used for testing.

For training, 100 patches are extracted from each model. We uniformly sample N points using Poisson disk sampling from each patch as the ground truth, and non-uniformly sample n points from the ground truth as the input, where n = ⌊n_max / R⌋ and N = R × ⌊n_max / R⌋, corresponding to the scale factor R. Moreover, we set n_max = 4096 as the maximum number of points in our training. For testing, we use the whole model instead of patches. The sampling process for the ground truth and input is similar to that in training, but, constrained by the GPU memory limit, we set different numbers of input points for different scale factors (progressively fewer points for R ≤ 4, 4 < R ≤ 6, 6 < R ≤ 12, and R > 12).

Metrics.
For a fair comparison, we employ several popular metrics. Chamfer Distance (CD) and Earth Mover Distance (EMD), defined on the Euclidean distance, measure the difference between the predicted points Y' and the ground-truth point cloud Y. The CD sums the square of the distance between each
TABLE 1: Quantitative comparisons. Single-scale models (including AR-GCN and PU-GAN) trained with each specific scale factor (top two rows) vs. the naive approach to arbitrary-scale upsampling (rows 3 to 5) vs. our full model (last row). The NUC scores are tested with a fixed p.

2x scale:
Method                          CD      EMD     F-score   NUC     mean    std
AR-GCN                          -       -       -         -       -       -
PU-GAN                          0.016   0.0090  32.17%    0.249   0.012   0.015
AR-GCN(x16)+random-sampling     0.015   0.023   30.14%    0.307   0.0089  0.014
AR-GCN(x16)+farthest-sampling   0.014   0.012   33.52%    0.227   0.0088  0.011
AR-GCN(x16)+disk-sampling       0.015   0.013   36.98%    0.273   0.0067  0.0082
ours

4x scale:
Method                          CD      EMD     F-score   NUC     mean    std
AR-GCN                          0.0086  0.018   70.09%    0.339   0.0029  0.0033
PU-GAN                          0.0097  0.016   69.75%    0.202   0.0030  0.0031
AR-GCN(x16)+random-sampling     0.012   0.041   45.34%    0.256   0.0081  0.0096
AR-GCN(x16)+farthest-sampling   0.011   0.018   52.67%    0.318   0.0072  0.0092
AR-GCN(x16)+disk-sampling       0.013   0.013   54.05%    0.288   0.0066  0.0080
ours

6x / 9x / 16x scales (same columns):
AR-GCN                          -       -       -         -       -       -
TABLE 2: Quantitative comparisons with EAR. The NUC scores are tested with a fixed p.

Method     CD↓     EMD↓    F-score↑  NUC↓   mean↓   std↓
EAR (2x)   0.0113  0.0214  48.07%    0.747  0.0048  0.0113
Ours (2x)
EAR (4x)   0.0112  0.0176  51.26%    0.478  0.0074  0.0137
Ours (4x)
EAR (6x)   0.0120  0.0184  52.26%    0.421  0.0085  0.0145
Ours (6x)
EAR (9x)   0.0119  0.0174  52.93%    0.442  0.0089  0.0140
Ours (9x)

point and the nearest point in the other point set, then calculates the average for each point set. The EMD measures the minimum cost of turning one of the point sets into the other. For these two metrics, the lower, the better. We also report the
F-score between Y' and Y, which treats point cloud super-resolution as a classification problem, as in [7]. For this metric, larger is better. We employ the normalized uniformity coefficient (NUC) [4] to evaluate the uniformity of Y' by directly comparing the output point cloud Y' with the corresponding ground-truth meshes, and the deviation mean and std to measure the difference between the output point cloud and the ground-truth mesh. For these two metrics, smaller is better.

We train the network for 60 epochs with a batch size of 18. Adam is adopted as the optimizer. The learning rate is initially set to 0.001 for FC layers and 0.0001 for convolutions and other parameters, and is decayed with a cosine annealing scheduler. The weights λ_rec, λ_uni and λ_rep of the joint loss function are set to fixed values. Generally, training takes less than seven hours on two Titan-XP GPUs. Theoretically, Meta-PU supports any large scale, but we set the maximum scale to 16 due to the limitations of computing resources and practical needs.

Fig. 5: Ablation on the meta-RGC block. Meta-PU is applied to upsample the point clouds to 2x, but the weights of its meta-RGC block are generated with a different input scale factor R. R = 2 achieves the best performance on both F-score and CD, which indicates that our meta-RGC block adapts the convolutional weights appropriately to different scale factors.

In this experiment, we compare Meta-PU with state-of-the-art single-scale upsampling methods, including PU-GAN [6] and AR-GCN [7], upsampling the sparse point cloud with several scale factors R. Their models are trained with the author-released code, and all settings are the same as stated in their papers. Since they are single-scale upsampling methods, an individual model is trained for each scale factor.
Due to the limitations of its two-stage upsampling strategy, AR-GCN can only be trained with a subset of the factors, whereas PU-GAN can be trained with all of them. Their performance is reported in the first two rows of Table 1. We surprisingly observe that our arbitrary-scale model even outperforms their single-scale models at most scale factors. In particular, our model performs significantly better on the F-score,
TABLE 3: Quantitative comparisons with MPU. Our method obtains superior results under most metrics.

Method    CD      EMD    F-score  NUC at p = 0.2% / 0.4% / 0.6% / 0.8% / 1.0%  Deviation(1e-2) mean / std  Time
MPU(2x)
MPU(4x)   0.0086  0.012  73.16%   0.321 / 0.282 / 0.265 / 0.256 / 0.249
MPU(16x)
TABLE 4: Comparison of the inference time.

Method   AR-GCN+Disk-sampling  PU-GAN+Disk-sampling  EAR     ours
Time(s)  10.28                 10.06                 351.10
NUC, mean, and std metrics than the other models, and is more stable across all scales. This may be because jointly training tasks of multiple scales can benefit each other, thus improving performance. In addition, Meta-PU needs to be trained only once for all tests, while the others need to train multiple models, which is very inefficient.
A naive approach to arbitrary-scale upsampling is to first use a state-of-the-art single-scale model to upsample the point cloud to a large scale, and then downsample it to the specific smaller scale. We compare our method with this naive approach. Specifically, we choose AR-GCN [7] to upsample point clouds to 16x and then downsample them to 2x, 4x, 6x and 9x with the random sampling, disk sampling, and farthest sampling algorithms. The results are reported in rows 3 to 5 of Table 1. We can see that random sampling obtains the worst scores because it downsamples the points non-uniformly. In comparison, the more advanced sampling algorithms, including disk sampling and farthest sampling, perform better by considering uniformity. Our method is still superior to all of them, because the result of a smaller scale factor in our method is not simply a subset of the large-factor one. In fact, Meta-PU can adaptively adjust the locations of the output points to better fit the underlying surface and maintain uniformity according to different scale factors. This is analyzed in the ablation study of the meta-RGC block in the next subsection. Moreover, compared to the strongest baseline (AR-GCN+disk-sampling), ours is 120 times faster (Table 4), because this advanced downsampling algorithm, which requires mesh reconstruction, is slow.

We also compare our method with the state-of-the-art optimization-based method EAR [29], which is also applicable to variable scales. The results are provided in Table 2. Our method yields superior results under all metrics.

Further, we compare Meta-PU with the state-of-the-art multi-step upsampling method MPU [5], which recursively upsamples a point set and is applicable to scales that are powers of 2 (e.g., 2, 4, 16). The results for scales 2, 4 and 16 are provided in Table 3. Our method obtains superior results under most metrics. In addition, we provide a comparison of inference times.
The running time of our method is much lower at all scales, demonstrating that our method is more efficient than the recursive approach.
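The naive baseline discussed in this section can be sketched as a pipeline: always run a fixed 16x model, then throw points away to hit the requested count. Everything below is a toy stand-in (point duplication for the "network", random choice for the "downsampler"), not the real AR-GCN or disk-sampling implementations; it only illustrates the control flow and the output-size arithmetic.

```python
import math

import numpy as np

def naive_arbitrary_upsample(X, R, model_16x, downsample):
    """Naive arbitrary scale: fixed 16x single-scale model + downsampling
    to floor(R * n). Meta-PU instead adapts its features to R directly."""
    Y16 = model_16x(X)                               # always 16 * n points
    return downsample(Y16, math.floor(R * X.shape[0]))

# Toy stand-ins (NOT the real networks or samplers).
model_16x = lambda X: np.repeat(X, 16, axis=0)
downsample = lambda Y, m: Y[np.random.default_rng(0).choice(len(Y), m, replace=False)]

X = np.random.default_rng(1).random((100, 3))
Y = naive_arbitrary_upsample(X, 2.5, model_16x, downsample)
print(Y.shape)  # (250, 3): floor(2.5 * 100) points
```

Note that the baseline's output is always a subset of the fixed 16x result, which is exactly the limitation the meta-RGC block removes.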