Point Cloud Super Resolution with Adversarial Residual Graph Networks
Huikai Wu, Junge Zhang, Kaiqi Huang
Institute of Automation, Chinese Academy of Sciences
University of Chinese Academy of Sciences
{huikai.wu, jgzhang, kaiqi.huang}@nlpr.ia.ac.cn

Abstract
Point cloud super-resolution is a fundamental problem for 3D reconstruction and 3D data understanding. It takes a low-resolution (LR) point cloud as input and generates a high-resolution (HR) point cloud with rich details. In this paper, we present a data-driven method for point cloud super-resolution based on graph networks and adversarial losses. The key idea of the proposed network is to exploit the local similarity of the point cloud and the analogy between the LR input and the HR output. For the former, we design a deep network with graph convolution. For the latter, we propose to add residual connections into graph convolution and introduce a skip connection between input and output. The proposed network is trained with a novel loss function, which combines Chamfer Distance (CD) and a graph adversarial loss. Such a loss function captures the characteristics of the HR point cloud automatically without manual design. We conduct a series of experiments to evaluate our method and validate its superiority over other methods. Results show that the proposed method achieves the state-of-the-art performance and has good generalization ability to unseen data. Code is available at https://github.com/wuhuikai/PointCloudSuperResolution.
1. Introduction
When modeling an object from the real world for 3D printing or animation, a common way is to first obtain the point cloud with depth scanning devices or 3D reconstruction algorithms [7, 11] and then recover the mesh from the point cloud [3]. However, the captured point cloud is usually sparse and noisy due to the restrictions of devices or the limitations of algorithms, which leads to a low-quality mesh.

The key to improving the quality of the recovered mesh is point cloud super-resolution, which takes a LR point cloud as input and generates a HR point cloud with rich details and few noisy points, as shown in Figure 1.

Figure 1: Point Cloud Super Resolution. (a) is the input LR point cloud with sparse distribution. (b) is the corresponding HR point cloud with dense distribution. (c) and (d) are the HR point clouds generated by PU-Net [37] and our method respectively. Ours is sharper at edges with fewer noisy points. Best viewed in color.

Most existing methods are optimization based without learning from data, and have strong assumptions about the underlying surface of the HR point cloud [12, 13]. As for data-driven methods, few prior works study deep learning on this problem. [37] propose PU-Net for point cloud super-resolution, which is a pioneering work. By employing deep neural networks, it outperforms many traditional methods such as EAR [13] and achieves the state-of-the-art performance.

In this paper, we aim to advance the performance of point cloud super-resolution by overcoming the defects of PU-Net. The first problem is that PU-Net directly regresses the point coordinates without exploiting the similarity between the LR and HR point clouds, which makes it hard to train. The second problem is that PU-Net proposes a complicated loss function with a strong assumption on the uniform distribution of the HR point cloud. Manually designed loss functions tend to overfit human priors and fail to capture many other properties of the HR point cloud, such as continuity.

Recent work [16] in image super-resolution shows that predicting the residual between the LR and HR image is a more desirable way to achieve better accuracy. Thus, to solve the first problem, we propose to introduce residual connections into graph convolution networks (GCNs) [5] and add a skip connection between the input layer and the output layer. Employing GCNs to process point clouds is not new [38]. However, the GCN in our method is unique in two aspects compared to that in [38]: (1) The architecture of our GCN is designed for generating point clouds, while that in [38] aims at aggregating information for classification. (2) We propose an un-pooling layer for the GCN to upsample the input point cloud.

To solve the second problem, we design a graph adversarial loss based on LS-GAN [23]. The proposed loss function is more expressive than manually designed ones and can capture the characteristics of the HR point cloud automatically. Pan et al. also introduce an adversarial loss into graph networks [25]. However, they focus on learning the distribution of graph embeddings; thus, a multi-layer perceptron is employed as the discriminator to process the input vector. Differently, we aim at distinguishing real and fake point clouds. To achieve this, we propose a GCN as the discriminator to process the generated point cloud, which is significantly different from [25].

In this way, we propose a novel method for point cloud super-resolution, named Adversarial Residual Graph Convolution Network (AR-GCN). Experiments show that the proposed method achieves the state-of-the-art performance on both the seen dataset [37] and an unseen dataset (SHREC15). The contributions of our method are threefold. First, we propose a novel architecture for point cloud super-resolution. Second, we introduce the graph adversarial loss to replace manually designed loss functions. Third, we advance the state-of-the-art performance on both seen and unseen datasets.
2. Related Work
Point cloud super-resolution is formulated as an optimization problem in most earlier works. To upsample a point cloud, [2] first compute the Voronoi diagram on the moving least squares surface and then add points at the vertices of this diagram. [22] present a locally optimal projection operator for surface approximation based on the $L_1$ median, which is parameter-free and robust to noisy points. These methods have a strong assumption on the smoothness of the underlying surface and tend to produce vague edges. Thus, [13] introduce an edge-aware point cloud upsampling method, which first samples away from the edges and then progressively samples the point cloud while approaching the edge singularities.

However, all these methods have strong assumptions about the underlying surface based on human priors. To exploit the massive 3D data, [37] propose a data-driven method that first learns multi-level features per point with PointNet++ [27] and then expands the point set via a multi-branch convolution. Through end-to-end learning, this method outperforms the optimization based methods on multiple datasets and achieves the state-of-the-art performance.

There is a rising interest in 3D data processing recently. Most existing works transform 3D data into volumetric grids, which are then processed by 3D CNNs [30, 24, 36, 28]. Particularly, [30] propose to upsample 3D objects in voxel space, which is similar to point cloud super-resolution.

3D CNNs are memory and time consuming. Thus, following works propose to process the point cloud directly instead of the volumetric grid [26, 27, 20, 32, 19, 14]. [26] propose a point-wise network for 3D object classification and segmentation. Subsequently, [27] introduce a hierarchical feature learning architecture to capture local and global context. [20] introduce a convolution operator named X-conv that is capable of leveraging spatially-local correlation for point cloud classification.

Besides 3D scene understanding, multiple works focus on employing deep learning for single image 3D reconstruction [7, 34, 33, 15]. [7] present a conditional shape sampler to generate multiple plausible point clouds from a single image. Differently, [34] propose a graph CNN for producing the 3D triangular mesh from a single color image.
Super-resolution for 2D images is a well-studied problem, which is closely related to point cloud super-resolution. Modern approaches are usually built on deep learning in a data-driven manner [6, 35, 18, 39, 9, 10]. [6] first upsample the low-resolution image with bicubic interpolation and then use a shallow CNN to recover the details and textures. Instead of manually designed interpolation, [35] propose to jointly learn the interpolation and detail recovery. To generate realistic images, [18] present a generative adversarial network, which pushes the generated images close to natural images.
3. Method
In this section, we first define point cloud super-resolution formally and then introduce our method AR-GCN in detail. Our method contains three major components: the adaptive adversarial loss function $L_G$, the residual GCN $G$, and the graph discriminator $D$. As shown in Figure 2, the LR point cloud $x$ is directly fed into $G$ to generate the HR output $\hat{y}$. Then, $\hat{y}$ is sent into $D$ to produce $L_G$, while another loss $L_{cd}$ is calculated based on $\hat{y}$ and the ground truth $y$.
Figure 2: Framework Overview. The proposed AR-GCN consists of a generator $G$ and a discriminator $D$. $G$ is a residual GCN that upsamples the input point cloud progressively, $\times 2$ at each unpooling step. $D$ is also a residual GCN, which learns to distinguish the fake HR point cloud from the real one.

Formally, given a point cloud $x$ with shape $n \times 3$, the goal of point cloud super-resolution is to generate a point cloud $\hat{y}$ with shape $N \times 3$ ($N = \gamma n$, $\gamma > 1$). Each point of $\hat{y}$ lies on the underlying surface described by $x$, as shown in Figure 1.

As shown in Figure 2, our method AR-GCN consists of two networks, the generator $G$ and the discriminator $D$. $G$ aims to generate the HR point cloud by upsampling the LR input progressively, while $D$ is responsible for distinguishing the fake HR point cloud from the real one. To train $G$ and $D$ simultaneously, we propose a joint loss function as shown in Equation 1:

$$L(x, y) = \lambda L_{cd}(G(x), y) + L_G(G(x)), \tag{1}$$

where $\lambda$ controls the balance between $L_{cd}$ and $L_G$.

$L_{cd}$ measures the distance between $y$ and $\hat{y}$, which is similar to the $L_2$ loss in image super-resolution. It is defined in Equation 2:

$$L_{cd}(\hat{y}, y) = \sum_{p \in y} \min_{q \in \hat{y}} \|p - q\|_2^2, \tag{2}$$

which is a variant of Chamfer Distance. The original Chamfer Distance consists of two parts: $L_{cd}$ and $L_{\hat{cd}}$, where $L_{\hat{cd}}$ is symmetric with $L_{cd}$ and defined as Equation 3:

$$L_{\hat{cd}}(\hat{y}, y) = \sum_{q \in \hat{y}} \min_{p \in y} \|p - q\|_2^2. \tag{3}$$

$L_{\hat{cd}}$ encourages $\hat{y}$ to be identical to the LR input, which leads to duplicated points in the output point cloud. Thus, we remove $L_{\hat{cd}}$ and only employ $L_{cd}$ as our loss function.

$L_{cd}$ measures the point-wise distance between $y$ and $\hat{y}$, which ignores high-order properties defined by a cluster of points, such as continuity. Traditional methods usually manually design a complex function as the loss, which is inefficient and has strong assumptions about the underlying surface. Alternatively, we propose a loss function $L_G$ that is defined by a network and learned from data automatically. Concretely, $L_G$ is a graph adversarial loss inspired by generative adversarial networks (GANs) [8]. In this paper, we employ LS-GAN [23] as the adversarial loss for its simplicity and effectiveness. $L_G$ is defined as follows:

$$L_G(\hat{y}) = \|1 - D(\hat{y})\|_2^2, \tag{4}$$

where $D$ is the discriminator that aims to distinguish the real and fake HR point clouds by minimizing the following loss:

$$L_D(\hat{y}, y) = \frac{1}{2}\|D(\hat{y})\|_2^2 + \frac{1}{2}\|1 - D(y)\|_2^2. \tag{5}$$
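To make the loss definitions concrete, below is a minimal sketch of Equations 1-5. The released code is in TensorFlow; this sketch uses PyTorch purely for readability, the mean reduction is our normalization choice, and the weight `lam` is a placeholder rather than the paper's value.

```python
import torch

def one_sided_chamfer(y_pred, y_gt):
    # L_cd of Eq. (2): for every ground-truth point, squared distance to
    # its nearest predicted point (the symmetric term of Eq. (3) is dropped).
    d = torch.cdist(y_gt, y_pred)                # (B, N_gt, N_pred)
    return (d.min(dim=-1).values ** 2).mean()

def generator_adv_loss(d_fake):
    # L_G of Eq. (4), LS-GAN style: push D's scores on generated patches toward 1.
    return ((1.0 - d_fake) ** 2).mean()

def discriminator_loss(d_fake, d_real):
    # L_D of Eq. (5): real patches scored toward 1, generated ones toward 0.
    return 0.5 * (d_fake ** 2).mean() + 0.5 * ((1.0 - d_real) ** 2).mean()

def joint_generator_loss(y_pred, y_gt, d_fake, lam=100.0):
    # Eq. (1); lam balances the two terms (its published value is not reproduced here).
    return lam * one_sided_chamfer(y_pred, y_gt) + generator_adv_loss(d_fake)
```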
The generator $G$ is built on the Graph Convolution Network (GCN) [4] and aims to progressively upsample the LR point cloud. It consists of three building blocks, namely the residual graph convolution block, the unpooling block, and the feature net, as shown in Figure 2.

PU-Net employs PointNet++ to generate the HR point cloud, which treats the central point and the neighbor points equally. This limits the learning ability of the network. Alternatively, we build our method on graph convolution [4], as shown in Figure 3.

Figure 3: Residual Graph Convolution Block. $x_{in}$ is used for querying the $k$ nearest neighbors. $x_{out}$ is the same as $x_{in}$.

The core of graph convolution, G-conv, is defined on a graph $G = (\upsilon, \varepsilon)$ and calculated as follows:

$$f_p^{l+1} = w_0 f_p^l + w_1 \sum_{q \in N(p)} f_q^l, \quad \forall p \in \upsilon, \tag{6}$$

where $w_0$ and $w_1$ are the learnable parameters and $f_p^l$ represents the feature of vertex $p$ at layer $l$. $N(p)$ is the set of vertices that connect to $p$ as defined by the adjacency matrix $\varepsilon$. However, there is no predefined adjacency matrix for a point cloud. To solve this problem, we define $N(p)$ as the $k$ nearest neighbors of $p$ in Euclidean space, where the coordinates are given by $x_{in}$.

Besides G-conv, we also introduce residual connections into our block, because residual networks usually lead to faster convergence and better results. They also help to exploit the similarity between the LR point cloud and the corresponding HR point cloud. In our experiment, the number of neighbors $k$ is set to 8. All the G-conv operators inside the block have the same number of channels, which is 128. The input feature $f_{in}$ and point cloud $x_{in}$ are processed by 12 residual layers to obtain $f_{out}$, while $x_{out}$ is the same as $x_{in}$.

Unpooling Block

The unpooling block takes the point cloud $x_{in}$ and the corresponding features $f_{in}$ as inputs. It first transforms $f_{in}$ with shape $\hat{n} \times c$ to a tensor with shape $\hat{n} \times 6$ by a G-conv layer. The tensor is then reshaped to $\hat{n} \times 2 \times 3$, which is denoted as $\delta x$. The upsampled point cloud $x_{out}$ is obtained by adding $x_{in}$ and $\delta x$ point-wisely, where each point is transformed into 2 points. The unpooling block is designed to predict the residual between $x_{in}$ and $x_{out}$ instead of regressing $x_{out}$ directly. This exploits the similarity between $x_{in}$ and $x_{out}$, which leads to faster convergence and better performance. The features of the output point cloud, $f_{out}$, are obtained by the following equation:

$$f_p^{out} = \frac{1}{k} \sum_{q \in N[x_{in}](p)} f_q^{in}, \quad \forall p \in x_{out}, \tag{7}$$

where $N[x_{in}](p)$ denotes the $k$ nearest neighbors of point $p$ in the point cloud $x_{in}$.

Feature Net

As shown in Figure 3, a residual graph convolution block takes both the point cloud and the corresponding feature as inputs. However, the generator only has one input, the point cloud $x$. To obtain the other input, the corresponding feature $f$, we design a simple block named feature net, which takes the point cloud $x$ as input. Specifically, for each point $p \in x$ with shape $1 \times 3$, we first obtain its $k$ nearest neighbors $P$ with shape $k \times 3$. Then, a series of point-wise convolutions with a max-pooling layer transform $\hat{P} = P - p$ into $f_p$ with shape $1 \times c$. In our experiment, $k$ is set to 8 while $c$ is set to 128. The number of convolution layers is set to 3.

Progressive Upsampling

Instead of directly upsampling the LR point cloud with the desired upscale ratio, we choose to generate the HR point cloud step by step. The point cloud is upsampled by 2 times at each step, as shown in Figure 2. Our experiment shows that such an approach results in better accuracy.
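The generator's building blocks can be summarized in a short sketch. As before, this is PyTorch for readability rather than the released TensorFlow code; the pre-activation ordering inside the residual unit is our reading of Figure 3, and efficiency concerns (e.g., reusing the kNN graph) are ignored.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def knn(x, k=8):
    # Indices of the k nearest neighbors of every point, computed in
    # Euclidean space on the coordinates x: (B, N, 3) -> (B, N, k).
    d = torch.cdist(x, x)
    return d.topk(k + 1, dim=-1, largest=False).indices[..., 1:]  # drop self

def gather_neighbors(f, idx):
    # f: (B, N, C), idx: (B, M, k) -> neighbor features (B, M, k, C).
    b = torch.arange(f.shape[0], device=f.device).view(-1, 1, 1)
    return f[b, idx]

class GConv(nn.Module):
    # Eq. (6): separate linear maps for the center vertex and the
    # sum of its neighbors.
    def __init__(self, c_in, c_out):
        super().__init__()
        self.w0 = nn.Linear(c_in, c_out)
        self.w1 = nn.Linear(c_in, c_out)

    def forward(self, f, idx):
        return self.w0(f) + self.w1(gather_neighbors(f, idx).sum(dim=2))

class ResGConvUnit(nn.Module):
    # One residual unit of Figure 3; the paper stacks 12 such layers
    # with 128 channels inside a block.
    def __init__(self, c=128):
        super().__init__()
        self.conv1, self.conv2 = GConv(c, c), GConv(c, c)

    def forward(self, f, idx):
        h = self.conv1(F.relu(f), idx)
        h = self.conv2(F.relu(h), idx)
        return f + h  # residual connection

class Unpooling(nn.Module):
    # Predicts per-point offsets delta-x of shape (n, 2, 3) with a G-conv
    # and adds them to the input coordinates, turning each point into two.
    def __init__(self, c=128):
        super().__init__()
        self.to_offset = GConv(c, 6)

    def forward(self, x, f, k=8):
        idx = knn(x, k)
        dx = self.to_offset(f, idx).view(x.shape[0], -1, 2, 3)
        x_out = (x.unsqueeze(2) + dx).reshape(x.shape[0], -1, 3)
        # Eq. (7): feature of each new point = mean over its k nearest
        # neighbors' features, queried in the *input* cloud.
        nbr = torch.cdist(x_out, x).topk(k, dim=-1, largest=False).indices
        f_out = gather_neighbors(f, nbr).mean(dim=2)
        return x_out, f_out
```

A full generator would then stack the feature net, several residual blocks, and one unpooling block per x2 step, with a final G-conv mapping features back to coordinates.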
To generate more realistic HR point clouds, we present a graph adversarial loss for point clouds, which is defined by the discriminator $D$. As shown in Figure 2, $D$ is composed of a feature net, residual graph convolution blocks, and pooling blocks. For the feature net, $k$ is set to 8 while $c$ is set to 64; the number of convolution layers is set to 2. For the residual graph convolution block, $k$ is set to 8 while $c$ is set to 64; the number of layers is set to 4.
Pooling Block

Given the input point cloud $x_{in}$ with shape $n_{in} \times 3$, we first employ farthest point sampling (FPS) to generate $x_{out}$ with shape $n_{out} \times 3$. The corresponding features $f_{out}$ are then obtained as follows:

$$f_p^{out} = \max_{q \in N[x_{in}](p)} f_q^{in}, \quad \forall p \in x_{out}, \tag{8}$$

where the max is applied element-wise.
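A minimal, unbatched sketch of this pooling block (PyTorch; the greedy FPS seed point and the output size `m` are our choices):

```python
import torch

def farthest_point_sampling(x, m):
    # Greedy FPS on a single cloud x: (N, 3); returns indices of m points,
    # each maximizing its distance to the already-selected set.
    N = x.shape[0]
    idx = torch.zeros(m, dtype=torch.long)   # idx[0] = 0 is the seed
    dist = torch.full((N,), float("inf"))
    for i in range(1, m):
        dist = torch.minimum(dist, ((x - x[idx[i - 1]]) ** 2).sum(dim=-1))
        idx[i] = torch.argmax(dist)
    return idx

def pool(x_in, f_in, m, k=8):
    # Keep m FPS points, then Eq. (8): element-wise max over each kept
    # point's k nearest neighbors in x_in.
    keep = farthest_point_sampling(x_in, m)
    x_out = x_in[keep]
    nbr = torch.cdist(x_out, x_in).topk(k, dim=-1, largest=False).indices
    f_out = f_in[nbr].max(dim=1).values      # (m, k, C) -> (m, C)
    return x_out, f_out
```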
Graph Patch GAN

Most discriminators downsample the input progressively to obtain a single flag for the whole input. Such a design usually leads to blurry and unpleasant artifacts. Instead of employing a global discriminator, we build a graph patch GAN based on [29]. Specifically, our discriminator downsamples the input multiple times so that the output contains more than 1 point. The graph patch GAN forces every local patch of the generated point cloud to lie on the distribution of the real HR point cloud. In our experiment, we set the number of output points to 64.
4. Experiment
In this section, we first introduce the datasets for training and testing, as well as the details of our implementation.

Method         CD       EMD      F-score (τ = 0.01)   NUC at p = 0.2% / 0.4% / 0.6% / 0.8% / 1.0%   Deviation (1e-2) mean / std   Params
Input          0.0120   0.0036   41.33%               0.315 / 0.224 / 0.185 / 0.163 / 0.150          - / -                         -
MLS [2]        0.0117   0.0043   57.70%               0.364 / 0.272 / 0.229 / 0.204 / 0.186
PU-Net [37]
AR-GCN (ours)

Table 1: Quantitative Comparison on the Train-Test Dataset.
Method         CD       EMD   F-score (τ = 0.01)   NUC at p = 0.2% / 0.4% / 0.6% / 0.8% / 1.0%   Deviation (1e-2) mean / std
Input          0.0077
MLS [2]
PU-Net [37]
AR-GCN (ours)

Table 2: Quantitative Comparison on SHREC15.
The quantitative and qualitative results are then presented to show the effectiveness of our method. To demonstrate the effect of each component in AR-GCN, we also conduct a comprehensive ablation study. To further show the potential applications of our method, we test AR-GCN in an iterative setting and apply it to the 3D reconstruction task. Besides, we also employ our method to assist LR point cloud classification.
We utilize two datasets for our experiments. One is the train-test dataset, which our method is trained with and tested on. The other is the unseen dataset, where our method is directly tested without training or finetuning.
Train-Test Dataset
We use the dataset proposed in PU-Net [37] for training and testing. This dataset contains 60 different models from the Visionair repository. Following the protocol in PU-Net, we use 40 models for training while the other 20 models are used for testing. For training, patches are extracted from each model as the ground truth, and the input patch is randomly sampled from the ground-truth patch at each iteration of training, with a quarter as many points. For testing, points are sampled uniformly per model as the ground truth, while a quarter as many points are sampled as the input.

Unseen Dataset: SHREC15
To further validate the generalization ability of our method, we directly test AR-GCN on SHREC15 [21] after training on the train-test dataset, without finetuning. SHREC15 contains 50 categories in total, with 24 models in each category. We randomly choose one model from each category for testing, since the models in each category only differ in pose. Same as the train-test dataset, the ground truth contains four times as many points as the input.

Implementation Details

Our method is implemented in TensorFlow [1] and runs on a single Titan Xp GPU. To avoid overfitting, we augment the training data by randomly rotating, shifting, and scaling it. For optimization, we use Adam [17] as the optimizer, where the batch size is 28 and the learning rate is 0.001. The network is first trained with $L_{cd}$ for 80 epochs. Then we add $L_G$ and finetune the network for another 40 epochs.
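A sketch of this two-stage schedule, reusing the loss functions sketched in Section 3. Again this is PyTorch rather than the released TensorFlow code; the alternating update order and `lam` are our assumptions.

```python
import torch
# one_sided_chamfer, generator_adv_loss, discriminator_loss:
# the loss sketches from Section 3.

def train(generator, discriminator, loader, lam=100.0):
    g_opt = torch.optim.Adam(generator.parameters(), lr=1e-3)
    d_opt = torch.optim.Adam(discriminator.parameters(), lr=1e-3)

    for epoch in range(80):                    # stage 1: Chamfer term only
        for x, y in loader:                    # batches of 28 patches
            loss = one_sided_chamfer(generator(x), y)
            g_opt.zero_grad(); loss.backward(); g_opt.step()

    for epoch in range(40):                    # stage 2: joint loss of Eq. (1)
        for x, y in loader:
            y_hat = generator(x)
            # Discriminator step (detach so G receives no gradient here).
            d_loss = discriminator_loss(discriminator(y_hat.detach()),
                                        discriminator(y))
            d_opt.zero_grad(); d_loss.backward(); d_opt.step()
            # Generator step with the joint loss.
            g_loss = lam * one_sided_chamfer(y_hat, y) \
                     + generator_adv_loss(discriminator(y_hat))
            g_opt.zero_grad(); g_loss.backward(); g_opt.step()
```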
Evaluation Metrics

We adopt standard 3D reconstruction metrics for point cloud super-resolution because the output of both tasks is a point cloud. To measure the difference between $\hat{y}$ and $y$ point-wisely, we utilize the standard Chamfer Distance (CD) and Earth Mover's Distance (EMD), for which smaller is better.

CD and EMD are heavily influenced by outliers. Thus, we also report the F-score [31] by treating point cloud super-resolution as a classification problem. Specifically, precision and recall are first evaluated by checking the percentage of points in $\hat{y}$ or $y$ that can find a neighbor from the other within a certain threshold $\tau$. The F-score is then calculated as the harmonic mean of precision and recall. For this metric, larger is better.

The metrics in [37] are also employed to evaluate our method. We use Deviation to measure the difference between the predicted point cloud and the ground-truth mesh, while the normalized uniformity coefficient (NUC) is evaluated for measuring uniformity. For these two metrics, smaller is better. Notably, the original mesh is used as the ground truth instead of the sampled points.
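For instance, the F-score computation just described can be sketched as follows (PyTorch, unbatched; the epsilon guard against an empty intersection is ours):

```python
import torch

def f_score(y_pred, y_gt, tau=0.01):
    # Precision: fraction of predicted points with a ground-truth neighbor
    # within tau; recall: the symmetric fraction; F-score: harmonic mean.
    d = torch.cdist(y_pred, y_gt)                      # (N_pred, N_gt)
    precision = (d.min(dim=1).values < tau).float().mean()
    recall = (d.min(dim=0).values < tau).float().mean()
    return 2 * precision * recall / (precision + recall + 1e-8)
```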
Figure 4: Qualitative Results. (a) is the LR input point cloud with a sparse distribution, while (b) is the corresponding HR point cloud with a dense distribution. (c) and (d) are the HR point clouds generated by PU-Net [37] and our method. Our results are sharper at edges, with richer details but fewer noisy points, especially in the red boxes. Best viewed in color.
Method          Train-Test Dataset: CD / EMD / F-score (τ = 0.01) / Deviation (1e-2) mean, std    SHREC15: CD / EMD / F-score (τ = 0.01) / Deviation (1e-2) mean, std
GCN ×4 points
GCN ×4
ResGCN ×4
ResGCN
ResGCN + L_pu
AR-GCN w/o FT   0.0085 / 0.0041 / 69.53% / 0.27
AR-GCN

Table 3: Ablation Study Results on the Train-Test Dataset and the Unseen Dataset, SHREC15.

Comparison with Other Methods
The performance of different methods on the train-test dataset is reported in Table 1. We first report the performance of the LR input as a preliminary baseline. We then report the performance of Moving Least Squares (MLS) [2], which is a traditional method. The performance of PU-Net is then reported, which is the state-of-the-art method for point cloud super-resolution.

As shown in Table 1, our method outperforms all the other methods under most metrics. Particularly, our method improves the F-score by a large margin, while advancing CD a large step. Surprisingly, our method even outperforms PU-Net in NUC, although it is not trained with the repulsion loss [37], which forces the generated point cloud to be uniform. The result of MLS is reproduced with the Point Cloud Library (PCL). The result of PU-Net is reproduced with the official code by following the author's instructions.
We attribute this to the effect of our adversarial loss. As for MLS, it performs well in Deviation but has the lowest NUC scores. The reason is that MLS tends to produce new points close to the points in the input point cloud, which leads to a non-uniformly distributed point cloud and results in poor performance on the NUC metric; however, the mean deviation to the ground truth is small. Similar results are obtained on SHREC15, as shown in Table 2, which demonstrates the generalization ability of our method on the unseen dataset. As for NUC and Deviation, our method outperforms the baseline methods by a large margin.

Since both our method and PU-Net are deep learning based methods, we also compare the number of parameters in each model. As shown in Table 1, our model contains about the same number of learnable parameters as PU-Net while achieving much better performance.

Qualitative Results
The evaluation metrics reflect the shape quality to some degree. However, they mainly focus on point-wise distance and fail to reflect surface properties such as smoothness and high-order details. Since there are barely any standard metrics to measure these aspects, we present a series of qualitative results to show the advantage of our method. Figure 4 presents the point clouds in both datasets visually, where the 1st row is from the train-test dataset and the 2nd row is from SHREC15. Compared to PU-Net, the HR point clouds of our method have richer details and sharper edges. Besides, ours are more uniformly distributed in the smooth areas with fewer noisy points. On the contrary, the results from PU-Net are noisy and blurry on the edges with little detail; it tends to spread out uniformly without preserving the underlying structure. Notably, the differences around the legs, feet and horns are most obvious, as shown in the red boxes.
Ablation Study

To further verify the effect of each component in our method, we conduct an ablation study on both datasets and present the results in Table 3.

We first present the results of a very simple baseline, GCN ×4 points. Compared to AR-GCN, there are mainly three modifications. First, we remove the residual connections from the proposed generator, as well as the skip connection between input and output. Second, it regresses the coordinates directly under the supervision of a single loss $L_{cd}$. Third, instead of progressively upscaling, it upsamples the point cloud by ×4 directly with the number of G-conv layers unchanged. Without all the key features of our method, it is not surprising that this simple baseline does not work at all.

We then put back the skip connection between input and output, which forces the model to predict the residual $\delta x$ instead of directly regressing the point coordinates. This simple change largely improves the stability of learning and results in a model with reasonable performance, as shown by GCN ×4. We further transform graph convolution into residual graph convolution by putting back the residual connections. Such a modification improves the F-score on both SHREC15 and the train-test dataset, as shown by ResGCN ×4. By enabling progressive super-resolution, we obtain the proposed generator; as shown by ResGCN, the F-score increases further on the train-test dataset, and the Deviation also improves considerably on both datasets. When changing $L_{cd}$ into the loss proposed in [37], the F-score decreases, as shown by ResGCN + $L_{pu}$ versus ResGCN, which demonstrates the effectiveness of $L_{cd}$. Compared to PU-Net, ResGCN + $L_{pu}$ improves the F-score because of the proposed residual graph network ResGCN. By replacing $L_{pu}$ with the proposed loss, the F-score is further improved, which achieves the state-of-the-art performance, as shown by AR-GCN.
Dataset      Method        F-score (τ = 0.01)   NUC (p = 1.0%)   Deviation (mean)
Train-Test   PU-Net        43.2%                0.131            0.78
             Ours          70.3%                0.128            0.26
             Ours+noisy    55.4%                0.128            0.59
             Ours+uneven   46.0%                0.202            0.56
SHREC15      PU-Net        56.4%                0.180            0.90
             Ours          93.1%                0.130            0.18
             Ours+noisy    78.8%                0.130            0.55
             Ours+uneven   75.5%                0.201            0.37

Table 4: Experiments on noisy data and uneven data.
We also conduct an experiment to compare different training strategies. AR-GCN w/o FT is trained with the proposed loss $L(x, y)$ from scratch, while AR-GCN is trained with $L_{cd}$ for 80 epochs and then finetuned with $L(x, y)$ for another 40 epochs. As shown in the table, AR-GCN outperforms AR-GCN w/o FT consistently, which shows the superiority of the 2-step training strategy.

To show the robustness of our method, we conduct experiments on noisy point clouds and non-uniformly sampled point clouds separately. The quantitative results on the train-test dataset and SHREC15 are shown in Table 4. Ours+noisy represents applying AR-GCN on point clouds perturbed with zero-mean Gaussian noise $z$. Results show that our method with noisy point clouds outperforms PU-Net with clean ones. Ours+uneven means employing our method on non-uniformly sampled point clouds. Our method with uneven point clouds achieves similar performance to PU-Net with uniform ones on all the metrics except NUC. We attribute this to the non-uniform distribution of the input point clouds; our method can only make it more uniform to a certain degree. We plan to solve this problem in future work.

Iterative Super Resolution
Our method is trained to upscale a point cloud by a fixed ratio, which is ×4 in our setting. To demonstrate the ability to upsample a point cloud by more than ×4, we conduct an experiment that takes the output of the previous iteration as input and upsamples it by ×4 again with AR-GCN. The initial point cloud is upsampled by ×16 after 2 iterations. As shown in Figure 5, our method not only handles a relatively sparse point cloud but also upsamples it by more than ×4 iteratively. Although the resulting point cloud is not as good as that in Figure 4, it recovers many details from a very sparse input, which is promising.
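The procedure amounts to repeatedly feeding the network its own output; a minimal sketch, where `generator` is a placeholder name for the trained AR-GCN model:

```python
# pc: (1, n, 3) tensor holding the initial sparse point cloud.
pc = initial_sparse_cloud
for _ in range(2):        # two passes: 4x per pass, 16x overall
    pc = generator(pc)
```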
Figure 5: Iterative Super Resolution. (a) is the input point cloud. (b) and (c) are the generated HR point clouds after the 1st and 2nd iterations. At each iteration, the output from the previous iteration is upsampled by 4 times with our method. Best viewed in color.

Figure 6: Mesh Reconstruction from Point Cloud. (a) Input, (b) GT, (c) PU-Net [37], (d) Ours. The differences inside the red boxes are most obvious. Best viewed in color.
3D Reconstruction
In 3D reconstruction, the quality and density of the point cloud have a huge impact on the quality of the reconstructed mesh. However, due to the limitations of scanning devices, the point cloud is usually sparse and noisy. Thus, point cloud super-resolution is key to improving the quality of 3D reconstruction. We employ our method and PU-Net to generate the HR point cloud, which is then fed into the ball pivoting algorithm [3] for mesh reconstruction. As shown in Figure 6, the mesh reconstructed from our method contains richer details compared to that from the LR input. Besides, ours is smoother in the flat areas and sharper at the edges, while the mesh from PU-Net is noisy with many unpleasant artifacts, especially in the red boxes.
Figure 7: Employing our method on a real-scanned and uneven point cloud. Best viewed in color.
LR Point Cloud Classification
For 3D understanding, the classification accuracy on a LR point cloud is usually lower than that on a HR point cloud. To improve the performance of LR point cloud classification, one possible way is to transform it into a HR point cloud with a point cloud super-resolution method. To show the effectiveness of point cloud super-resolution, we employ PointNet++ [27] on the ModelNet40 dataset [36] for point cloud classification. As shown in Table 5, when we randomly sample 256 points from the original 1,024 points and send them to PointNet++, the classification accuracy drops substantially. We then upsample the 256 points by ×4 with our method and send the resulting 1,024 points to PointNet++, which recovers much of the lost accuracy and outperforms the 256-point baseline by a large margin. This experiment shows that point cloud super-resolution is important for understanding LR point clouds.
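A sketch of this evaluation protocol (hypothetical names: `classifier` stands for the pretrained PointNet++ and `generator` for the trained AR-GCN model):

```python
import torch

def lr_classification_protocol(classifier, generator, cloud):
    # cloud: (1, 1024, 3) test shape from ModelNet40.
    pred_full = classifier(cloud)                # HR reference prediction
    idx = torch.randperm(cloud.shape[1])[:256]   # random 256-point subset
    sparse = cloud[:, idx]
    pred_lr = classifier(sparse)                 # LR baseline prediction
    pred_sr = classifier(generator(sparse))      # 256 -> 1,024 via AR-GCN
    return pred_full, pred_lr, pred_sr
```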
Real-Scanned Point Cloud

We also conduct an experiment on real-scanned and uneven point clouds. As shown in Figure 7, our method generates a denser point cloud with more uniformly distributed points, while maintaining the underlying structure such as the striped texture.
5. Conclusion
We proposed a graph convolution network, AR-GCN, for point cloud super-resolution, which is composed of a residual generator, a graph discriminator, and a graph adversarial loss. With comprehensive experiments, we demonstrated that residual connections and residual prediction are effective for stable training and better performance. With the proposed graph adversarial loss, our method generates more realistic HR point clouds compared to manually designed loss functions. The experiment on the train-test dataset showed that our method outperforms other methods, and the experiment on the unseen dataset SHREC15 further demonstrated the performance and generalization ability of our method. Notably, our method is not designed for completion, so it cannot fill large holes or missing parts. We would like to address these limitations in future work.
References

[1] M. Abadi, P. Barham, J. Chen, Z. Chen, A. Davis, J. Dean, M. Devin, S. Ghemawat, G. Irving, M. Isard, et al. TensorFlow: A system for large-scale machine learning. In OSDI, 2016.
[2] M. Alexa, J. Behr, D. Cohen-Or, S. Fleishman, D. Levin, and C. T. Silva. Computing and rendering point set surfaces. IEEE Transactions on Visualization and Computer Graphics, 2003.
[3] F. Bernardini, J. Mittleman, H. Rushmeier, C. Silva, and G. Taubin. The ball-pivoting algorithm for surface reconstruction. IEEE Transactions on Visualization and Computer Graphics, 1999.
[4] M. M. Bronstein, J. Bruna, Y. LeCun, A. Szlam, and P. Vandergheynst. Geometric deep learning: going beyond Euclidean data. IEEE Signal Processing Magazine, 2017.
[5] M. Defferrard, X. Bresson, and P. Vandergheynst. Convolutional neural networks on graphs with fast localized spectral filtering. In NIPS, 2016.
[6] C. Dong, C. C. Loy, K. He, and X. Tang. Learning a deep convolutional network for image super-resolution. In ECCV, 2014.
[7] H. Fan, H. Su, and L. J. Guibas. A point set generation network for 3D object reconstruction from a single image. In CVPR, 2017.
[8] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio. Generative adversarial nets. In NIPS, 2014.
[9] W. Han, S. Chang, D. Liu, M. Yu, M. Witbrock, and T. S. Huang. Image super-resolution via dual-state recurrent networks. In CVPR, 2018.
[10] M. Haris, G. Shakhnarovich, and N. Ukita. Deep back-projection networks for super-resolution. In CVPR, 2018.
[11] R. Hartley and A. Zisserman. Multiple View Geometry in Computer Vision. Cambridge University Press, 2003.
[12] H. Huang, D. Li, H. Zhang, U. Ascher, and D. Cohen-Or. Consolidation of unorganized point clouds for surface reconstruction. TOG, 2009.
[13] H. Huang, S. Wu, M. Gong, D. Cohen-Or, U. Ascher, and H. R. Zhang. Edge-aware point set resampling. TOG, 2013.
[14] M. Jiang, Y. Wu, and C. Lu. PointSIFT: A SIFT-like network module for 3D point cloud semantic segmentation. arXiv:1807.00652, 2018.
[15] H. Kato, Y. Ushiku, and T. Harada. Neural 3D mesh renderer. In CVPR, 2018.
[16] J. Kim, J. Kwon Lee, and K. Mu Lee. Deeply-recursive convolutional network for image super-resolution. In CVPR, 2016.
[17] D. P. Kingma and J. Ba. Adam: A method for stochastic optimization. arXiv:1412.6980, 2014.
[18] C. Ledig, L. Theis, F. Huszár, J. Caballero, A. Cunningham, A. Acosta, A. P. Aitken, A. Tejani, J. Totz, Z. Wang, et al. Photo-realistic single image super-resolution using a generative adversarial network. In CVPR, 2017.
[19] J. Li, B. M. Chen, and G. H. Lee. SO-Net: Self-organizing network for point cloud analysis. In CVPR, 2018.
[20] Y. Li, R. Bu, M. Sun, and B. Chen. PointCNN. arXiv:1801.07791, 2018.
[21] Z. Lian, J. Zhang, S. Choi, H. ElNaghy, J. El-Sana, T. Furuya, A. Giachetti, R. A. Guler, L. Lai, C. Li, H. Li, F. A. Limberger, R. Martin, R. U. Nakanishi, A. P. Neto, L. G. Nonato, R. Ohbuchi, K. Pevzner, D. Pickup, P. Rosin, A. Sharf, L. Sun, X. Sun, S. Tari, G. Unal, and R. C. Wilson. Non-rigid 3D shape retrieval. In Eurographics Workshop on 3D Object Retrieval, 2015.
[22] Y. Lipman, D. Cohen-Or, D. Levin, and H. Tal-Ezer. Parameterization-free projection for geometry reconstruction. TOG, 2007.
[23] X. Mao, Q. Li, H. Xie, R. Y. Lau, Z. Wang, and S. P. Smolley. Least squares generative adversarial networks. In ICCV, 2017.
[24] D. Maturana and S. Scherer. VoxNet: A 3D convolutional neural network for real-time object recognition. In IROS, 2015.
[25] S. Pan et al. Adversarially regularized graph autoencoder for graph embedding. In IJCAI, 2018.
[26] C. R. Qi, H. Su, K. Mo, and L. J. Guibas. PointNet: Deep learning on point sets for 3D classification and segmentation. arXiv:1612.00593, 2016.
[27] C. R. Qi, L. Yi, H. Su, and L. J. Guibas. PointNet++: Deep hierarchical feature learning on point sets in a metric space. In NIPS, 2017.
[28] G. Riegler, A. O. Ulusoy, and A. Geiger. OctNet: Learning deep 3D representations at high resolutions. In CVPR, 2017.
[29] A. Shrivastava, T. Pfister, O. Tuzel, J. Susskind, W. Wang, and R. Webb. Learning from simulated and unsupervised images through adversarial training. In CVPR, 2017.
[30] E. Smith, S. Fujimoto, and D. Meger. 3D object super-resolution. arXiv:1802.09987, 2018.
[31] M. Sokolova, N. Japkowicz, and S. Szpakowicz. Beyond accuracy, F-score and ROC: a family of discriminant measures for performance evaluation. In Australasian Joint Conference on Artificial Intelligence, 2006.
[32] H. Su, V. Jampani, D. Sun, S. Maji, E. Kalogerakis, M.-H. Yang, and J. Kautz. SPLATNet: Sparse lattice networks for point cloud processing. In CVPR, 2018.
[33] X. Sun, J. Wu, X. Zhang, Z. Zhang, C. Zhang, T. Xue, J. B. Tenenbaum, and W. T. Freeman. Pix3D: Dataset and methods for single-image 3D shape modeling. In CVPR, 2018.
[34] N. Wang, Y. Zhang, Z. Li, Y. Fu, W. Liu, and Y.-G. Jiang. Pixel2Mesh: Generating 3D mesh models from single RGB images. arXiv:1804.01654, 2018.
[35] Y. Wang, L. Wang, H. Wang, and P. Li. End-to-end image super-resolution via deep and shallow convolutional networks. arXiv:1607.07680, 2016.
[36] Z. Wu, S. Song, A. Khosla, F. Yu, L. Zhang, X. Tang, and J. Xiao. 3D ShapeNets: A deep representation for volumetric shapes. In CVPR, 2015.
[37] L. Yu, X. Li, C.-W. Fu, D. Cohen-Or, and P.-A. Heng. PU-Net: Point cloud upsampling network. In CVPR, 2018.
[38] Y. Zhang and M. Rabbat. A graph-CNN for 3D point cloud classification. In ICASSP, 2018.
[39] Y. Zhang, Y. Tian, Y. Kong, B. Zhong, and Y. Fu. Residual dense network for image super-resolution. In CVPR, 2018.