A deep perceptual metric for 3D point clouds
Maurice Quach†, Aladine Chetouani†+, Giuseppe Valenzise†, Frederic Dufaux†
† Université Paris-Saclay, CNRS, CentraleSupélec, Laboratoire des signaux et systèmes, 91190 Gif-sur-Yvette, France
+ Laboratoire PRISME, Université d'Orléans, Orléans, France
Abstract
Point clouds are essential for storage and transmission of 3D content. As they can entail significant volumes of data, point cloud compression is crucial for practical usage. Recently, point cloud geometry compression approaches based on deep neural networks have been explored. In this paper, we evaluate the ability to predict perceptual quality of typical voxel-based loss functions employed to train these networks. We find that the commonly used focal loss and weighted binary cross entropy are poorly correlated with human perception. We thus propose a perceptual loss function for 3D point clouds which outperforms existing loss functions on the ICIP2020 subjective dataset. In addition, we propose a novel truncated distance field voxel grid representation and find that it leads to sparser latent spaces and loss functions that are more correlated with perceived visual quality compared to a binary representation. The source code is available at https://github.com/mauriceqch/2021_pc_perceptual_loss.

Introduction
As 3D capture devices become more accurate and accessible, point clouds are a crucial data structure for storage and transmission of 3D data. Naturally, this comes with significant volumes of data. Thus, Point Cloud Compression (PCC) is an essential research topic to enable practical usage. The Moving Picture Experts Group (MPEG) is working on two PCC standards [1]: Geometry-based PCC (G-PCC) and Video-based PCC (V-PCC). G-PCC uses native 3D data structures to compress point clouds, while V-PCC employs a projection-based approach using video coding technology [2]. These two approaches are complementary, as V-PCC specializes in dense point clouds, while G-PCC is a more general approach suited for sparse point clouds. Recently, JPEG Pleno [3] has issued a Call for Evidence on PCC [4].

Deep learning approaches have been employed to compress geometry [5, 6, 7, 8, 9, 10] and attributes [11, 12] of point clouds. Specific approaches have also been developed for sparse LIDAR point clouds [13, 14]. In this work, we focus on lossy point cloud geometry compression for dense point clouds. In existing approaches, different point cloud geometry representations are considered for compression: G-PCC adopts a point representation, V-PCC uses a projection or image-based representation and deep learning approaches commonly employ a voxel grid representation. Point clouds can be represented in different ways in the voxel grid. Indeed, voxel grid representations include binary and Truncated Signed Distance Field (TSDF) representations [15]. TSDFs rely on the computation of normals; however, in the case of point clouds this computation can be noisy. We therefore ignore the normal signs and reformulate TSDFs to propose a new Truncated Distance Field (TDF) representation for point clouds.

Figure 1. Perceptual loss based on an autoencoder: (a) autoencoder model; (b) perceptual loss. The grayed out parts do not need to be computed for the perceptual loss.
Deep learning approaches for lossy geometry compression typically jointly optimize rate and distortion. As a result, an objective quality metric, employed as a loss function, is necessary to define the distortion objective during training. Such metrics should be differentiable, defined on the voxel grid and well correlated with perceived visual quality. In this context, the Weighted Binary Cross Entropy (WBCE) and the focal loss [16] are commonly used loss functions based on a binary voxel grid representation. They aim to alleviate the class imbalance between empty and occupied voxels caused by point cloud sparsity. However, they are poorly correlated with human perception as they only compute a voxel-wise error.

A number of metrics have been proposed for Point Cloud Quality Assessment (PCQA): the point-to-plane (D2) metric [18], PC-MSDM [19], PCQM [20], angular similarity [21], a point-to-distribution metric [22], a point cloud similarity metric [23], improved PSNR metrics [24] and a color-based metric [25]. These metrics operate directly on the point cloud. However, they are not defined on the voxel grid and hence cannot easily be used as loss functions. Recently, to improve upon existing loss functions such as the WBCE and the focal loss, a neighborhood adaptive loss function [17] was proposed. Still, these loss functions are based on the explicit binary voxel grid representation. We show in this paper that loss functions based on the TDF representation are more correlated with human perception than those based on the binary representation.

The perceptual loss has previously been proposed as an objective quality metric for images [26]. Indeed, neural networks learn representations of images that are well correlated with perceived visual quality. This enables the definition of the perceptual loss as a distance between latent space representations. For the case of images, the perceptual loss provides competitive performance or even outperforms traditional quality metrics. We hypothesize that a similar phenomenon can be observed for point clouds.
Figure 2. Voxel grid representations of a point cloud: (a) binary; (b) Truncated Signed Distance Field (TSDF); (c) Truncated Distance Field (TDF). The upper bound distance value is 2 for the TDF and the TSDF. Normals are facing up in the TSDF.
Table 1: Objective quality metrics considered in this study.
| Domain | Name | Signal type | Block aggregation | Learning based | Description |
|---|---|---|---|---|---|
| Points | D1 MSE | Coordinates | – | No | Point-to-point MSE |
| Points | D2 MSE | Coordinates | – | No | Point-to-plane MSE |
| Points | D1 PSNR | Coordinates | – | No | Point-to-point PSNR |
| Points | D2 PSNR | Coordinates | – | No | Point-to-plane PSNR |
| Voxel grid | Bin BCE | Binary | L1 | No | Binary cross entropy |
| Voxel grid | Bin naBCE | Binary | L1 | No | Neighborhood adaptive binary cross entropy [17] |
| Voxel grid | Bin WBCE 0.75 | Binary | L2 | No | Weighted binary cross entropy with α = 0.75 |
| Voxel grid | Bin PL | Binary | L1 | Yes | Perceptual loss (explicit) on all feature maps |
| Voxel grid | Bin PL F1 | Binary | L1 | Yes | Perceptual loss (explicit) on feature map 1 |
| Voxel grid | TDF MSE | Distances | L1 | No | Truncated distance field (TDF) MSE |
| Voxel grid | TDF PL | Distances | L1 | Yes | Perceptual loss (implicit) on all feature maps |
| Voxel grid | TDF PL F9 | Distances | L1 | Yes | Perceptual loss (implicit) on feature map 9 |

Therefore, we propose a differentiable perceptual loss for training deep neural networks aimed at compressing point cloud geometry. We investigate how to build and train such a perceptual loss to improve point cloud compression results. Specifically, we build a differentiable distortion metric suitable for training neural networks to improve PCC approaches based on deep learning. We then validate our approach experimentally on the ICIP2020 subjective dataset [27]. The main contributions of the paper are as follows:

• A novel perceptual loss for 3D point clouds that outperforms existing metrics on the ICIP2020 subjective dataset
• A novel implicit TDF voxel grid representation
• An evaluation of binary (explicit) and TDF (implicit) representations in the context of deep learning approaches for point cloud geometry compression
Voxel grid representations
In this study, we consider different voxel grid representations for point clouds. In the binary (Bin) representation (Figure 2a), each voxel has a binary occupancy value indicating whether it is occupied or empty: when the i-th voxel is occupied, $x_i = 1$; otherwise, $x_i = 0$. In the implicit TDF representation (Figure 2c), the i-th voxel value is $x_i = d$, where $d$ is the distance to the nearest occupied voxel; consequently, $x_i = 0$ for occupied voxels and $x_i = d$ with $d > 0$ for empty voxels. The distances are truncated and normalized to the $[0, 1]$ interval with

$$x_i = \min(d, u) / u, \quad (1)$$

where $u$ is an upper bound value. In this study, we focus on the explicit binary and implicit TDF representations.
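The TDF of Eq. (1) can be obtained from a binary occupancy grid with a Euclidean distance transform. The following is a minimal sketch under assumed names and an assumed upper bound value; it is not the authors' implementation.

```python
# Sketch: computing the truncated distance field (TDF) of Eq. (1)
# from a binary occupancy grid.
import numpy as np
from scipy.ndimage import distance_transform_edt

def binary_to_tdf(occupancy: np.ndarray, u: float = 5.0) -> np.ndarray:
    """Map a binary voxel grid to a TDF normalized to [0, 1].

    occupancy: 3D array, 1 for occupied voxels, 0 for empty ones.
    u: upper bound on the distance (the value 5.0 is an assumption).
    """
    # Distance from every voxel to its nearest occupied voxel;
    # occupied voxels get distance 0.
    d = distance_transform_edt(occupancy == 0)
    # Truncate at u and normalize to [0, 1], as in Eq. (1).
    return np.minimum(d, u) / u

# Example: a single occupied voxel in an 8x8x8 block.
occ = np.zeros((8, 8, 8))
occ[4, 4, 4] = 1
x = binary_to_tdf(occ)
print(x[4, 4, 4], x[4, 4, 6])  # 0.0 and 2/5 = 0.4
```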
Objective quality metrics

In Table 1, we present the objective quality metrics considered in this study. Specifically, we evaluate metrics that are differentiable on the voxel grid to assess their suitability as loss functions for point cloud geometry compression. We include metrics defined on the binary and TDF representations and compare their performance against traditional point set metrics.
Voxel grid metrics
We partition the point cloud into blocks and compute voxel grid metrics for each block. For each metric, we aggregate metric values over all blocks with either the L1 or the L2 norm. Specifically, we select the best aggregation experimentally for each metric.

Given two point clouds A and B, we denote the i-th voxel value for each point cloud as $x_i^{(A)}$ and $x_i^{(B)}$. Then, we define the WBCE as follows:

$$-\frac{1}{N} \sum_i \left( \alpha\, x_i^{(A)} \log(x_i^{(B)}) + (1 - \alpha)(1 - x_i^{(A)}) \log(1 - x_i^{(B)}) \right), \quad (2)$$

where $\alpha$ is a balancing weight between 0 and 1. The binary cross entropy (BCE) refers to the case $\alpha = 0.5$. The focal loss (FL) [16] amplifies ($\gamma > 1$) or reduces ($\gamma < 1$) errors and is defined as follows:

$$-\sum_i \left( \alpha\, x_i^{(A)} (1 - x_i^{(B)})^\gamma \log(x_i^{(B)}) + (1 - \alpha)(1 - x_i^{(A)})(x_i^{(B)})^\gamma \log(1 - x_i^{(B)}) \right), \quad (3)$$

where $\alpha$ is a balancing weight and the log arguments are clipped between 0.001 and 0.999. The FL differs from the WBCE by the modulating factors $(1 - x_i^{(B)})^\gamma$ and $(x_i^{(B)})^\gamma$. However, while in the context of neural network training $x_i^{(B)}$ is an occupancy probability, in the context of quality assessment $x_i^{(B)}$ is a binary value. As a result, the FL is equivalent to the WBCE since $\gamma$ has no impact in the latter case. For this reason, we include the WBCE with $\alpha = 0.75$ in our experiments as an evaluation proxy for the FL used in [6].

The neighborhood adaptive BCE (naBCE) [17] was proposed as an alternative to the BCE and FL [16]. It is a variant of the WBCE in which the weight $\alpha$ adapts to the neighborhood of each voxel $u$, resulting in a weight $\alpha_u$. Given a voxel $u$, its neighborhood is a window $W$ of size $m \times m \times m$ centered on $u$. The neighborhood resemblance $r_u$ is the sum of the inverse Euclidean distances of neighboring voxels with the same binary occupancy value as $u$. Finally, the weight is defined as $\alpha_u = \max(1 - r_u / \max(r), c)$, where $\max(r)$ is the maximum of all neighborhood resemblances and $c$ is a small positive lower bound.

The Mean Squared Error (MSE) on the TDF is expressed as follows:

$$\frac{1}{N} \sum_i \left( x_i^{(A)} - x_i^{(B)} \right)^2. \quad (4)$$
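For illustration, the losses of Eqs. (2)-(4) can be sketched in NumPy as below. The α default and the clipping bounds follow the text above; the function names and the γ default are assumptions, not the authors' code.

```python
# Hedged sketch of the voxel grid losses of Eqs. (2)-(4).
import numpy as np

def wbce(x_a, x_b, alpha=0.75, eps=1e-3):
    """Weighted binary cross entropy between voxel grids, Eq. (2)."""
    x_b = np.clip(x_b, eps, 1 - eps)  # clip log arguments to [0.001, 0.999]
    return -np.mean(alpha * x_a * np.log(x_b)
                    + (1 - alpha) * (1 - x_a) * np.log(1 - x_b))

def focal_loss(x_a, x_b, alpha=0.75, gamma=2.0, eps=1e-3):
    """Focal loss, Eq. (3): WBCE with modulating factors."""
    x_b = np.clip(x_b, eps, 1 - eps)
    return -np.sum(alpha * x_a * (1 - x_b) ** gamma * np.log(x_b)
                   + (1 - alpha) * (1 - x_a) * x_b ** gamma * np.log(1 - x_b))

def tdf_mse(x_a, x_b):
    """MSE between TDF voxel grids, Eq. (4)."""
    return np.mean((x_a - x_b) ** 2)
```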
Perceptual Loss

We propose a perceptual loss based on differences between latent space representations learned by a neural network. More precisely, we use an autoencoder as the underlying neural network. The architecture of this autoencoder and its training procedure are presented in the following.
Model architecture
We adopt an autoencoder architecture based on 3D convolutions and transposed convolutions. Given an input voxel grid $x$, we perform an analysis transform $f_a(x) = y$ to obtain the latent space $y$ and a synthesis transform $f_s(y) = \tilde{x}$, as seen in Figure 1a. The analysis transform is composed of three convolutions with kernel size 5 and stride 2, while the synthesis transform is composed of three transposed convolutions with the same kernel size and stride. We use ReLU [28] activations for all layers except for the last layer, which uses a sigmoid activation.
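A possible tf.keras realization of this architecture is sketched below; the number of filters per layer and the 64³ block size are assumptions, as they are not specified in this section.

```python
# Minimal sketch of the described autoencoder in tf.keras.
import tensorflow as tf
from tensorflow.keras import layers

def analysis_transform(filters=32):
    # Three 3D convolutions, kernel size 5, stride 2, ReLU.
    return tf.keras.Sequential([
        layers.Conv3D(filters, 5, strides=2, padding='same', activation='relu')
        for _ in range(3)
    ])

def synthesis_transform(filters=32):
    # Three transposed 3D convolutions; sigmoid on the last layer.
    return tf.keras.Sequential([
        layers.Conv3DTranspose(filters, 5, strides=2, padding='same', activation='relu'),
        layers.Conv3DTranspose(filters, 5, strides=2, padding='same', activation='relu'),
        layers.Conv3DTranspose(1, 5, strides=2, padding='same', activation='sigmoid'),
    ])

x = tf.keras.Input(shape=(64, 64, 64, 1))  # one block of the voxel grid
y = analysis_transform()(x)                # latent space f_a(x)
x_tilde = synthesis_transform()(y)         # reconstruction f_s(y)
autoencoder = tf.keras.Model(x, x_tilde)
```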
Training

Using the previously defined architecture, we train two neural networks: one with the explicit representation (binary) and another with the implicit representation (TDF). In the explicit case, we train the network underlying the perceptual loss with the focal loss defined in Eq. (3). In the implicit case, we first define the Kronecker delta $\delta_i$ such that $\delta_i = 1$ if $i = 0$ and $\delta_i = 0$ otherwise. Then, we define an adaptive MSE loss function for the training of the perceptual loss as follows:

$$\frac{1}{N} \sum_i \left( \delta_{1 - x_i^{(A)}}\, w \left( x_i^{(A)} - x_i^{(B)} \right)^2 + \left( 1 - \delta_{1 - x_i^{(A)}} \right) (1 - w) \left( x_i^{(A)} - x_i^{(B)} \right)^2 \right), \quad (5)$$

where $w$ is a balancing weight. Specifically, we choose $w$ as the proportion of distances strictly inferior to 1:

$$w = \min\left( \max\left( \frac{1}{N} \sum_i \left( 1 - \delta_{1 - x_i^{(A)}} \right),\; \beta \right),\; 1 - \beta \right), \quad (6)$$

where $\beta$ is a bounding factor such that $w$ is bounded to the interval $[\beta, 1 - \beta]$. This formulation compensates for class imbalance while avoiding extreme weight values. In that way, the loss function adapts the contributions from the voxels that are far from occupied voxels ($x_i^{(A)} = 1$) and the voxels that are near occupied voxels ($x_i^{(A)} < 1$).
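A hedged TensorFlow sketch of Eqs. (5) and (6) follows; the tensor layout and the default value of β are assumptions.

```python
# Sketch of the adaptive MSE loss of Eqs. (5)-(6).
import tensorflow as tf

def adaptive_mse(x_a, x_b, beta=0.01):
    """Adaptive MSE between a TDF block x_a and its reconstruction x_b.

    Voxels with x_a == 1 (far from any occupied voxel) are weighted by w,
    the clipped proportion of voxels with x_a < 1; the remaining voxels
    get weight 1 - w, compensating for class imbalance.
    """
    far = tf.cast(tf.equal(x_a, 1.0), tf.float32)   # delta_{1 - x_a}
    near = 1.0 - far
    w = tf.clip_by_value(tf.reduce_mean(near), beta, 1.0 - beta)  # Eq. (6)
    sq_err = tf.square(x_a - x_b)
    return tf.reduce_mean(far * w * sq_err + near * (1.0 - w) * sq_err)  # Eq. (5)
```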
Metric

As seen in Figure 1b, in order to compare two point clouds $x^{(A)}$ and $x^{(B)}$, we compute their respective latent spaces $f_a(x^{(A)}) = y^{(A)}$ and $f_a(x^{(B)}) = y^{(B)}$ using the previously trained analysis transform. These latent spaces each have $F$ feature maps of size $W \times D \times H$ (width, depth, height). Then, we define the MSE between latent spaces as follows:

$$\frac{1}{N} \sum_i \left( y_i^{(A)} - y_i^{(B)} \right)^2. \quad (7)$$

We compute this MSE either over all $F$ feature maps or on a single feature map.
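A sketch of the resulting metric is given below, reusing the analysis transform from the architecture sketch above; the channels-last feature-map axis is an assumption.

```python
# Sketch of the perceptual loss metric, Eq. (7): MSE between latent
# spaces produced by the trained analysis transform.
import tensorflow as tf

def perceptual_loss(f_a, x_a, x_b, feature_map=None):
    """MSE between latent representations of two voxel grids."""
    y_a, y_b = f_a(x_a), f_a(x_b)
    if feature_map is not None:
        # Restrict the metric to a single feature map (e.g. map 9),
        # assuming channels-last layout.
        y_a = y_a[..., feature_map]
        y_b = y_b[..., feature_map]
    return tf.reduce_mean(tf.square(y_a - y_b))
```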
Table 2: Statistical analysis of objective quality metrics.

| Method | PCC | SROCC | RMSE |
|---|---|---|---|
| TDF PL F9 | 0.951 | 0.947 | 0.094 |
| D2 MSE | 0.946 | 0.943 | 0.100 |
| TDF MSE | 0.940 | 0.940 | 0.103 |
| D1 MSE | 0.938 | 0.933 | 0.109 |
| TDF PL | 0.935 | 0.933 | 0.110 |
| Bin PL F1 | 0.922 | 0.916 | 0.115 |
| D2 PSNR | 0.900 | 0.898 | 0.129 |
| Bin WBCE 0.75 | 0.875 | 0.859 | 0.144 |
| Bin PL | 0.863 | 0.867 | 0.151 |
| D1 PSNR | 0.850 | 0.867 | 0.158 |
| Bin naBCE | 0.740 | 0.719 | 0.201 |
| Bin BCE | 0.713 | 0.721 | 0.207 |

Point set metrics
Point-to-point (D1) and point-to-plane (D2)
The point-to-point distance (D1) [30] measures the average error between each point in A and its nearest neighbor in B:

$$e_{A,B}^{D1} = \frac{1}{N_A} \sum_{\forall a_i \in A} \| a_i - b_j \|_2^2, \quad (8)$$

where $b_j$ is the nearest neighbor of $a_i$ in B. In contrast to D1, the point-to-plane distance (D2) [18] projects the error vector along the normal and is expressed as follows:

$$e_{A,B}^{D2} = \frac{1}{N_A} \sum_{\forall a_i \in A} \left( (a_i - b_j) \cdot n_i \right)^2, \quad (9)$$

where $b_j$ is the nearest neighbor of $a_i$ in B and $n_i$ is the normal vector at $a_i$. The normals for the original and distorted point clouds are computed with local quadric fittings using 9 nearest neighbors. The D1 and D2 MSEs are the maximum of $e_{A,B}$ and $e_{B,A}$, and their Peak Signal-to-Noise Ratio (PSNR) is then computed with a peak error corresponding to three times the point cloud resolution, as defined in [30].
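For reference, a minimal D1 sketch using a k-d tree might look as follows; `d1_mse` and `d1_symmetric` are hypothetical helper names, not part of the reference software.

```python
# Sketch of the D1 (point-to-point) MSE of Eq. (8) with a k-d tree;
# the symmetric max over both directions follows the text above.
import numpy as np
from scipy.spatial import cKDTree

def d1_mse(a, b):
    """Mean squared nearest-neighbor error from points a to points b."""
    dists, _ = cKDTree(b).query(a)  # Euclidean distance to nearest neighbor in b
    return np.mean(dists ** 2)

def d1_symmetric(a, b):
    """Symmetric D1 MSE: maximum of the two one-sided errors."""
    return max(d1_mse(a, b), d1_mse(b, a))
```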
Experiments

Experimental setup
We evaluate the metrics defined above on the ICIP2020 subjective dataset [27]. It contains 6 point clouds [31, 32] compressed using G-PCC Octree, G-PCC Trisoup and V-PCC at 5 different rates, yielding a total of 96 stimuli (including the 6 references) and their associated subjective scores.

For each metric, we compute the Pearson Correlation Coefficient (PCC), the Spearman Rank Order Correlation Coefficient (SROCC), the Root Mean Square Error (RMSE) and the Outlier Ratio (OR). We evaluate the statistical significance of the differences between PCCs using the method in [33]. These metrics are computed after logistic fittings with cross-validation splits. Each split contains the stimuli for one point cloud (i.e. the reference point cloud and its distorted versions) as a test set and the stimuli of all other point clouds as a training set. The metrics are then computed after concatenating the results for the test set of each split. They are summarized in Table 2 and the values before and after logistic fitting are shown in Figure 3.

We set the upper bound value u, the naBCE window size m and the bounding factor β empirically, and train the networks with Adam [29] (β₁ = 0.9, β₂ = 0.999) on the ModelNet dataset [34] after block partitioning, using Python 3.6.9 and TensorFlow [35] 1.15.0.
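As an illustration of this evaluation protocol, a per-split logistic fitting could be implemented as below; the 4-parameter logistic form and its initialization are assumptions, as the paper does not specify them.

```python
# Sketch of the per-split logistic fitting applied before computing
# PCC/SROCC/RMSE between objective scores and MOS.
import numpy as np
from scipy.optimize import curve_fit

def logistic(x, a, b, c, d):
    # Monotonic mapping from objective scores to the MOS scale.
    return a + (b - a) / (1.0 + np.exp(-c * (x - d)))

def fit_and_predict(train_x, train_mos, test_x):
    """Fit the logistic on the training split, then map the test scores."""
    p0 = [train_mos.min(), train_mos.max(), 1.0, np.median(train_x)]
    params, _ = curve_fit(logistic, train_x, train_mos, p0=p0, maxfev=10000)
    return logistic(test_x, *params)
```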
Comparison of perceptual loss feature maps
In our experiments, we first considered the perceptual loss computed over all feature maps. However, we observed that some feature maps are more perceptually relevant than others. Consequently, we include the best feature map for each voxel grid representation in our results. This corresponds to feature map 9 (TDF PL F9) for TDF PL and feature map 1 (Bin PL F1) for Bin PL. Moreover, we observe that some feature maps are unused by the neural network (constant). Therefore, they exhibit high RMSE values.

Comparison of objective quality metrics
In Table 2, we observe that TDF PL F9 is the best method overall. In particular, identifying the most perceptually relevant feature map and computing the MSE on this feature map provides a significant improvement. Specifically, the difference between the PCCs of TDF PL F9 and TDF PL is statistically significant with a confidence of 95%.

For voxel grid metrics, we observe that TDF metrics perform better than binary metrics. In particular, the RMSEs of the former are noticeably lower for point clouds compressed with G-PCC Octree compared to the RMSEs of the latter, as can be seen in Table 3. This suggests that implicit representations may be better at dealing with density differences between point clouds in the context of point cloud quality assessment.
Conclusion
We proposed a novel perceptual loss that outperforms existing objective quality metrics and is differentiable in the voxel grid. As a result, it can be used as a loss function in deep neural networks for point cloud compression, and it is more correlated with perceived visual quality than traditional loss functions such as the BCE and the focal loss. Overall, metrics on the proposed implicit TDF representation performed better than explicit binary representation metrics. Additionally, we observed that the TDF representation yields sparser latent space representations compared to the binary representation. This suggests that switching from the binary to the TDF representation may improve compression performance in addition to enabling the use of better loss functions.
Figure 3. Scatter plots between the objective quality metrics and the MOS values. The plots before and after logistic fitting are shown.
Table 3: Statistical analysis of objective quality metrics by compression method.
| Method | Octree PCC | Octree SROCC | Octree RMSE | Trisoup PCC | Trisoup SROCC | Trisoup RMSE | V-PCC PCC | V-PCC SROCC | V-PCC RMSE |
|---|---|---|---|---|---|---|---|---|---|
| TDF PL F9 | 0.975 | – | – | 0.936 | – | – | 0.897 | 0.850 | – |
| D2 MSE | 0.962 | 0.829 | – | 0.954 | – | 0.103 | – | – | – |
| TDF MSE | 0.952 | 0.839 | 0.106 | 0.933 | 0.917 | – | 0.912 | 0.867 | – |
| D1 MSE | – | – | 0.082 | 0.937 | – | 0.126 | 0.876 | 0.844 | – |
| TDF PL | 0.970 | 0.840 | 0.087 | 0.918 | 0.900 | 0.127 | 0.876 | 0.837 | – |
| Bin PL F1 | 0.941 | 0.786 | 0.138 | 0.927 | 0.907 | 0.107 | – | – | – |
| D2 PSNR | – | – | 0.110 | 0.926 | 0.895 | 0.108 | 0.738 | 0.723 | – |
| Bin WBCE 0.75 | 0.923 | 0.747 | 0.163 | 0.918 | 0.886 | 0.112 | 0.850 | 0.786 | – |
| Bin PL | 0.931 | 0.852 | 0.186 | 0.892 | 0.886 | 0.130 | 0.880 | 0.852 | – |
| D1 PSNR | 0.903 | 0.859 | 0.156 | 0.910 | 0.895 | 0.117 | 0.599 | 0.689 | – |
| Bin naBCE | 0.552 | 0.357 | 0.277 | 0.846 | 0.786 | 0.154 | 0.748 | 0.692 | – |
| Bin BCE | 0.946 | 0.841 | 0.188 | 0.776 | 0.800 | 0.177 | 0.574 | 0.500 | – |

Acknowledgments

We would like to thank the authors of [17] for providing their implementation of naBCE. This work was funded by the ANR ReVeRy national fund (REVERY ANR-17-CE23-0020).
References

[1] S. Schwarz et al., "Emerging MPEG Standards for Point Cloud Compression," IEEE Journal on Emerging and Selected Topics in Circuits and Systems, pp. 1-1, 2018.
[2] "ITU-T Recommendation H.265: High efficiency video coding (HEVC)," Nov. 2019.
[3] T. Ebrahimi et al., "JPEG Pleno: Toward an Efficient Representation of Visual Reality," IEEE MultiMedia, vol. 23, no. 4, pp. 14-20, Oct. 2016.
[4] "Final Call for Evidence on JPEG Pleno Point Cloud Coding," ISO/IEC JTC1/SC29/WG1 JPEG output document N88014, Jul. 2020.
[5] M. Quach, G. Valenzise, and F. Dufaux, "Learning Convolutional Transforms for Lossy Point Cloud Geometry Compression," in IEEE Int. Conf. on Image Processing (ICIP), Sep. 2019, pp. 4320-4324.
[6] ——, "Improved Deep Point Cloud Geometry Compression," in IEEE Int. Workshop on Multimedia Signal Processing (MMSP), Oct. 2020.
[7] J. Wang et al., "Learned Point Cloud Geometry Compression," arXiv:1909.12037 [cs, eess], Sep. 2019.
[8] ——, "Multiscale Point Cloud Geometry Compression," arXiv:2011.03799 [cs, eess], Nov. 2020.
[9] D. Tang et al., "Deep Implicit Volume Compression," in IEEE/CVF Conf. on Computer Vision and Pattern Recognition (CVPR), Jun. 2020, pp. 1290-1300.
[10] S. Milani, "A Syndrome-Based Autoencoder For Point Cloud Geometry Compression," in IEEE Int. Conf. on Image Processing (ICIP), Oct. 2020, pp. 2686-2690.
[11] M. Quach, G. Valenzise, and F. Dufaux, "Folding-Based Compression Of Point Cloud Attributes," in IEEE Int. Conf. on Image Processing (ICIP), Oct. 2020, pp. 3309-3313.
[12] E. Alexiou, K. Tung, and T. Ebrahimi, "Towards neural network approaches for point cloud compression," in Applications of Digital Image Processing XLIII, vol. 11510. Int. Society for Optics and Photonics, Aug. 2020, p. 1151008.
[13] L. Huang et al., "OctSqueeze: Octree-Structured Entropy Model for LiDAR Compression," in IEEE/CVF Conf. on Computer Vision and Pattern Recognition (CVPR), Jun. 2020, pp. 1310-1320.
[14] S. Biswas et al., "MuSCLE: Multi Sweep Compression of LiDAR using Deep Entropy Models," Adv. Neural Inf. Process. Syst., vol. 33, 2020.
[15] B. Curless and M. Levoy, "A volumetric method for building complex models from range images," in Proc. of the 23rd Annual Conf. on Computer Graphics and Interactive Techniques (SIGGRAPH '96). ACM Press, 1996, pp. 303-312.
[16] T.-Y. Lin et al., "Focal Loss for Dense Object Detection," in IEEE Int. Conf. on Computer Vision (ICCV), Oct. 2017, pp. 2999-3007.
[17] A. Guarda, N. Rodrigues, and F. Pereira, "Neighborhood Adaptive Loss Function for Deep Learning-based Point Cloud Coding with Implicit and Explicit Quantization," IEEE MultiMedia, pp. 1-1, 2020.
[18] D. Tian et al., "Geometric distortion metrics for point cloud compression," in IEEE Int. Conf. on Image Processing (ICIP), Beijing, Sep. 2017, pp. 3460-3464.
[19] G. Meynet, J. Digne, and G. Lavoué, "PC-MSDM: A quality metric for 3D point clouds," in Int. Conf. on Quality of Multimedia Experience (QoMEX), Jun. 2019, pp. 1-3.
[20] G. Meynet et al., "PCQM: A Full-Reference Quality Metric for Colored 3D Point Clouds," in Int. Conf. on Quality of Multimedia Experience (QoMEX), Athlone, Ireland, May 2020.
[21] E. Alexiou and T. Ebrahimi, "Point Cloud Quality Assessment Metric Based on Angular Similarity," in IEEE Int. Conf. on Multimedia and Expo (ICME), Jul. 2018, pp. 1-6.
[22] A. Javaheri et al., "Mahalanobis Based Point to Distribution Metric for Point Cloud Geometry Quality Evaluation," IEEE Signal Process. Lett., vol. 27, pp. 1350-1354, 2020.
[23] E. Alexiou and T. Ebrahimi, "Towards a Point Cloud Structural Similarity Metric," in IEEE Int. Conf. on Multimedia and Expo Workshops (ICMEW), Jul. 2020, pp. 1-6.
[24] A. Javaheri et al., "Improving PSNR-based Quality Metrics Performance For Point Cloud Geometry," in IEEE Int. Conf. on Image Processing (ICIP), Oct. 2020, pp. 3438-3442.
[25] I. Viola, S. Subramanyam, and P. Cesar, "A Color-Based Objective Quality Metric for Point Cloud Contents," in Int. Conf. on Quality of Multimedia Experience (QoMEX), May 2020, pp. 1-6.
[26] R. Zhang et al., "The Unreasonable Effectiveness of Deep Features as a Perceptual Metric," in IEEE/CVF Conf. on Computer Vision and Pattern Recognition (CVPR), Jun. 2018, pp. 586-595.
[27] S. Perry et al., "Quality Evaluation Of Static Point Clouds Encoded Using MPEG Codecs," in IEEE Int. Conf. on Image Processing (ICIP), Oct. 2020, pp. 3428-3432.
[28] V. Nair and G. E. Hinton, "Rectified Linear Units Improve Restricted Boltzmann Machines," in Proc. of the 27th Int. Conf. on Machine Learning (ICML), 2010, pp. 807-814.
[29] D. P. Kingma and J. Ba, "Adam: A Method for Stochastic Optimization," arXiv:1412.6980 [cs], Dec. 2014.
[30] "Common test conditions for point cloud compression," ISO/IEC JTC1/SC29/WG11 MPEG output document N19084, Feb. 2020.
[31] C. Loop et al., "Microsoft voxelized upper bodies - a voxelized point cloud dataset," ISO/IEC JTC1/SC29 Joint WG11/WG1 (MPEG/JPEG) input document m38673/M72012, May 2016.
[32] E. d'Eon et al., "8i Voxelized Full Bodies - A Voxelized Point Cloud Dataset," ISO/IEC JTC1/SC29 Joint WG11/WG1 (MPEG/JPEG) input document WG11M40059/WG1M74006, Geneva, Jan. 2017.
[33] G. Y. Zou, "Toward using confidence intervals to compare correlations," Psychol. Methods, vol. 12, no. 4, pp. 399-413, Dec. 2007.
[34] N. Sedaghat et al., "Orientation-boosted Voxel Nets for 3D Object Recognition," arXiv:1604.03351 [cs], Apr. 2016.
[35] M. Abadi et al., "TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems," arXiv:1603.04467 [cs], Mar. 2016.