NormalNet: Learning-based Normal Filtering for Mesh Denoising
Wenbo Zhao, Xianming Liu, Member, IEEE, Yongsen Zhao, Xiaopeng Fan, Senior Member, IEEE, and Debin Zhao, Member, IEEE

Abstract—Mesh denoising is a critical technology in geometry processing that aims to recover high-fidelity 3D mesh models of objects from their noise-corrupted versions. In this work, we propose a learning-based normal filtering scheme for mesh denoising called NormalNet, which maps the guided normal filtering (GNF) into a deep network. The scheme follows the iterative framework of filtering-based mesh denoising. During each iteration, first, a voxelization strategy is applied to each face in the mesh to transform its irregular local structure into a regular volumetric representation; both the structure and the face normal information are thereby preserved, and the convolution operations of a convolutional neural network (CNN) can be easily performed. Second, in place of the guidance normal generation and the guided filtering in GNF, a deep CNN is designed that takes the volumetric representation as input and outputs the learned filtered normals. Finally, the vertex positions are updated according to the filtered normals. Specifically, an iterative training framework is proposed, in which the generation of training data and the network training are performed alternately, and the ground truth normals are taken as the guidance normals in GNF to obtain the target normals. Compared to state-of-the-art works,
NormalNet can effectively remove noise while preserving the original features and avoiding pseudo-features.
Index Terms—Mesh denoising, convolutional neural networks, normal filtering, guided normal filtering, voxelization
1 INTRODUCTION

Recently, a demand for high-fidelity 3D mesh models of real objects has appeared in many domains, such as computer graphics, geometric modelling, computer-aided design and the movie industry. However, due to the accuracy limitations of scanning devices, raw mesh models are inevitably contaminated by noise, leading to corrupted features that profoundly affect the subsequent applications of meshes. Hence, mesh denoising has become an active research topic in the area of geometry processing.

W. Zhao, X. Liu, Y. Zhao, X. Fan and D. Zhao are with the School of Computer Science and Technology, Harbin Institute of Technology, Harbin, 150001, China. E-mail: {wbzhao, csxm, 18S103199, fxp, dbzhao}@hit.edu.cn.

Mesh denoising is an ill-posed inverse problem. Its nature is to smooth a noisy surface while concurrently preserving the real object features without introducing unnatural geometric distortions. Mesh denoising is a challenging task, especially for large and dense meshes and for high noise levels. The key to successful mesh denoising is to differentiate the actual geometric features, such as localized curvature changes and small-scale details, from the noise generated by scanners. The literature contains rich work on mesh denoising, including filtering-based [1], [2], [3], [4], [5], [6], [7], feature-extraction-based [8], [9], optimization-based [10], [11], and similarity-based [12], [13], [14] methods. Among these methods, the guided-normal-based scheme has become popular in recent years [3], [4], [5], [6], [7]; it follows the iterative framework of filtering-based mesh denoising. During each iteration, the guidance normals are first derived and used in filtering, and then the vertex positions are updated according to the filtered normals. This approach performs mesh denoising either by building guidance normals with manually designed methods [3], [6], [7], or by introducing additional information to improve the quality of the guidance normals [4], [5].

The schemes in [3], [4], [5] perform well on synthetic meshes with simple structures. However, their methods of generating guidance normals are based on finding consistent patches with fixed shapes and therefore cannot handle complex structures well, such as narrow edges and corners. To overcome this problem, Li et al. [6] propose to generate the guidance normals from corner-aware and edge-aware neighbourhoods. Zhao et al. [7] employ graph cuts to generate piecewise-smooth patches and build guidance normals on them. These schemes perform well on synthetic meshes with complex features. However, their main idea is to find consistent patches according to the face normal difference, and the structure information has not been fully utilized. Scanned meshes contain many kinds of noise and more complex shapes, such as serrated noise (Fig. 11), stair-stepping noise (Fig. 14) and irregular edges (Fig. 10); the face normal difference of noisy faces is then so large that it is difficult for [6], [7] to distinguish noise from features, resulting either in introduced pseudo-features or in over-smoothing. As shown in the experimental comparisons, even the state-of-the-art schemes [3], [7] cannot handle these cases well.

In the counterpart problem of 2D image denoising, deep-learning-based strategies, such as [15], [16], [17], have been widely applied and have achieved great success. However, with respect to mesh denoising, to the best of our knowledge, no studies follow this line of research. One main reason preventing the usage of convolutional neural networks (CNNs) in mesh denoising is that, in contrast to the regular grid structure of 2D images, meshes have irregular structures.
Therefore, it is not straightforward to apply the regular 3D convolutional operations of CNNs to a mesh. Another reason may be the difficulty of selecting an efficient denoising strategy for a CNN to mimic.

In this work, we propose a learning-based normal filtering scheme for mesh denoising called
NormalNet , whichmaps the guided normal filtering (GNF) [3] into a deep net-work. In particular,
NormalNet follows the iterative framework of GNF, as shown in Fig. 1. During each iteration, to overcome the difficulty of using CNNs on meshes and to exploit both the structure and the face normal information, first, the voxelization strategy is applied to each face in the mesh to convert its irregular local structure into a regular volumetric representation. Second, a deep CNN is designed that takes the volumetric representation as input and outputs the learned filtered normals. All CNNs share the same workflow: three residual blocks, one max-pooling layer and four fully connected layers, of which the fourth layer outputs the filtered normals. Finally, the vertex positions are updated according to the filtered normals. Moreover, we propose an iterative training framework for
NormalNet, in which the generation of training data and the training of the CNNs are performed alternately, and the ground truth normals are taken as the guidance normals in GNF to obtain the target normals. Compared to the state-of-the-art schemes,
NormalNet can effectively generate more accurate filtering results and remove noise while preserving the original features and avoiding pseudo-features.

The rest of this paper is organized as follows. In the following section, we briefly summarize the related work. The proposed
NormalNet is introduced in Section 3. In Section 4, the training of
NormalNet is elaborated. Experimental results are presented in Section 5. Finally, Section 6 concludes the paper.
2 RELATED WORK
In this section, we briefly review related work on filtering-based mesh denoising and on neural-network-based 3D model processing.
Owing to the edge-preserving property of the bilateral filter, researchers have made many attempts to adopt bilateral filtering in mesh denoising [18], [19], [20]. Nevertheless, the photometric weights in the bilateral filter cannot be estimated accurately from a noise-corrupted mesh. The joint bilateral filter [21], in which the photometric weights are computed from a reliable guidance image, was proposed to improve the capability of bilateral filtering. Inspired by this idea, Zhang et al. [3] propose guided normal filtering, in which the guidance information is obtained as the average normal of a local patch. This scheme works well with respect to feature preservation but cannot achieve satisfactory results in regions with complex shapes and sometimes introduces pseudo-features. To overcome the problems of [3], in the subsequent work [6], the guidance normals are computed from a corner-aware neighbourhood, which is adaptive to the shapes of corners and edges. Recently, there have been increasing efforts to exploit geometric attributes for mesh denoising. In [22], the normal filtering is performed by means of total variation, which assumes that the normal change is piecewise constant. Wei et al. [2] propose to cluster faces into piecewise-smooth patches and refine the face normals with the help of vertex normal fields. In [10], a differential edge operator is proposed and L0 minimization is employed to remove noise while preserving sharp features. Furthermore, Lu et al. [23] apply an additional vertex filtering before the L1-median face normal filtering, which proves to be capable of handling high noise levels and noise distributed in random directions.
However, feature information, such as edges and corners with little noise, may be blurred by the prefiltering. In [24], the Tukey bi-weight similarity function is proposed to replace the similarity function in the computation of the bilateral weights; in addition, an edge-weighted Laplace operator is introduced for vertex updating to reduce face normal flips. In [7], graph-based feature detection is employed to construct accurate guidance normals; however, this method may introduce pseudo-features when the shape of the noise is complex, which is common in scanned models.
Driven by the great success of deep learning in image processing, researchers in graphics are also attempting to employ deep neural networks for 3D model processing. However, due to their irregular connectivity, processing 3D models with neural networks remains challenging. Numerous works have focused on transforming 3D models into regular data. For instance, in [25], [26], 3D models are represented by 2D rendered images and panoramic views. Furthermore, some studies [27], [28], [29], [30] have employed voxelization to transform models into regular 3D data. Moreover, in [31], [32], [33], [34], meshes are represented in the spectral or spatial domain for further processing.

In addition to these transform-based techniques, the direct application of neural networks to irregular data has also been extensively studied for point clouds. PointNet [35] is one of the first network architectures that can handle point cloud data. Subsequently, PointNet++ [36] and the dynamic graph CNN [37] were proposed to improve the network capability. Some attempts have been made to organize point clouds into structures. In [38], a kd-tree is constructed on a point cloud and is further used as the input of a neural network. A similar idea is presented in [39], where the points are organized by an octree. Additional works [40], [41], [42] focus on surface reconstruction, denoising and outlier removal. Notably, in [43], Wang et al. propose the filtered facet normal descriptor and model it with neural networks; however, these networks are not convolutional and only take the face normal information into consideration. In [44], edge-based convolution and pooling operations are defined, which can operate directly on the constructs of the mesh.

3 THE FRAMEWORK OF NORMALNET

In this section, we introduce the framework of
NormalNet, including four parts: the generation of patches, the introduction of GNF [3], the voxelization strategy, and the proposed scheme.
Fig. 1: The framework of
NormalNet. Modules in all iterations share the same workflow: for a face, the irregular local 3D structure is converted via the voxelization strategy into the regular volumetric representation, which is then input into a CNN to get the filtered normal. Finally, the vertex positions are updated to obtain the denoised mesh.

Fig. 2: Illustration of the proposed voxelization strategy. For a face in a mesh, a 2-ring patch is constructed. Two matrices that represent rotation and translation are computed for normalization. The irregular 3D structure around this face is split into small cubes. A label, which is the average normal of the faces in a cube, is then assigned to the cube.
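As a concrete illustration of the labelling step in Fig. 2, the following sketch bins faces into cubes and averages their normals. For brevity it assigns each face by its centre rather than by the exact triangle-box overlap test [47] used in the paper, and all function and variable names here are our own:

```python
import math

def voxelize(faces, T_s, L_c):
    """Toy voxelization: map each face to one cube by its centre and label
    every non-empty cube with the average normal of its faces. `faces` is a
    list of (centre, normal) pairs, both 3-tuples, given in the normalized
    frame where the target face centre sits at the origin."""
    grid = {}  # (x, y, z) cube index -> list of face normals
    for centre, normal in faces:
        idx = tuple(int(math.floor(c / L_c + 0.5)) for c in centre)
        if all(-T_s <= i <= T_s for i in idx):
            grid.setdefault(idx, []).append(normal)
    labels = {}  # cube index -> average normal of overlapping faces
    for idx, normals in grid.items():
        n = len(normals)
        labels[idx] = tuple(sum(v[k] for v in normals) / n for k in range(3))
    return labels  # cubes absent from `labels` carry the label (0, 0, 0)

# Example: two faces whose centres fall into the same cube around the origin.
faces = [((0.01, 0.0, 0.0), (0.0, 0.0, 1.0)),
         ((-0.01, 0.0, 0.0), (0.0, 1.0, 0.0))]
labels = voxelize(faces, T_s=20, L_c=0.1)
print(labels[(0, 0, 0)])  # -> (0.0, 0.5, 0.5), the average of the two normals
```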
In mesh denoising, the patch is a commonly used structure, so we first describe the generation of an r-ring patch. Given a face f_i as the centre of a patch P, an r-ring patch of f_i is generated by finding all the faces that share at least one vertex with the faces in P and adding them to P, repeated r times. Examples of a 1-ring patch and a 2-ring patch are shown in Fig. 4.

Since our scheme mimics the framework of GNF [3], we briefly introduce it. GNF is an iterative scheme in which the face normal filtering is repeated N_f times. For a face f_i, guided filtering is applied to obtain the denoised normal:

n'_i = e_i \sum_{f_j \in N_i} a_j G_d(c_i, c_j) G_g(g_i, g_j) n_j,   (1)

where c_j, n_j and g_j are the centre, face normal and guidance normal of f_j; N_i is the set of geometrical neighbouring faces of f_i; and e_i is a normalization factor that ensures n'_i is a unit vector. G_d and G_g are Gaussian kernels [45], computed as

G_g = \exp\left( -\frac{\|g_i - g_j\|^2}{2\mu_g^2} \right),   (2)

G_d = \exp\left( -\frac{\|c_i - c_j\|^2}{2\mu_d^2} \right),   (3)

where µ_d and µ_g are the Gaussian function parameters; µ_d is usually twice the average distance between adjacent face centres, while µ_g usually differs from mesh to mesh. Following the idea of [46], after each filtering pass, the position updating of the vertices is repeated N_v times to obtain the denoised mesh.

The guidance normal of f_i is generated as follows. For each face, suppose that P is a 1-ring patch that contains f_i. The consistency C(P) of P is calculated as [3]

C(P) = D(P) \cdot R(P),   (4)

Fig. 3: (a) The architecture of the deep network in
NormalNet. (b) The structure of residual block i; [2, c_{i-1}] means that the convolution stride is 2 and the channel number is c_{i-1}.

Fig. 4: Two examples of a 1-ring patch (left) and a 2-ring patch (right); f_i is coloured purple.

where D(P) is the most significant face normal difference in P, calculated as

D(P) = \max_{f_i, f_j \in P} \|n_i - n_j\|,   (5)

where f_i and f_j represent a pair of faces within P. R(P) represents the saliency of P, which is computed from the saliency of the edges in P:

R(P) = \frac{\max_{e_i \in P} \varphi(e_i)}{\varepsilon + \sum_{e_i \in P} \varphi(e_i)},   (6)

where ε is a small positive number that prevents the denominator from being zero, and φ(e_i) is the saliency of an edge e_i:

\varphi(e_i) = \|n_{i,1} - n_{i,2}\|,   (7)

where n_{i,1} and n_{i,2} are the normals of the two faces incident to e_i. Finally, the most consistent patch is chosen, and its average normal is taken as the guidance normal.

The guidance normals generated by the above method have been proven effective on simple structures. However, the calculation of consistency is based only on the difference between face normals, and the structure information has not been fully considered. As mentioned before, scanned meshes may contain noise with huge face normal differences, for which this method does not work well. Rather than manually designing a method that works well on such meshes, we employ CNNs to obtain the learned filtered normals.

The key to using CNNs for mesh denoising is the transformation of the irregular local structure around a face into a regular form, such that the structure information is preserved and the CNN convolution operations are easily performed. An illustration of the proposed voxelization strategy is shown in Fig. 2. First, normalization is applied to improve the robustness of the strategy. The normalization process involves two operations: rotation and translation. In this way, all faces are normalized to a similar direction and position.
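A minimal sketch of this normalization step follows. The paper does not specify how the rotation matrix is constructed, so this version assumes Rodrigues' formula and that the average normal is not antiparallel to the target direction; for simplicity the translation is applied before the rotation, which likewise pins the face centre at the origin:

```python
def normalize_patch(vertices, face_centre, n_avg, n_target=(0.0, 0.0, 1.0)):
    """Sketch of the normalization: rotate so the patch's unit average
    normal n_avg aligns with the unit target direction n_target
    (Rodrigues' formula), then place the face centre at the origin."""
    ax = (n_avg[1]*n_target[2] - n_avg[2]*n_target[1],   # rotation axis
          n_avg[2]*n_target[0] - n_avg[0]*n_target[2],   # = n_avg x n_target
          n_avg[0]*n_target[1] - n_avg[1]*n_target[0])
    c = sum(a*b for a, b in zip(n_avg, n_target))        # cos(angle); c != -1 assumed
    K = [[0.0, -ax[2], ax[1]], [ax[2], 0.0, -ax[0]], [-ax[1], ax[0], 0.0]]
    I = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    K2 = [[sum(K[i][k]*K[k][j] for k in range(3)) for j in range(3)] for i in range(3)]
    # Rodrigues: R = I + K + K^2 / (1 + cos(angle))
    R = [[I[i][j] + K[i][j] + K2[i][j] / (1.0 + c) for j in range(3)] for i in range(3)]
    out = []
    for v in vertices:
        d = [v[k] - face_centre[k] for k in range(3)]    # translate centre to origin
        out.append(tuple(sum(R[i][k]*d[k] for k in range(3)) for i in range(3)))
    return out

# A vertex one unit above a face whose normal points along +y: the rotation
# maps +y to +z, so the offset (0, 0, 1) rotates to (0, -1, 0).
verts = normalize_patch([(1.0, 2.0, 3.0)], face_centre=(1.0, 2.0, 2.0),
                        n_avg=(0.0, 1.0, 0.0))
```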
Specifically, for a face f_i, a 2-ring patch is constructed, whose average normal is n̄_i. We then compute two matrices: W_r, which represents the rotation from n̄_i to a specific direction N_t, and W_t, which represents the translation of the face centre c_i to (0, 0, 0). The whole mesh is then rotated and translated by means of W_r and W_t. Supposing v_i is the coordinate of a vertex i in the mesh, its new position after normalization is

v'_i = W_t W_r v_i.   (8)

After normalization, the space of the local mesh structure around f_i is split into regular cubes denoted by {B_{x,y,z} | x, y, z ∈ [−T_s, T_s]}, where T_s is the parameter that determines the number of cubes and B_{0,0,0} is located at the origin. The remaining issue is to determine the size of each cube. In our work, the side length L_c of the cubes is computed as

L_c = d_s / α_c,   (9)

where d_s is the average distance between adjacent faces in the noisy mesh and α_c is the parameter that controls the size of the cubes.

For each cube, we employ the fast 3D triangle-box overlap testing strategy [47] to find the faces that overlap with the cube. If at least one face overlaps with a cube, the label of that cube is assigned as the average normal of all the overlapping faces, denoted by B; otherwise, the label is set to (0, 0, 0). In this way, we convert the irregular local mesh structure into the regular volumetric representation V = [B ∈ R^{(2T_s+1)×(2T_s+1)×(2T_s+1)×3}], which is then used as the input of the network.

In our experiment, we set T_s = 20 and α_c = 8, and fix N_t across all meshes. Under these conditions, V is a 41×41×41×3 matrix that contains most of the 3-ring structure around f_i. Each face is split into about 40-60 cubes, which is sufficient to represent the shape information. Smaller T_s and α_c reduce the amount of information in V and lead to unsatisfactory results, whereas larger parameters improve the performance only slightly while greatly increasing the training time.

Fig. 5: The framework of generating training sets and training.

The proposed scheme is also iterative and is repeated N_f times. During each iteration, for a face f_i in the mesh, the voxelization strategy is employed to transform the irregular local mesh structure around f_i into the regular volumetric representation. A CNN then takes the volumetric representation as input and outputs the filtered normals. Since the value of µ_g in GNF greatly affects the denoising results and often differs between meshes, the output of the network contains N filtered normals corresponding to different values of µ_g. Finally, the positions of the vertices are updated N_v times according to the selected filtered normals.

The network architecture is shown in Fig. 3. It contains three residual blocks, a global max-pooling layer and four fully connected layers. The numbers of channels of the residual blocks are 64, 128 and 256. All the convolution layers use the same small filters except the first layer, which uses larger filters. Down-sampling is performed by a convolution with a stride of 2 in the first layer of each residual block. The network ends with four fully connected layers: the first three have 512, 256 and 128 channels, and the fourth predicts the three coordinates of the N filtered normals and thus contains 3N channels. All layers are equipped with batch normalization and ReLU, except the last layer, which uses Tanh to ensure that the output lies in [−1, 1].

TABLE 1: The settings of the corresponding iteration numbers for each CNN_i.

The network architecture is inspired by the philosophy of ResNet [48] and VGGNet [49].
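As a sanity check on the tensor shapes this architecture implies (a 41³×3 input, three stride-2 stages with 64, 128 and 256 channels, global max-pooling, and fully connected layers ending in 3N outputs), the following assumes 'same' padding, so that each stride-2 stage halves the spatial size, rounding up:

```python
import math

def stage_size(s, stride=2):
    """Spatial size after a stride-2 convolution with 'same' padding."""
    return math.ceil(s / stride)

size, channels = 41, 3                    # 41x41x41 voxel grid, 3 = normal coords
for c in (64, 128, 256):                  # three residual blocks, stride 2 each
    size, channels = stage_size(size), c
    print(f"{size}^3 x {channels}")       # 21^3 x 64, 11^3 x 128, 6^3 x 256

# Global max-pooling collapses the spatial dims to a 256-vector, which the
# fully connected layers map as 256 -> 512 -> 256 -> 128 -> 3N.
N = 6
fc = [channels, 512, 256, 128, 3 * N]
print(fc[-1])  # 18 output values: N candidate normals, 3 coordinates each
```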
The purpose of NormalNet is to estimate accurate filtered normals from the noisy signal. However, as the network goes deeper, abundant information beneficial to filtering the normals can vanish or "wash out" by the time it reaches the output layer. To address this problem, we adopt the shortcut connection from ResNet to pass early feature maps directly to later layers. This greatly increases the forward flow of information and thus contributes to the prediction of the face normals. In addition, during backpropagation, a shortcut path adds an extra component to the gradients compared to a plain network, which mitigates the vanishing gradient problem and thereby accelerates training. In our experiment, we set N = 6, and the output of the CNN contains the filtering results of six different values of µ_g.

4 NORMALNET TRAINING
In this section, we introduce the training of
NormalNet, including two parts: the iterative training and the training details.
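At a high level, the iterative training alternates between generating a training set from the current meshes, training CNN_i on it, and filtering the meshes with CNN_i to obtain the data for CNN_{i+1}. A minimal sketch follows; `voxelize_faces`, `gnf_target_normals`, `train_cnn` and `apply_cnn` are hypothetical placeholders for the steps described in this section:

```python
def iterative_training(noisy_meshes, ground_truth, num_cnns,
                       voxelize_faces, gnf_target_normals, train_cnn, apply_cnn):
    """Alternate training-data generation and network training: M_0 is the
    raw noisy data; M_i (i > 0) is M_{i-1} filtered by the previous CNN."""
    meshes = noisy_meshes                    # M_0: unprocessed noisy meshes
    cnns = []
    for i in range(num_cnns):
        inputs = voxelize_faces(meshes)      # volumetric representations
        targets = gnf_target_normals(meshes, ground_truth)  # ground-truth-guided GNF
        cnns.append(train_cnn(inputs, targets))
        meshes = apply_cnn(cnns[-1], meshes)  # produce M_{i+1} for the next round
    return cnns
```

With trivial stand-ins for the four callbacks, the loop simply threads the filtered meshes from one round into the next.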
The process of generating the training sets and training is illustrated in Fig. 5. For each CNN_i, a specific training data set T_i is generated from a group of meshes, denoted M_i, and the corresponding ground truth. T_i is composed of numerous training tuples, each of which consists of a volumetric representation and N target normals. For a face f_i from a mesh in M_i, the volumetric representation is obtained by applying the voxelization strategy on f_i. The target normals are obtained by employing GNF, with the ground truth normals adopted as the guidance normals:

n'_i = e_i \sum_{f_j \in N_i} a_j G_d(c_i, c_j) G_g(gn_i, gn_j) n_j,   (10)

where gn_i and gn_j are the ground truth normals of f_i and f_j; the other parameters are the same as defined in Eq. (1).

To make the training process balanced with respect to various features, suppose the maximum angle difference in the 2-ring patch of a face is A_p. All the faces in M_i are divided into 4 categories, and we randomly select the same number of faces from each category for training:

• v1: A_p > θ_1, large edge region.
• v2: θ_2 < A_p ≤ θ_1, small edge region.
• v3: θ_3 < A_p ≤ θ_2, curved region.
• v4: A_p ≤ θ_3, smooth region.

Here θ_1 > θ_2 > θ_3 are fixed angle thresholds. Initially, M_0 is composed of noisy meshes whose ground truth is already known, without any processing. When i > 0, M_i is obtained by applying the filtering performed by CNN_{i-1} to M_{i-1}, using a fixed µ_g and N_v = 20. The generation of the training data and the network training are performed alternately and iteratively.

The loss function is defined as the MSE between the N output normals and the target normals. We use the truncated normal distribution to initialize the weights and train the network from scratch. For optimization, we choose the Adam algorithm with a mini-batch size of 80; the parameters of the Adam optimizer are the default settings in TensorFlow. The learning rate starts at 0.0001 and decays exponentially every 5000 training steps with a decay rate of 0.96. Each CNN_i is trained individually. A test set built by randomly selecting faces from the test models is used for evaluation. The evaluation metric for the network is defined as the average angular error over the entire test set.

Fig. 6: Illustration of the L2 error results on the model Twelve; each colour represents a CNN.

TABLE 2: The settings of N_f, N_v and µ_g (blank entries are unavailable in this copy):

Model        N_f  N_v  µ_g | Model        N_f  N_v  µ_g
Fandisk       10   20      | Eagle
Table         15   15      | Gargoyle
Joint          5   15      | BallJoint
Twelve        25   10      | Boy01F
Block         20   30      | Boy02F
Bunny                      | Cone04V1      20   20
Angel                      | Girl02V1      15   20
Iron                       | Cone16V2      10   10
Pierrot                    | Girl01V2       3   15
Rocker-arm                 |
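The target-normal computation in Eq. (10), guided filtering with the ground-truth normals as guidance, can be sketched as follows; the Gaussian kernels follow Eqs. (2) and (3), the face areas a_j, centres and the neighbourhood index set are passed in explicitly, and the function name is our own:

```python
import math

def target_normal(i, neigh, centres, normals, gt_normals, areas, mu_d, mu_g):
    """Eq. (10): filter normal i over its neighbourhood `neigh`, with the
    ground-truth normals acting as the guidance normals."""
    def gauss(a, b, mu):
        d2 = sum((x - y) ** 2 for x, y in zip(a, b))
        return math.exp(-d2 / (2.0 * mu * mu))

    acc = [0.0, 0.0, 0.0]
    for j in neigh:
        w = areas[j] * gauss(centres[i], centres[j], mu_d) \
                     * gauss(gt_normals[i], gt_normals[j], mu_g)
        for k in range(3):
            acc[k] += w * normals[j][k]
    norm = math.sqrt(sum(x * x for x in acc))  # e_i renormalizes to unit length
    return tuple(x / norm for x in acc)

# Toy example: two neighbours with identical guidance and symmetric noisy
# normals; the y-components cancel and the result points along +z.
centres = [(0, 0, 0), (1, 0, 0), (0, 1, 0)]
normals = [(0, 0, 1), (0.0, 0.6, 0.8), (0.0, -0.6, 0.8)]
gt = [(0, 0, 1)] * 3
n = target_normal(0, [1, 2], centres, normals, gt, [1.0, 1.0, 1.0], 1.0, 0.5)
print(n)
```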
Each network is trained for 10 epochs, and the average angular error is 1-3 degrees after 10 epochs. The network with the smallest error is selected for use. In our experiment, we select 45,000 faces in each category; thus, the total size of T_i is 180,000. The training process is executed on a computer with an Intel Core i7-7700 CPU and an NVIDIA GTX 1080, and each epoch takes approximately 3 hours. Increasing the number of channels, the number of layers or the size of T_i does not substantially improve the performance of the networks and only multiplies the training time. Halving the number of channels or the size of T_i also halves the training time; however, the average angular error then increases to 4-6 degrees.

5 EXPERIMENTAL RESULTS
In this section, extensive experimental results are presented to demonstrate the performance of NormalNet.

We perform the experimental comparisons on 19 test models, including 6 synthetic models: Joint, Twelve, Bunny, Fandisk, Table and Block; 4 scanned models collected from the Internet, for which the type of scanner is unknown: Angel, Iron, Rocker-arm and Pierrot; 6 scanned models with rich features, generated by Microsoft Kinect v1, Microsoft Kinect v2 and Microsoft Kinect v1 via the Kinect-Fusion technique [43], respectively: Cone04V1, Girl02V1, Cone16V2, Girl01V2, Boy01F and Boy02F; and 3 scanned models generated by laser scanners [50]: Eagle, Gargoyle and BallJoint. For the synthetic models, the noise type in Fandisk, Table, Bunny and Block is Gaussian white noise, while that of Joint and Twelve is impulsive noise.

We compare
NormalNet with several state-of-the-art algorithms in terms of objective and subjective evaluations. The compared algorithms are 1) guided normal filtering (GNF) [3], 2) L0 minimization optimization (L0M) [10], 3) BI-normal filtering (BI) [2], 4) cascaded normal regression (CNR) [43], 5) graph-based normal filtering (GGNF) [7], and 6) the normal-voting-tensor-based scheme (VT) [50]. The source codes of GNF, L0M, BI, CNR and GGNF are kindly provided by their authors or implemented by a third party, while the author of VT provides the input models and their denoising results.

TABLE 3: Performance comparisons between NormalNet and the state-of-the-art methods: the metrics E_v and E_a for L0M [10], BI [2], GNF [3], CNR [43], GGNF [7] and NormalNet on Fandisk, Table, Joint, Twelve, Block, Bunny, Boy01F, Boy02F, Cone04V1, Girl02V1, Cone16V2 and Girl01V2, together with the averages over all models.
As shown in Fig. 6, during the denoising process for most meshes, the L2 error decreases rapidly during the first three iterations and decreases slowly after ten iterations. In order to design a lightweight network, the iteration numbers are divided into six intervals, each of which corresponds to a specific CNN_i, as listed in Table 1. Thus, the training cost decreases by more than 70% at the price of slightly decreased CNN performance; the average angular error increases by 0.1-0.15 degrees.

Three NormalNets, namely CV1, CV2 and CS, are trained on the Kinect-v1 training set (73 meshes), the Kinect-v2 training set (73 meshes) and a remake of the synthetic training set (60 meshes, where some meshes are excluded from the training sets for the experiments) provided by [43]. The test models Cone04V1, Girl02V1, Cone16V2 and
Girl01V2 are denoised by the corresponding networks CV1 and CV2, and all the other test models are denoised by CS. The settings of the parameters N_f, N_v and µ_g, as well as the parameters of the other schemes, follow the settings used in [3] and [7]. The parameter settings of N_f, N_v and µ_g are shown in Table 2.

Two error metrics [20] are employed to evaluate the objective denoising results of the models that have a ground truth:

• E_a: the mean square angular error, which represents the accuracy of the face normals;
• E_v: the L2 vertex-based mesh-to-mesh error, which represents the accuracy of the vertex positions.

We compare the objective performance on 12 models. The comparison results for E_a and E_v are shown in Table 3, where the best results are bolded. NormalNet performs best on 10 models for E_a and 10 models for E_v, achieving the best performance with respect to both metrics on most test models. CNR achieves the second-best average results on E_a, which shows that CNR is superior in estimating face normals. However, GNF and GGNF achieve better average results than CNR on E_v, which shows that filtering-based schemes perform better in recovering vertex positions. NormalNet achieves the best average results on both E_a and E_v.

The subjective performance comparison results of the six synthetic models are illustrated in Figs. 7, 8 and 9. Fig. 7 presents the denoising results of two models with curved surfaces. The zoomed-in view illustrates that our scheme introduces fewer pseudo-features than the other schemes. In Fig. 8, our scheme achieves performance similar to that of GGNF in the feature regions. The corner is recovered well, and the edge is sharp and clean. In
Block, the highlighted region in the red window has a higher triangulation density. Benefiting from the voxelization strategy, our scheme preserves the structure information well and is thus less sensitive to sampling irregularity. In Fig. 9, we perform a comparison on synthetic meshes with impulsive noise. In Table, both our scheme and GGNF produce the best feature recovery results. In Joint, the edge length of our scheme is closest to the ground truth.
Fig. 7: Illustration of the denoising results on the models Fandisk and Bunny; the zoomed-in view of Bunny has been rotated. (a) to (h) are the noisy mesh; the results of L0M [10], BI [2], GNF [3], CNR [43], GGNF [7] and NormalNet; and the ground truth.
Fig. 8: Illustration of the denoising results on the models Twelve and Block. (a) to (h) are the noisy mesh; the results of L0M [10], BI [2], GNF [3], CNR [43], GGNF [7] and NormalNet; and the ground truth.
Fig. 9: Illustration of the denoising results on the models Joint and Table. The red line in Joint is the edge length of the ground truth. (a) to (h) are the noisy mesh; the results of L0M [10], BI [2], GNF [3], CNR [43], GGNF [7] and NormalNet; and the ground truth.
We further provide comparison results for models from different scanners. As illustrated in Figs. 10 and 11, our scheme performs well on models for which the type of scanner is unknown. For a fair comparison, CNR is also trained on the synthetic training set. In Fig. 10, for Iron, both our scheme and CNR introduce fewer pseudo-features than the other schemes. However, in Rocker-arm and Angel, CNR over-smooths the edges in the red boxes, whereas our scheme still produces satisfactory results. In Fig. 11, for Pierrot, the region in the red box is corrupted by serrated noise. For GNF and GGNF, accurate guidance normals are difficult to compute under this type of noise; thus, the denoising result is corrupted by pseudo-features. Furthermore, CNR succeeds in removing the serrated noise but fails to recover the edges around the eyes in the red box. Our scheme finds a balance between introducing pseudo-features and over-smoothing. The codes of L0M and BI could not process this region.

In Fig. 12, we compare NormalNet with VT [50] on the models provided by its authors, which are generated by laser scanners. Our scheme produces better feature recovery results than VT on all three models, which contain complex structures; this further verifies the capability of NormalNet.

In Figs. 13 and 14, the models are generated by Microsoft Kinect v1 and v2 and provided by the author of CNR. However, we do not have sufficient data to train NormalNet for the models generated by Microsoft Kinect v1 via the Kinect-Fusion technique; therefore, CS is employed to denoise these models. In Fig. 13, our scheme outputs denoising results similar to those of CNR and GGNF. In Fig. 14, our scheme achieves the best smoothing result, while the other schemes fail to remove the noise in Cone04V1. In Girl02V1 and Girl01V2, both CNR and our scheme avoid introducing pseudo-features. In Cone16V2, most schemes achieve similar feature recovery results.
6 CONCLUSION
In this paper, we present a learning-based normal filtering scheme for mesh denoising. The scheme maps the guided normal filtering into a deep network and follows the iterative framework of filtering-based schemes. During each iteration, first, to facilitate the 3D convolution operations, the voxelization strategy is applied to each face in the mesh to transform its irregular local structure into a regular volumetric representation. Second, in place of the guidance normal generation and the guided filtering in GNF, the output of the voxelization is input into a CNN to estimate accurate filtered normals. Finally, the vertex positions are updated according to the filtered normals. Moreover, an iterative training framework is proposed for effective training. The experimental results show that our scheme outperforms state-of-the-art works with respect to both objective and subjective quality and can effectively remove noise while preserving the original features and avoiding pseudo-features.

REFERENCES

[1] H. Yagou, Y. Ohtake, and A. Belyaev, "Mesh smoothing via mean and median filtering applied to face normals," in
Geometric Model-ing and Processing , 2002, pp. 124–131.[2] M. Wei, J. Yu, W.-M. Pang, J. Wang, J. Qin, L. Liu, and P.-A.Heng, “Bi-normal filtering for mesh denoising,”
IEEE Transactionson Visualization and Computer Graphics , vol. 21, no. 1, pp. 43–55,2015.[3] W. Zhang, B. Deng, J. Zhang, S. Bouaziz, and L. Liu, “Guidedmesh normal filtering,” in
Computer Graphics Forum , vol. 34, no. 7.Wiley Online Library, 2015, pp. 23–34.[4] W. Zhao, X. Liu, S. Wang, and D. Zhao, “Multi-scale similarityenhanced guided normal filtering,” in
Advances in MultimediaInformation Processing – PCM 2017 . Springer International Pub-lishing, 2018, pp. 645–653.[5] R. Wang, W. Zhao, S. Liu, D. Zhao, and C. Liu, “Feature-preservingmesh denoising based on guided normal filtering,” in
Advances inMultimedia Information Processing – PCM 2017 . Springer Interna-tional Publishing, 2018, pp. 920–927.[6] T. Li, J. Wang, H. Liu, and L.-g. Liu, “Efficient mesh denoising viarobust normal filtering and alternate vertex updating,”
Frontiersof Information Technology and Electronic Engineering , vol. 18, no. 11,pp. 1828–1842, 2017.[7] W. Zhao, X. Liu, S. Wang, X. Fan, and D. Zhao, “Graph-basedfeature-preserving mesh normal filtering,”
IEEE Transactions onVisualization and Computer Graphics , 2019.[8] X. Lu, Z. Deng, and W. Chen, “A robust scheme for feature-preserving mesh denoising,”
IEEE Transactions on Visualization andComputer Graphics , vol. 22, no. 3, pp. 1181–1194, 2016.[9] M. Wei, L. Liang, W.-M. Pang, J. Wang, W. Li, and H. Wu, “Tensorvoting guided mesh denoising,”
IEEE Transactions on AutomationScience and Engineering , vol. 14, no. 2, pp. 931–945, 2017.[10] L. He and S. Schaefer, “Mesh denoising via l0 minimization,”
ACMTransactions on Graphics , vol. 32, no. 4, p. 64, 2013.[11] R. Wang, Z. Yang, L. Liu, J. Deng, and F. Chen, “Decoupling noiseand features via weighted l1-analysis compressed sensing,”
ACMTransactions on Graphics , vol. 33, no. 2, p. 18, 2014.[12] S. Yoshizawa, A. Belyaev, and H.-P. Seidel, “Smoothing by exam-ple: Mesh denoising by averaging with similarity-based weights,”in
IEEE International Conference on Shape Modeling and Applications .IEEE, 2006, pp. 9–9.[13] G. Rosman, A. Dubrovina, and R. Kimmel, “Patch-collaborativespectral point-cloud denoising,” in
Computer Graphics Forum ,vol. 32, no. 8. Wiley Online Library, 2013, pp. 1–12.[14] J. Digne, “Similarity based filtering of point clouds,” in
ComputerVision and Pattern Recognition Workshops . IEEE, 2012, pp. 73–79.[15] Y. Chen and T. Pock, “Trainable nonlinear reaction diffusion: Aflexible framework for fast and effective image restoration,”
IEEETransactions on Pattern Analysis and Machine Intelligence , vol. 39,no. 6, pp. 1256–1272, 2015.[16] K. Zhang, W. Zuo, and L. Zhang, “Ffdnet: Toward a fast andflexible solution for cnn based image denoising,”
IEEE Transactionson Image Processing , vol. 27, no. 9, pp. 4608–4622, 2017.[17] K. Zhang, W. Zuo, Y. Chen, D. Meng, and L. Zhang, “Beyonda gaussian denoiser: Residual learning of deep cnn for imagedenoising,”
IEEE Transactions on Image Processing , vol. 26, no. 7,pp. 3142–3155, 2017.[18] S. Fleishman, I. Drori, and D. Cohen-Or, “Bilateral mesh denois-ing,” in
ACM Transactions on Graphics , vol. 22, no. 3, 2003, pp.950–953.[19] T. R. Jones, F. Durand, and M. Desbrun, “Non-iterative, feature-preserving mesh smoothing,” in
ACM Transactions on Graphics ,vol. 22, no. 3, 2003, pp. 943–949.[20] Y. Zheng, H. Fu, O. K.-C. Au, and C.-L. Tai, “Bilateral normalfiltering for mesh denoising,”
IEEE Transactions on Visualization andComputer Graphics , vol. 17, no. 10, pp. 1521–1530, 2011.[21] G. Petschnigg, R. Szeliski, M. Agrawala, M. Cohen, H. Hoppe, andK. Toyama, “Digital photography with flash and no-flash imagepairs,” in
ACM SIGGRAPH 2004 Papers , 2004, pp. 664–672.[22] H. Zhang, C. Wu, J. Zhang, and J. Deng, “Variational meshdenoising using total variation and piecewise constant functionspace,”
IEEE Transactions on Visualization and Computer Graphics ,vol. 21, no. 7, pp. 873–886, 2015.[23] “Robust mesh denoising via vertex pre-filtering and l1-mediannormal filtering,”
Computer Aided Geometric Design , vol. 54, pp.49 – 60, 2017. (a) (b) (c) (d) (e) (f) (g) Fig. 10: Illustration of the denoising results on the models
Iron , Rocketarm and
Angel , which are generated by unknownscanners. (a) to (g) are the noisy mesh and the results of L0M [10], BI [2], GNF [3], CNR [43], GGNF [7] and
NormalNet . (a) (b) (c) (d) (e) Fig. 11: Illustration of the denoising results on the scanned model
Pierrot , which is generated by unknown scanners. (a) to(e) are the noisy mesh and the results of GNF [3], CNR [43], GGNF [7] and
NormalNet . [24] S. K. Yadav, U. Reitebuch, and K. Polthier, “Robust and highfidelity mesh denoising,” IEEE Transactions on Visualization andComputer Graphics , vol. 25, no. 6, pp. 2304–2310, 2019.[25] H. Su, S. Maji, E. Kalogerakis, and E. Learned-Miller, “Multi-viewconvolutional neural networks for 3d shape recognition,” in
IEEE International Conference on Computer Vision, Dec 2015, pp. 945–953.
[26] B. Shi, S. Bai, Z. Zhou, and X. Bai, "DeepPano: Deep panoramic representation for 3-D shape recognition," IEEE Signal Processing Letters, vol. 22, no. 12, pp. 2339–2343, Dec 2015.
[27] Z. Wu, S. Song, A. Khosla, F. Yu, L. Zhang, X. Tang, and J. Xiao, "3D ShapeNets: A deep representation for volumetric shapes," in The IEEE Conference on Computer Vision and Pattern Recognition, June 2015.
[28] D. Maturana and S. Scherer, "VoxNet: A 3D convolutional neural network for real-time object recognition," in IEEE/RSJ International Conference on Intelligent Robots and Systems, Sept 2015, pp. 922–928.
[29] P. Wang, Y. Liu, Y. Guo, C. Sun, and X. Tong, "O-CNN: Octree-based convolutional neural networks for 3D shape analysis," CoRR, vol. abs/1712.01537, 2017.
[30] X. Han, Z. Li, H. Huang, E. Kalogerakis, and Y. Yu, "High-resolution shape completion using deep neural networks for global structure and local geometry inference," CoRR, vol. abs/1709.07599, 2017.
[31] Q. Tan, L. Gao, Y. Lai, J. Yang, and S. Xia, "Mesh-based autoencoders for localized deformation component analysis," CoRR, vol. abs/1709.04304, 2017.
[32] D. Boscaini, J. Masci, E. Rodolà, M. M. Bronstein, and D. Cremers, "Anisotropic diffusion descriptors," Computer Graphics Forum, 2016.
[33] L. Yi, H. Su, X. Guo, and L. J. Guibas, "SyncSpecCNN: Synchronized spectral CNN for 3D shape segmentation," CoRR, vol. abs/1612.00606, 2016.
[34] Y. Wang, Y. Sun, Z. Liu, S. E. Sarma, M. M. Bronstein, and J. M. Solomon, "Dynamic graph CNN for learning on point clouds," CoRR, vol. abs/1801.07829, 2018.
[35] C. R. Qi, H. Su, K. Mo, and L. J. Guibas, "PointNet: Deep learning on point sets for 3D classification and segmentation," CoRR, vol. abs/1612.00593, 2016.
[36] C. R. Qi, L. Yi, H. Su, and L. J. Guibas, "PointNet++: Deep hierarchical feature learning on point sets in a metric space," in Advances in Neural Information Processing Systems, 2017, pp. 5099–5108.

Fig. 12: Illustration of the denoising results on the scanned models Eagle, Gargoyle and BallJoint, which are generated by laser scanners. (a) to (c) are the noisy mesh and the results of VT [50] and NormalNet.

[37] Y. Wang, Y. Sun, Z. Liu, S. E. Sarma, M. M. Bronstein, and J. M. Solomon, "Dynamic graph CNN for learning on point clouds," arXiv preprint arXiv:1801.07829, 2018.
[38] R. Klokov and V. S. Lempitsky, "Escape from cells: Deep kd-networks for the recognition of 3D point cloud models," in IEEE International Conference on Computer Vision, 2017, pp. 863–872.
[39] G. Riegler, A. O. Ulusoy, and A. Geiger, "OctNet: Learning deep 3D representations at high resolutions," CoRR, vol. abs/1611.05009, 2016.
[40] A. Boulch and R. Marlet, "Deep learning for robust normal estimation in unstructured point clouds," in Computer Graphics Forum, vol. 35, no. 5. Wiley Online Library, 2016, pp. 281–290.
[41] R. Roveri, A. C. Öztireli, I. Pandele, and M. Gross, "PointProNets: Consolidation of point clouds with convolutional neural networks," Computer Graphics Forum (Proc. Eurographics), vol. 37, no. 2, 2018.
[42] M.-J. Rakotosaona, V. La Barbera, P. Guerrero, N. J. Mitra, and M. Ovsjanikov, "PointCleanNet: Learning to denoise and remove outliers from dense point clouds," Computer Graphics Forum, 2019.
[43] P.-S. Wang, Y. Liu, and X. Tong, "Mesh denoising via cascaded normal regression," ACM Transactions on Graphics (SIGGRAPH Asia), vol. 35, no. 6, 2016.
[44] R. Hanocka, A. Hertz, N. Fish, R. Giryes, S. Fleishman, and D. Cohen-Or, "MeshCNN: A network with an edge," CoRR, vol. abs/1809.05910, 2018.
[45] C. Tomasi and R. Manduchi, "Bilateral filtering for gray and color images," in Sixth International Conference on Computer Vision. IEEE, 1998, pp. 839–846.
[46] X. Sun, P. Rosin, R. Martin, and F. Langbein, "Fast and effective feature-preserving mesh denoising," IEEE Transactions on Visualization and Computer Graphics, vol. 13, no. 5, 2007.
[47] T. Akenine-Möller, "Fast 3D triangle-box overlap testing," Journal of Graphics Tools, vol. 6, no. 1, pp. 29–33, 2001.
[48] K. He, X. Zhang, S. Ren, and J. Sun, "Deep residual learning for image recognition," in IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 770–778.
[49] K. Simonyan and A. Zisserman, "Very deep convolutional networks for large-scale image recognition," CoRR, vol. abs/1409.1556, 2014.
[50] S. K. Yadav, U. Reitebuch, and K. Polthier, "Mesh denoising based on normal voting tensor and binary optimization," IEEE Transactions on Visualization and Computer Graphics, vol. 24, no. 8, pp. 2366–2379, Aug 2018.

Fig. 13: Illustration of the denoising results on the models Boy01F and Boy02F, which are generated by Microsoft Kinect v1 via the Kinect-Fusion technique. (a) to (h) are the noisy mesh; the results of L0M [10], BI [2], GNF [3], CNR [43], GGNF [7] and NormalNet; and the ground truth.

Fig. 14: Illustration of the denoising results on the models Cone04V1, Girl02V1, Cone16V2 and Girl01V2, which are generated by Microsoft Kinect v1 and v2. (a) to (h) are the noisy mesh; the results of L0M [10], BI [2], GNF [3], CNR [43], GGNF [7] and