NormalNet: Learning-based Normal Filtering for Mesh Denoising
Wenbo Zhao, Xianming Liu, Member, IEEE, Yongsen Zhao, Xiaopeng Fan, Senior Member, IEEE, and Debin Zhao, Member, IEEE

Abstract—Mesh denoising is a critical technology in geometry processing that aims to recover high-fidelity 3D mesh models of objects from their noise-corrupted versions. In this work, we propose a learning-based normal filtering scheme for mesh denoising called NormalNet, which maps the guided normal filtering (GNF) into a deep network. The scheme follows the iterative framework of filtering-based mesh denoising. During each iteration, first, a voxelization strategy is applied to each face in the mesh to transform its irregular local structure into a regular volumetric representation; both the structure and the face normal information are thereby preserved, and the convolution operations of a convolutional neural network (CNN) can be easily performed. Second, in place of the guidance normal generation and the guided filtering in GNF, a deep CNN is designed that takes the volumetric representation as input and outputs the learned filtered normals. Finally, the vertex positions are updated according to the filtered normals. Specifically, an iterative training framework is proposed, in which the generation of training data and the network training are performed alternately, and the ground truth normals are taken as the guidance normals in GNF to obtain the target normals. Compared to state-of-the-art works,
NormalNet can effectively remove noise while preserving the original features and avoiding pseudo-features.
Index Terms—Mesh denoising, convolutional neural networks, normal filtering, guided normal filtering, voxelization
1 INTRODUCTION

Recently, a demand for high-fidelity 3D mesh models of real objects has appeared in many domains, such as computer graphics, geometric modelling, computer-aided design and the movie industry. However, due to the accuracy limitations of scanning devices, raw mesh models are inevitably contaminated by noise, leading to corrupted features that profoundly affect the subsequent applications of meshes. Hence, mesh denoising has become an active research topic in the area of geometry processing.

W. Zhao, X. Liu, Y. Zhao, X. Fan and D. Zhao are with the School of Computer Science and Technology, Harbin Institute of Technology, Harbin, 150001, China. E-mail: {wbzhao, csxm, 18S103199, fxp, dbzhao}@hit.edu.cn.

Mesh denoising is an ill-posed inverse problem. Its nature is to smooth a noisy surface while concurrently preserving the real object features without introducing unnatural geometric distortions. Mesh denoising is a challenging task, especially for large and dense meshes and for high noise levels. The key to successful mesh denoising is to differentiate the actual geometric features, such as localized curvature changes and small-scale details, from the noise generated by scanners. The literature contains rich work on mesh denoising, including filtering-based [1], [2], [3], [4], [5], [6], [7], feature-extraction-based [8], [9], optimization-based [10], [11], and similarity-based [12], [13], [14] methods. Among these methods, the guided-normal-based scheme has become popular in recent years [3], [4], [5], [6], [7]; it follows the iterative framework of filtering-based mesh denoising. During each iteration, the guidance normals are first derived and used in filtering, and then the vertex positions are updated according to the filtered normals. This approach performs mesh denoising either by building guidance normals with manually designed methods [3], [6], [7], or by introducing additional information to improve the quality of the guidance normals [4], [5].

The schemes in [3], [4], [5] perform well on synthetic meshes with simple structures. However, their methods of generating guidance normals are based on finding consistent patches with fixed shapes and therefore cannot handle complex structures well, such as narrow edges and corners. To overcome this problem, Li et al. [6] propose to generate the guidance normals from corner-aware and edge-aware neighbourhoods. Zhao et al. [7] employ graph cuts to generate piecewise-smooth patches and build guidance normals on them. These schemes perform well on synthetic meshes with complex features. However, their main idea is to find consistent patches according to the face normal difference, and the structure information has not been fully utilized. Scanned meshes contain many kinds of noise and more complex shapes, such as serrated noise (Fig. 11), stair-stepping noise (Fig. 14) and irregular edges (Fig. 10); the face normal difference of noisy faces is then so large that it is difficult for [6], [7] to distinguish noise from features, resulting either in introduced pseudo-features or in over-smoothing. As shown in the experimental comparisons, even the state-of-the-art schemes [3], [7] cannot handle these cases well.

In the counterpart problem of 2D image denoising, deep-learning-based strategies, such as [15], [16], [17], have been widely applied and have achieved great success. However, with respect to mesh denoising, to the best of our knowledge, no studies follow this line of research. One main reason preventing the usage of convolutional neural networks (CNNs) in mesh denoising is that, in contrast to the regular grid structure of 2D images, meshes have irregular structures.
Therefore, it is not straightforward to apply the regular 3D convolutional operations of CNNs to a mesh. Another reason may be the difficulty of selecting an efficient denoising strategy for a CNN to mimic.

In this work, we propose a learning-based normal filtering scheme for mesh denoising called
NormalNet , whichmaps the guided normal filtering (GNF) [3] into a deep net-work. In particular,
NormalNet follows the iterative framework of GNF, as shown in Fig. 1. During each iteration, to overcome the difficulty of using CNNs on meshes and to exploit both the structure and the face normal information, first, the voxelization strategy is applied to each face in the mesh to convert its irregular local structure into a regular volumetric representation. Second, a deep CNN is designed that takes the volumetric representation as input and outputs the learned filtered normals. All CNNs share the same workflow: three residual blocks, one max-pooling layer and four fully connected layers, of which the fourth layer outputs the filtered normals. Finally, the vertex positions are updated according to the filtered normals. Moreover, we propose an iterative training framework for
NormalNet, in which the generation of training data and the training of the CNNs are performed alternately, and the ground truth normals are taken as the guidance normals in GNF to obtain the target normals. Compared to the state-of-the-art schemes,
NormalNet can effectively generate more accurate filtering results and remove noise while preserving the original features and avoiding pseudo-features.

The rest of this paper is organized as follows. In the following section, we briefly summarize the related work. The proposed
NormalNet is introduced in Section 3. In Section 4, the training of
NormalNet is elaborated. Experimental results are presented in Section 5. Finally, Section 6 concludes the paper.
2 RELATED WORK
In this section, we briefly review related work on filtering-based mesh denoising and on neural-network-based 3D model processing.
Owing to the edge-preserving property of the bilateral filter, researchers have made many attempts to adopt bilateral filtering in mesh denoising [18], [19], [20]. Nevertheless, the photometric weights in the bilateral filter cannot be estimated accurately from a noise-corrupted mesh. The joint bilateral filter [21], in which the photometric weights are computed from a reliable guidance image, was proposed to improve the capability of bilateral filtering. Inspired by this idea, Zhang et al. [3] propose guided normal filtering, in which the guidance information is obtained as the average normal of a local patch. This scheme works well with respect to feature preservation but cannot achieve satisfactory results in regions with complex shapes and sometimes introduces pseudo-features. To overcome the problems of [3], in the subsequent work [6], the guidance normals are computed from a corner-aware neighbourhood, which is adaptive to the shapes of corners and edges. Recently, there have been increasing efforts to exploit geometric attributes for mesh denoising. In [22], the normal filtering is performed by means of total variation, which assumes that the normal change is piecewise constant. Wei et al. [2] propose to cluster faces into piecewise-smooth patches and refine the face normals with the help of vertex normal fields. In [10], a differential edge operator is proposed and L0 minimization is employed to remove noise while preserving sharp features. Furthermore, Lu et al. [23] apply an additional vertex filtering before the L1-median face normal filtering, which proves to be capable of handling high noise levels and noise distributed in random directions.
However, feature information, such as edges and corners with little noise, may be blurred by the prefiltering. In [24], the Tukey bi-weight similarity function is proposed to replace the similarity function in the computation of the bilateral weights; in addition, an edge-weighted Laplace operator is introduced for vertex updating to reduce face normal flips. In [7], graph-based feature detection is employed to construct accurate guidance normals; however, this method may introduce pseudo-features when the shape of the noise is complex, which is common in scanned models.
Driven by the great success of deep learning in image processing, researchers in graphics are also attempting to employ deep neural networks for 3D model processing. However, due to their irregular connectivity, processing 3D models with neural networks remains challenging. Numerous works have focused on transforming 3D models into regular data. For instance, in [25], [26], 3D models are represented by 2D rendered images and panoramic views. Furthermore, some studies [27], [28], [29], [30] have employed voxelization to transform models into regular 3D data. Moreover, in [31], [32], [33], [34], meshes are represented in the spectral or spatial domain for further processing.

In addition to these transform-based techniques, the direct application of neural networks to irregular data has also been extensively studied for point clouds. PointNet [35] is one of the first network architectures that can handle point cloud data. Subsequently, PointNet++ [36] and the dynamic graph CNN [37] were proposed to improve the network capability. Some attempts have been made to organize point clouds into structures. In [38], a kd-tree is constructed on a point cloud and is further used as the input of a neural network. A similar idea is presented in [39], where the points are organized by an octree. Additional works [40], [41], [42] focus on surface reconstruction, denoising and outlier removal. Notably, in [43], Wang et al. propose the filtered facet normal descriptor and model it with neural networks; however, these networks are not convolutional and only take the face normal information into consideration. In [44], edge-based convolution and pooling operations are defined, which can operate directly on the constructs of the mesh.

3 THE FRAMEWORK OF NORMALNET

In this section, we introduce the framework of
NormalNet, including four parts: the generation of patches, the introduction of GNF [3], the voxelization strategy, and the proposed scheme.
Fig. 1: The framework of
NormalNet. Modules in all iterations share the same workflow: for a face, the irregular local 3D structure is converted via the voxelization strategy into the regular volumetric representation, which is then input into a CNN to get the filtered normal. Finally, the vertex positions are updated to obtain the denoised mesh.

Fig. 2: Illustration of the proposed voxelization strategy. For a face in a mesh, a 2-ring patch is constructed. Two matrices that represent rotation and translation are computed for normalization. The irregular 3D structure around this face is split into small cubes. A label, which is the average normal of the faces in a cube, is then assigned to the cube.
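As a concrete illustration of the labelling step in Fig. 2, the following sketch bins faces into cubes and averages their normals. For brevity it assigns each face by its centre rather than by the exact triangle-box overlap test [47] used in the paper, and all function and variable names here are our own:

```python
import math

def voxelize(faces, T_s, L_c):
    """Toy voxelization: map each face to one cube by its centre and label
    every non-empty cube with the average normal of its faces. `faces` is a
    list of (centre, normal) pairs, both 3-tuples, given in the normalized
    frame where the target face centre sits at the origin."""
    grid = {}  # (x, y, z) cube index -> list of face normals
    for centre, normal in faces:
        idx = tuple(int(math.floor(c / L_c + 0.5)) for c in centre)
        if all(-T_s <= i <= T_s for i in idx):
            grid.setdefault(idx, []).append(normal)
    labels = {}  # cube index -> average normal of overlapping faces
    for idx, normals in grid.items():
        n = len(normals)
        labels[idx] = tuple(sum(v[k] for v in normals) / n for k in range(3))
    return labels  # cubes absent from `labels` carry the label (0, 0, 0)

# Example: two faces whose centres fall into the same cube around the origin.
faces = [((0.01, 0.0, 0.0), (0.0, 0.0, 1.0)),
         ((-0.01, 0.0, 0.0), (0.0, 1.0, 0.0))]
labels = voxelize(faces, T_s=20, L_c=0.1)
print(labels[(0, 0, 0)])  # -> (0.0, 0.5, 0.5), the average of the two normals
```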
In mesh denoising, the patch is a commonly used structure, so we first describe the generation of an r-ring patch. Given a face f_i as the centre of a patch P, an r-ring patch of f_i is generated by finding all the faces that share at least one vertex with the faces in P and adding them to P, repeated r times. Examples of a 1-ring patch and a 2-ring patch are shown in Fig. 4.

Since our scheme mimics the framework of GNF [3], we briefly introduce it. GNF is an iterative scheme in which the face normal filtering is repeated N_f times. For a face f_i, guided filtering is applied to obtain the denoised normal:

n'_i = e_i \sum_{f_j \in N_i} a_j G_d(c_i, c_j) G_g(g_i, g_j) n_j,   (1)

where c_j, n_j and g_j are the centre, face normal and guidance normal of f_j; N_i is the set of geometrical neighbouring faces of f_i; and e_i is a normalization factor that ensures n'_i is a unit vector. G_d and G_g are Gaussian kernels [45], computed as

G_g = \exp\left( -\frac{\|g_i - g_j\|^2}{2\mu_g^2} \right),   (2)

G_d = \exp\left( -\frac{\|c_i - c_j\|^2}{2\mu_d^2} \right),   (3)

where µ_d and µ_g are the Gaussian function parameters; µ_d is usually twice the average distance between adjacent face centres, while µ_g usually differs from mesh to mesh. Following the idea of [46], after each filtering pass, the position updating of the vertices is repeated N_v times to obtain the denoised mesh.

The guidance normal of f_i is generated as follows. For each face, suppose that P is a 1-ring patch that contains f_i. The consistency C(P) of P is calculated as [3]

C(P) = D(P) \cdot R(P),   (4)

Fig. 3: (a) The architecture of the deep network in
NormalNet. (b) The structure of residual block i; [2, c_{i-1}] means that the convolution stride is 2 and the channel number is c_{i-1}.

Fig. 4: Two examples of a 1-ring patch (left) and a 2-ring patch (right); f_i is coloured purple.

where D(P) is the most significant face normal difference in P, calculated as

D(P) = \max_{f_i, f_j \in P} \|n_i - n_j\|,   (5)

where f_i and f_j represent a pair of faces within P. R(P) represents the saliency of P, which is computed from the saliency of the edges in P:

R(P) = \frac{\max_{e_i \in P} \varphi(e_i)}{\varepsilon + \sum_{e_i \in P} \varphi(e_i)},   (6)

where ε is a small positive number that prevents the denominator from being zero, and φ(e_i) is the saliency of an edge e_i:

\varphi(e_i) = \|n_{i,1} - n_{i,2}\|,   (7)

where n_{i,1} and n_{i,2} are the normals of the two faces incident to e_i. Finally, the most consistent patch is chosen, and its average normal is taken as the guidance normal.

The guidance normals generated by the above method have been proven effective on simple structures. However, the calculation of consistency is based only on the difference between face normals, and the structure information has not been fully considered. As mentioned before, scanned meshes may contain noise with huge face normal differences, for which this method does not work well. Rather than manually designing a method that works well on such meshes, we employ CNNs to obtain the learned filtered normals.

The key to using CNNs for mesh denoising is the transformation of the irregular local structure around a face into a regular form, such that the structure information is preserved and the CNN convolution operations are easily performed. An illustration of the proposed voxelization strategy is shown in Fig. 2. First, normalization is applied to improve the robustness of the strategy. The normalization process involves two operations: rotation and translation. In this way, all faces are normalized to a similar direction and position.
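A minimal sketch of this normalization step follows. The paper does not specify how the rotation matrix is constructed, so this version assumes Rodrigues' formula and that the average normal is not antiparallel to the target direction; for simplicity the translation is applied before the rotation, which likewise pins the face centre at the origin:

```python
def normalize_patch(vertices, face_centre, n_avg, n_target=(0.0, 0.0, 1.0)):
    """Sketch of the normalization: rotate so the patch's unit average
    normal n_avg aligns with the unit target direction n_target
    (Rodrigues' formula), then place the face centre at the origin."""
    ax = (n_avg[1]*n_target[2] - n_avg[2]*n_target[1],   # rotation axis
          n_avg[2]*n_target[0] - n_avg[0]*n_target[2],   # = n_avg x n_target
          n_avg[0]*n_target[1] - n_avg[1]*n_target[0])
    c = sum(a*b for a, b in zip(n_avg, n_target))        # cos(angle); c != -1 assumed
    K = [[0.0, -ax[2], ax[1]], [ax[2], 0.0, -ax[0]], [-ax[1], ax[0], 0.0]]
    I = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    K2 = [[sum(K[i][k]*K[k][j] for k in range(3)) for j in range(3)] for i in range(3)]
    # Rodrigues: R = I + K + K^2 / (1 + cos(angle))
    R = [[I[i][j] + K[i][j] + K2[i][j] / (1.0 + c) for j in range(3)] for i in range(3)]
    out = []
    for v in vertices:
        d = [v[k] - face_centre[k] for k in range(3)]    # translate centre to origin
        out.append(tuple(sum(R[i][k]*d[k] for k in range(3)) for i in range(3)))
    return out

# A vertex one unit above a face whose normal points along +y: the rotation
# maps +y to +z, so the offset (0, 0, 1) rotates to (0, -1, 0).
verts = normalize_patch([(1.0, 2.0, 3.0)], face_centre=(1.0, 2.0, 2.0),
                        n_avg=(0.0, 1.0, 0.0))
```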
Specifically, for a face f_i, a 2-ring patch is constructed, whose average normal is n̄_i. We then compute two matrices: W_r, which represents the rotation from n̄_i to a specific direction N_t, and W_t, which represents the translation of the face centre c_i to (0, 0, 0). The whole mesh is then rotated and translated by means of W_r and W_t. Supposing v_i is the coordinate of a vertex i in the mesh, its new position after normalization is

v'_i = W_t W_r v_i.   (8)

After normalization, the space of the local mesh structure around f_i is split into regular cubes denoted by {B_{x,y,z} | x, y, z ∈ [−T_s, T_s]}, where T_s is the parameter that determines the number of cubes and B_{0,0,0} is located at the origin. The remaining issue is to determine the size of each cube. In our work, the side length L_c of the cubes is computed as

L_c = d_s / α_c,   (9)

where d_s is the average distance between adjacent faces in the noisy mesh and α_c is the parameter that controls the size of the cubes.

For each cube, we employ the fast 3D triangle-box overlap testing strategy [47] to find the faces that overlap with the cube. If at least one face overlaps with a cube, the label of that cube is assigned as the average normal of all the overlapping faces, denoted by B; otherwise, the label is set to (0, 0, 0). In this way, we convert the irregular local mesh structure into the regular volumetric representation V = [B ∈ R^{(2T_s+1)×(2T_s+1)×(2T_s+1)×3}], which is then used as the input of the network.

In our experiment, we set T_s = 20 and α_c = 8, and fix N_t across all meshes. Under these conditions, V is a 41×41×41×3 matrix that contains most of the 3-ring structure around f_i. Each face is split into about 40-60 cubes, which is sufficient to represent the shape information. Smaller T_s and α_c reduce the amount of information in V and lead to unsatisfactory results, whereas larger parameters improve the performance only slightly while greatly increasing the training time.

Fig. 5: The framework of generating training sets and training.

The proposed scheme is also iterative and is repeated N_f times. During each iteration, for a face f_i in the mesh, the voxelization strategy is employed to transform the irregular local mesh structure around f_i into the regular volumetric representation. A CNN then takes the volumetric representation as input and outputs the filtered normals. Since the value of µ_g in GNF greatly affects the denoising results and often differs between meshes, the output of the network contains N filtered normals corresponding to different values of µ_g. Finally, the positions of the vertices are updated N_v times according to the selected filtered normals.

The network architecture is shown in Fig. 3. It contains three residual blocks, a global max-pooling layer and four fully connected layers. The numbers of channels of the residual blocks are 64, 128 and 256. All the convolution layers use the same small filters except the first layer, which uses larger filters. Down-sampling is performed by a convolution with a stride of 2 in the first layer of each residual block. The network ends with four fully connected layers: the first three have 512, 256 and 128 channels, and the fourth predicts the three coordinates of the N filtered normals and thus contains 3N channels. All layers are equipped with batch normalization and ReLU, except the last layer, which uses Tanh to ensure that the output lies in [−1, 1].

TABLE 1: The settings of the corresponding iteration numbers for each CNN_i.

The network architecture is inspired by the philosophy of ResNet [48] and VGGNet [49].
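As a sanity check on the tensor shapes this architecture implies (a 41³×3 input, three stride-2 stages with 64, 128 and 256 channels, global max-pooling, and fully connected layers ending in 3N outputs), the following assumes 'same' padding, so that each stride-2 stage halves the spatial size, rounding up:

```python
import math

def stage_size(s, stride=2):
    """Spatial size after a stride-2 convolution with 'same' padding."""
    return math.ceil(s / stride)

size, channels = 41, 3                    # 41x41x41 voxel grid, 3 = normal coords
for c in (64, 128, 256):                  # three residual blocks, stride 2 each
    size, channels = stage_size(size), c
    print(f"{size}^3 x {channels}")       # 21^3 x 64, 11^3 x 128, 6^3 x 256

# Global max-pooling collapses the spatial dims to a 256-vector, which the
# fully connected layers map as 256 -> 512 -> 256 -> 128 -> 3N.
N = 6
fc = [channels, 512, 256, 128, 3 * N]
print(fc[-1])  # 18 output values: N candidate normals, 3 coordinates each
```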
The purpose of NormalNet is to estimate accurate filtered normals from the noisy signal. However, as the network goes deeper, abundant information beneficial to filtering the normals can vanish or "wash out" by the time it reaches the output layer. To address this problem, we adopt the shortcut connection from ResNet to pass early feature maps directly to later layers. This greatly increases the forward flow of information and thus contributes to the prediction of the face normals. In addition, during backpropagation, a shortcut path adds an extra component to the gradients compared to a plain network, which mitigates the vanishing gradient problem and thereby accelerates training. In our experiment, we set N = 6, and the output of the CNN contains the filtering results of six different values of µ_g.

4 NORMALNET TRAINING
In this section, we introduce the training of
NormalNet, including two parts: the iterative training and the training details.
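At a high level, the iterative training alternates between generating a training set from the current meshes, training CNN_i on it, and filtering the meshes with CNN_i to obtain the data for CNN_{i+1}. A minimal sketch follows; `voxelize_faces`, `gnf_target_normals`, `train_cnn` and `apply_cnn` are hypothetical placeholders for the steps described in this section:

```python
def iterative_training(noisy_meshes, ground_truth, num_cnns,
                       voxelize_faces, gnf_target_normals, train_cnn, apply_cnn):
    """Alternate training-data generation and network training: M_0 is the
    raw noisy data; M_i (i > 0) is M_{i-1} filtered by the previous CNN."""
    meshes = noisy_meshes                    # M_0: unprocessed noisy meshes
    cnns = []
    for i in range(num_cnns):
        inputs = voxelize_faces(meshes)      # volumetric representations
        targets = gnf_target_normals(meshes, ground_truth)  # ground-truth-guided GNF
        cnns.append(train_cnn(inputs, targets))
        meshes = apply_cnn(cnns[-1], meshes)  # produce M_{i+1} for the next round
    return cnns
```

With trivial stand-ins for the four callbacks, the loop simply threads the filtered meshes from one round into the next.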
The process of generating the training sets and training is illustrated in Fig. 5. For each CNN_i, a specific training data set T_i is generated from a group of meshes, denoted M_i, and the corresponding ground truth. T_i is composed of numerous training tuples, each of which consists of a volumetric representation and N target normals. For a face f_i from a mesh in M_i, the volumetric representation is obtained by applying the voxelization strategy on f_i. The target normals are obtained by employing GNF, with the ground truth normals adopted as the guidance normals:

n'_i = e_i \sum_{f_j \in N_i} a_j G_d(c_i, c_j) G_g(gn_i, gn_j) n_j,   (10)

where gn_i and gn_j are the ground truth normals of f_i and f_j; the other parameters are the same as defined in Eq. (1).

To make the training process balanced with respect to various features, suppose the maximum angle difference in the 2-ring patch of a face is A_p. All the faces in M_i are divided into 4 categories, and we randomly select the same number of faces from each category for training:

• v1: A_p > θ_1, large edge region.
• v2: θ_2 < A_p ≤ θ_1, small edge region.
• v3: θ_3 < A_p ≤ θ_2, curved region.
• v4: A_p ≤ θ_3, smooth region.

Here θ_1 > θ_2 > θ_3 are fixed angle thresholds. Initially, M_0 is composed of noisy meshes whose ground truth is already known, without any processing. When i > 0, M_i is obtained by applying the filtering performed by CNN_{i-1} to M_{i-1}, using a fixed µ_g and N_v = 20. The generation of the training data and the network training are performed alternately and iteratively.

The loss function is defined as the MSE between the N output normals and the target normals. We use the truncated normal distribution to initialize the weights and train the network from scratch. For optimization, we choose the Adam algorithm with a mini-batch size of 80; the parameters of the Adam optimizer are the default settings in TensorFlow. The learning rate starts at 0.0001 and decays exponentially every 5000 training steps with a decay rate of 0.96. Each CNN_i is trained individually. A test set built by randomly selecting faces from the test models is used for evaluation. The evaluation metric for the network is defined as the average angular error over the entire test set.

Fig. 6: Illustration of the L2 error results on the model Twelve; each colour represents a CNN.

TABLE 2: The settings of N_f, N_v and µ_g (blank entries are unavailable in this copy):

Model        N_f  N_v  µ_g | Model        N_f  N_v  µ_g
Fandisk       10   20      | Eagle
Table         15   15      | Gargoyle
Joint          5   15      | BallJoint
Twelve        25   10      | Boy01F
Block         20   30      | Boy02F
Bunny                      | Cone04V1      20   20
Angel                      | Girl02V1      15   20
Iron                       | Cone16V2      10   10
Pierrot                    | Girl01V2       3   15
Rocker-arm                 |
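The target-normal computation in Eq. (10), guided filtering with the ground-truth normals as guidance, can be sketched as follows; the Gaussian kernels follow Eqs. (2) and (3), the face areas a_j, centres and the neighbourhood index set are passed in explicitly, and the function name is our own:

```python
import math

def target_normal(i, neigh, centres, normals, gt_normals, areas, mu_d, mu_g):
    """Eq. (10): filter normal i over its neighbourhood `neigh`, with the
    ground-truth normals acting as the guidance normals."""
    def gauss(a, b, mu):
        d2 = sum((x - y) ** 2 for x, y in zip(a, b))
        return math.exp(-d2 / (2.0 * mu * mu))

    acc = [0.0, 0.0, 0.0]
    for j in neigh:
        w = areas[j] * gauss(centres[i], centres[j], mu_d) \
                     * gauss(gt_normals[i], gt_normals[j], mu_g)
        for k in range(3):
            acc[k] += w * normals[j][k]
    norm = math.sqrt(sum(x * x for x in acc))  # e_i renormalizes to unit length
    return tuple(x / norm for x in acc)

# Toy example: two neighbours with identical guidance and symmetric noisy
# normals; the y-components cancel and the result points along +z.
centres = [(0, 0, 0), (1, 0, 0), (0, 1, 0)]
normals = [(0, 0, 1), (0.0, 0.6, 0.8), (0.0, -0.6, 0.8)]
gt = [(0, 0, 1)] * 3
n = target_normal(0, [1, 2], centres, normals, gt, [1.0, 1.0, 1.0], 1.0, 0.5)
print(n)
```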
Each network is trained for 10 epochs, and the average angular error is 1-3 degrees after 10 epochs. The network with the smallest error is selected for use. In our experiment, we select 45,000 faces in each category; thus, the total size of T_i is 180,000. The training process is executed on a computer with an Intel Core i7-7700 CPU and an NVIDIA GTX 1080, and each epoch takes approximately 3 hours. Increasing the number of channels, the number of layers or the size of T_i does not substantially improve the performance of the networks and only multiplies the training time. Halving the number of channels or the size of T_i also halves the training time; however, the average angular error then increases to 4-6 degrees.

5 EXPERIMENTAL RESULTS
In this section, extensive experimental results are presented to demonstrate the performance of NormalNet.

We perform the experimental comparisons on 19 test models, including 6 synthetic models: Joint, Twelve, Bunny, Fandisk, Table and Block; 4 scanned models collected from the Internet, for which the type of scanner is unknown: Angel, Iron, Rocker-arm and Pierrot; 6 scanned models with rich features, generated by Microsoft Kinect v1, Microsoft Kinect v2 and Microsoft Kinect v1 via the Kinect-Fusion technique [43], respectively: Cone04V1, Girl02V1, Cone16V2, Girl01V2, Boy01F and Boy02F; and 3 scanned models generated by laser scanners [50]: Eagle, Gargoyle and BallJoint. For the synthetic models, the noise type in Fandisk, Table, Bunny and Block is Gaussian white noise, while that of Joint and Twelve is impulsive noise.

We compare
NormalNet with several state-of-the-art algorithms in terms of objective and subjective evaluations. The compared algorithms are 1) guided normal filtering (GNF) [3], 2) L0 minimization optimization (L0M) [10], 3) BI-normal filtering (BI) [2], 4) cascaded normal regression (CNR) [43], 5) graph-based normal filtering (GGNF) [7], and 6) the normal-voting-tensor-based scheme (VT) [50]. The source codes of GNF, L0M, BI, CNR and GGNF are kindly provided by their authors or implemented by a third party, while the author of VT provides the input models and their denoising results.

TABLE 3: Performance comparisons between NormalNet and the state-of-the-art methods: the metrics E_v and E_a for L0M [10], BI [2], GNF [3], CNR [43], GGNF [7] and NormalNet on Fandisk, Table, Joint, Twelve, Block, Bunny, Boy01F, Boy02F, Cone04V1, Girl02V1, Cone16V2 and Girl01V2, together with the averages over all models.
As shown in Fig. 6, during the denoising process for most meshes, the L2 error decreases rapidly during the first three iterations and decreases slowly after ten iterations. In order to design a lightweight network, the iteration numbers are divided into six intervals, each of which corresponds to a specific CNN_i, as listed in Table 1. Thus, the training cost decreases by more than 70% at the price of slightly decreased CNN performance; the average angular error increases by 0.1-0.15 degrees.

Three NormalNets, namely CV1, CV2 and CS, are trained on the Kinect-v1 training set (73 meshes), the Kinect-v2 training set (73 meshes) and a remake of the synthetic training set (60 meshes, where some meshes are excluded from the training sets for the experiments) provided by [43]. The test models Cone04V1, Girl02V1, Cone16V2 and
Girl01V2 are denoised by the corresponding networks CV1 and CV2, and all the other test models are denoised by CS. The settings of the parameters N_f, N_v and µ_g, as well as the parameters of the other schemes, follow the settings used in [3] and [7]. The parameter settings of N_f, N_v and µ_g are shown in Table 2.

Two error metrics [20] are employed to evaluate the objective denoising results of the models that have a ground truth:

• E_a: the mean square angular error, which represents the accuracy of the face normals;
• E_v: the L2 vertex-based mesh-to-mesh error, which represents the accuracy of the vertex positions.

We compare the objective performance on 12 models. The comparison results for E_a and E_v are shown in Table 3, where the best results are bolded. NormalNet performs best on 10 models for E_a and 10 models for E_v, achieving the best performance with respect to both metrics on most test models. CNR achieves the second-best average results on E_a, which shows that CNR is superior in estimating face normals. However, GNF and GGNF achieve better average results than CNR on E_v, which shows that filtering-based schemes perform better in recovering vertex positions. NormalNet achieves the best average results on both E_a and E_v.

The subjective performance comparison results of the six synthetic models are illustrated in Figs. 7, 8 and 9. Fig. 7 presents the denoising results of two models with curved surfaces. The zoomed-in view illustrates that our scheme introduces fewer pseudo-features than the other schemes. In Fig. 8, our scheme achieves performance similar to that of GGNF in the feature regions. The corner is recovered well, and the edge is sharp and clean. In
Block, the highlighted region in the red window has a higher triangulation density. Benefiting from the voxelization strategy, our scheme preserves the structure information well and is thus less sensitive to sampling irregularity. In Fig. 9, we perform a comparison on synthetic meshes with impulsive noise. In Table, both our scheme and GGNF produce the best feature recovery results. In Joint, the edge length of our scheme is closest to the ground truth.
Fig. 7: Illustration of the denoising results on the models Fandisk and Bunny; the zoomed-in view of Bunny has been rotated. (a) to (h) are the noisy mesh; the results of L0M [10], BI [2], GNF [3], CNR [43], GGNF [7] and NormalNet; and the ground truth.
Fig. 8: Illustration of the denoising results on the models Twelve and Block. (a) to (h) are the noisy mesh; the results of L0M [10], BI [2], GNF [3], CNR [43], GGNF [7] and NormalNet; and the ground truth.
Fig. 9: Illustration of the denoising results on the models Joint and Table. The red line in Joint is the edge length of the ground truth. (a) to (h) are the noisy mesh; the results of L0M [10], BI [2], GNF [3], CNR [43], GGNF [7] and NormalNet; and the ground truth.
We further provide comparison results for models from different scanners. As illustrated in Figs. 10 and 11, our scheme performs well on models for which the type of scanner is unknown. For a fair comparison, CNR is also trained on the synthetic training set. In Fig. 10, for Iron, both our scheme and CNR introduce fewer pseudo-features than the other schemes. However, in Rocker-arm and Angel, CNR over-smooths the edges in the red boxes, whereas our scheme still produces satisfactory results. In Fig. 11, for Pierrot, the region in the red box is corrupted by serrated noise. For GNF and GGNF, accurate guidance normals are difficult to compute under this type of noise; thus, the denoising result is corrupted by pseudo-features. Furthermore, CNR succeeds in removing the serrated noise but fails to recover the edges around the eyes in the red box. Our scheme finds a balance between introducing pseudo-features and over-smoothing. The codes of L0M and BI could not process this region.

In Fig. 12, we compare NormalNet with VT [50] on the models provided by its authors, which are generated by laser scanners. Our scheme produces better feature recovery results than VT on all three models, which contain complex structures; this further verifies the capability of NormalNet.

In Figs. 13 and 14, the models are generated by Microsoft Kinect v1 and v2 and provided by the author of CNR. However, we do not have sufficient data to train NormalNet for the models generated by Microsoft Kinect v1 via the Kinect-Fusion technique; therefore, CS is employed to denoise these models. In Fig. 13, our scheme outputs denoising results similar to those of CNR and GGNF. In Fig. 14, our scheme achieves the best smoothing result, while the other schemes fail to remove the noise in Cone04V1. In Girl02V1 and Girl01V2, both CNR and our scheme avoid introducing pseudo-features. In Cone16V2, most schemes achieve similar feature recovery results.
6 CONCLUSION
In this paper, we present a learning-based normal filtering scheme for mesh denoising. The scheme maps the guided normal filtering into a deep network and follows the iterative framework of filtering-based schemes. During each iteration, first, to facilitate the 3D convolution operations, the voxelization strategy is applied to each face in the mesh to transform its irregular local structure into a regular volumetric representation. Second, in place of the guidance normal generation and the guided filtering in GNF, the output of the voxelization is input into a CNN to estimate accurate filtered normals. Finally, the vertex positions are updated according to the filtered normals. Moreover, an iterative training framework is proposed for effective training. The experimental results show that our scheme outperforms state-of-the-art works with respect to both objective and subjective quality and can effectively remove noise while preserving the original features and avoiding pseudo-features.

REFERENCES

[1] H. Yagou, Y. Ohtake, and A. Belyaev, "Mesh smoothing via mean and median filtering applied to face normals," in
Geometric Model-ing and Processing , 2002, pp. 124–131.[2] M. Wei, J. Yu, W.-M. Pang, J. Wang, J. Qin, L. Liu, and P.-A.Heng, “Bi-normal filtering for mesh denoising,”
IEEE Transactionson Visualization and Computer Graphics , vol. 21, no. 1, pp. 43–55,2015.[3] W. Zhang, B. Deng, J. Zhang, S. Bouaziz, and L. Liu, “Guidedmesh normal filtering,” in
Computer Graphics Forum , vol. 34, no. 7.Wiley Online Library, 2015, pp. 23–34.[4] W. Zhao, X. Liu, S. Wang, and D. Zhao, “Multi-scale similarityenhanced guided normal filtering,” in
Advances in MultimediaInformation Processing – PCM 2017 . Springer International Pub-lishing, 2018, pp. 645–653.[5] R. Wang, W. Zhao, S. Liu, D. Zhao, and C. Liu, “Feature-preservingmesh denoising based on guided normal filtering,” in
Advances inMultimedia Information Processing – PCM 2017 . Springer Interna-tional Publishing, 2018, pp. 920–927.[6] T. Li, J. Wang, H. Liu, and L.-g. Liu, “Efficient mesh denoising viarobust normal filtering and alternate vertex updating,”
Frontiersof Information Technology and Electronic Engineering , vol. 18, no. 11,pp. 1828–1842, 2017.[7] W. Zhao, X. Liu, S. Wang, X. Fan, and D. Zhao, “Graph-basedfeature-preserving mesh normal filtering,”
IEEE Transactions onVisualization and Computer Graphics , 2019.[8] X. Lu, Z. Deng, and W. Chen, “A robust scheme for feature-preserving mesh denoising,”
IEEE Transactions on Visualization andComputer Graphics , vol. 22, no. 3, pp. 1181–1194, 2016.[9] M. Wei, L. Liang, W.-M. Pang, J. Wang, W. Li, and H. Wu, “Tensorvoting guided mesh denoising,”
IEEE Transactions on AutomationScience and Engineering , vol. 14, no. 2, pp. 931–945, 2017.[10] L. He and S. Schaefer, “Mesh denoising via l0 minimization,”
ACMTransactions on Graphics , vol. 32, no. 4, p. 64, 2013.[11] R. Wang, Z. Yang, L. Liu, J. Deng, and F. Chen, “Decoupling noiseand features via weighted l1-analysis compressed sensing,”
ACMTransactions on Graphics , vol. 33, no. 2, p. 18, 2014.[12] S. Yoshizawa, A. Belyaev, and H.-P. Seidel, “Smoothing by exam-ple: Mesh denoising by averaging with similarity-based weights,”in
IEEE International Conference on Shape Modeling and Applications .IEEE, 2006, pp. 9–9.[13] G. Rosman, A. Dubrovina, and R. Kimmel, “Patch-collaborativespectral point-cloud denoising,” in
Computer Graphics Forum ,vol. 32, no. 8. Wiley Online Library, 2013, pp. 1–12.[14] J. Digne, “Similarity based filtering of point clouds,” in
ComputerVision and Pattern Recognition Workshops . IEEE, 2012, pp. 73–79.[15] Y. Chen and T. Pock, “Trainable nonlinear reaction diffusion: Aflexible framework for fast and effective image restoration,”
IEEETransactions on Pattern Analysis and Machine Intelligence , vol. 39,no. 6, pp. 1256–1272, 2015.[16] K. Zhang, W. Zuo, and L. Zhang, “Ffdnet: Toward a fast andflexible solution for cnn based image denoising,”
IEEE Transactionson Image Processing , vol. 27, no. 9, pp. 4608–4622, 2017.[17] K. Zhang, W. Zuo, Y. Chen, D. Meng, and L. Zhang, “Beyonda gaussian denoiser: Residual learning of deep cnn for imagedenoising,”
IEEE Transactions on Image Processing , vol. 26, no. 7,pp. 3142–3155, 2017.[18] S. Fleishman, I. Drori, and D. Cohen-Or, “Bilateral mesh denois-ing,” in
ACM Transactions on Graphics , vol. 22, no. 3, 2003, pp.950–953.[19] T. R. Jones, F. Durand, and M. Desbrun, “Non-iterative, feature-preserving mesh smoothing,” in
ACM Transactions on Graphics ,vol. 22, no. 3, 2003, pp. 943–949.[20] Y. Zheng, H. Fu, O. K.-C. Au, and C.-L. Tai, “Bilateral normalfiltering for mesh denoising,”
IEEE Transactions on Visualization andComputer Graphics , vol. 17, no. 10, pp. 1521–1530, 2011.[21] G. Petschnigg, R. Szeliski, M. Agrawala, M. Cohen, H. Hoppe, andK. Toyama, “Digital photography with flash and no-flash imagepairs,” in
ACM SIGGRAPH 2004 Papers , 2004, pp. 664–672.[22] H. Zhang, C. Wu, J. Zhang, and J. Deng, “Variational meshdenoising using total variation and piecewise constant functionspace,”
IEEE Transactions on Visualization and Computer Graphics ,vol. 21, no. 7, pp. 873–886, 2015.[23] “Robust mesh denoising via vertex pre-filtering and l1-mediannormal filtering,”
Computer Aided Geometric Design , vol. 54, pp.49 – 60, 2017. (a) (b) (c) (d) (e) (f) (g) Fig. 10: Illustration of the denoising results on the models
Iron , Rocketarm and
Angel , which are generated by unknownscanners. (a) to (g) are the noisy mesh and the results of L0M [10], BI [2], GNF [3], CNR [43], GGNF [7] and
NormalNet . (a) (b) (c) (d) (e) Fig. 11: Illustration of the denoising results on the scanned model
Pierrot , which is generated by unknown scanners. (a) to(e) are the noisy mesh and the results of GNF [3], CNR [43], GGNF [7] and
NormalNet . [24] S. K. Yadav, U. Reitebuch, and K. Polthier, “Robust and highfidelity mesh denoising,” IEEE Transactions on Visualization andComputer Graphics , vol. 25, no. 6, pp. 2304–2310, 2019.[25] H. Su, S. Maji, E. Kalogerakis, and E. Learned-Miller, “Multi-viewconvolutional neural networks for 3d shape recognition,” in
IEEE International Conference on Computer Vision, Dec 2015, pp. 945–953.
[26] B. Shi, S. Bai, Z. Zhou, and X. Bai, "DeepPano: Deep panoramic representation for 3-D shape recognition," IEEE Signal Processing Letters, vol. 22, no. 12, pp. 2339–2343, Dec 2015.
[27] Z. Wu, S. Song, A. Khosla, F. Yu, L. Zhang, X. Tang, and J. Xiao, "3D ShapeNets: A deep representation for volumetric shapes," in The IEEE Conference on Computer Vision and Pattern Recognition, June 2015.
[28] D. Maturana and S. Scherer, "VoxNet: A 3D convolutional neural network for real-time object recognition," in IEEE/RSJ International Conference on Intelligent Robots and Systems, Sept 2015, pp. 922–928.
[29] P. Wang, Y. Liu, Y. Guo, C. Sun, and X. Tong, "O-CNN: Octree-based convolutional neural networks for 3D shape analysis," CoRR, vol. abs/1712.01537, 2017.
[30] X. Han, Z. Li, H. Huang, E. Kalogerakis, and Y. Yu, "High-resolution shape completion using deep neural networks for global structure and local geometry inference," CoRR, vol. abs/1709.07599, 2017.
[31] Q. Tan, L. Gao, Y. Lai, J. Yang, and S. Xia, "Mesh-based autoencoders for localized deformation component analysis," CoRR, vol. abs/1709.04304, 2017.
[32] D. Boscaini, J. Masci, E. Rodolà, M. M. Bronstein, and D. Cremers, "Anisotropic diffusion descriptors," Computer Graphics Forum, 2016.
[33] L. Yi, H. Su, X. Guo, and L. J. Guibas, "SyncSpecCNN: Synchronized spectral CNN for 3D shape segmentation," CoRR, vol. abs/1612.00606, 2016.
[34] Y. Wang, Y. Sun, Z. Liu, S. E. Sarma, M. M. Bronstein, and J. M. Solomon, "Dynamic graph CNN for learning on point clouds," CoRR, vol. abs/1801.07829, 2018.
[35] C. R. Qi, H. Su, K. Mo, and L. J. Guibas, "PointNet: Deep learning on point sets for 3D classification and segmentation," CoRR, vol. abs/1612.00593, 2016.
[36] C. R. Qi, L. Yi, H. Su, and L. J. Guibas, "PointNet++: Deep hierarchical feature learning on point sets in a metric space," in Advances in Neural Information Processing Systems, 2017, pp. 5099–5108.

Fig. 12: Illustration of the denoising results on the scanned models Eagle, Gargoyle and BallJoint, which are generated by laser scanners. (a) to (c) are the noisy mesh and the results of VT [50] and NormalNet.

[37] Y. Wang, Y. Sun, Z. Liu, S. E. Sarma, M. M. Bronstein, and J. M. Solomon, "Dynamic graph CNN for learning on point clouds," arXiv preprint arXiv:1801.07829, 2018.
[38] R. Klokov and V. S. Lempitsky, "Escape from cells: Deep kd-networks for the recognition of 3D point cloud models," in IEEE International Conference on Computer Vision, 2017, pp. 863–872.
[39] G. Riegler, A. O. Ulusoy, and A. Geiger, "OctNet: Learning deep 3D representations at high resolutions," CoRR, vol. abs/1611.05009, 2016.
[40] A. Boulch and R. Marlet, "Deep learning for robust normal estimation in unstructured point clouds," in Computer Graphics Forum, vol. 35, no. 5. Wiley Online Library, 2016, pp. 281–290.
[41] R. Roveri, A. C. Öztireli, I. Pandele, and M. Gross, "PointProNets: Consolidation of point clouds with convolutional neural networks," Computer Graphics Forum (Proc. Eurographics), vol. 37, no. 2, 2018.
[42] M.-J. Rakotosaona, V. La Barbera, P. Guerrero, N. J. Mitra, and M. Ovsjanikov, "PointCleanNet: Learning to denoise and remove outliers from dense point clouds," Computer Graphics Forum, 2019.
[43] P.-S. Wang, Y. Liu, and X. Tong, "Mesh denoising via cascaded normal regression," ACM Transactions on Graphics (SIGGRAPH Asia), vol. 35, no. 6, 2016.
[44] R. Hanocka, A. Hertz, N. Fish, R. Giryes, S. Fleishman, and D. Cohen-Or, "MeshCNN: A network with an edge," CoRR, vol. abs/1809.05910, 2018.
[45] C. Tomasi and R. Manduchi, "Bilateral filtering for gray and color images," in Sixth International Conference on Computer Vision. IEEE, 1998, pp. 839–846.
[46] X. Sun, P. Rosin, R. Martin, and F. Langbein, "Fast and effective feature-preserving mesh denoising," IEEE Transactions on Visualization and Computer Graphics, vol. 13, no. 5, 2007.
[47] T. Akenine-Möller, "Fast 3D triangle-box overlap testing," Journal of Graphics Tools, vol. 6, no. 1, pp. 29–33, 2001.
[48] K. He, X. Zhang, S. Ren, and J. Sun, "Deep residual learning for image recognition," in IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 770–778.
[49] K. Simonyan and A. Zisserman, "Very deep convolutional networks for large-scale image recognition," CoRR, vol. abs/1409.1556, 2014.
[50] S. K. Yadav, U. Reitebuch, and K. Polthier, "Mesh denoising based on normal voting tensor and binary optimization," IEEE Transactions on Visualization and Computer Graphics, vol. 24, no. 8, pp. 2366–2379, Aug 2018.

Fig. 13: Illustration of the denoising results on the models Boy01F and Boy02F, which are generated by Microsoft Kinect v1 via the Kinect-Fusion technique. (a) to (h) are the noisy mesh; the results of L0M [10], BI [2], GNF [3], CNR [43], GGNF [7] and NormalNet; and the ground truth.

Fig. 14: Illustration of the denoising results on the models Cone04V1, Girl02V1, Cone16V2 and Girl01V2, which are generated by Microsoft Kinect v1 and v2. (a) to (h) are the noisy mesh; the results of L0M [10], BI [2], GNF [3], CNR [43], GGNF [7] and