OperatorNet: Recovering 3D Shapes From Difference Operators
Ruqi Huang, Marie-Julie Rakotosaona, Panos Achlioptas, Leonidas Guibas, Maks Ovsjanikov
Ruqi Huang∗, LIX, Ecole Polytechnique, [email protected]
Marie-Julie Rakotosaona∗, LIX, Ecole Polytechnique, [email protected]
Panos Achlioptas, Stanford University, [email protected]
Leonidas Guibas, Stanford University, [email protected]
Maks Ovsjanikov, LIX, Ecole Polytechnique, [email protected]
Abstract
This paper proposes a learning-based framework for reconstructing 3D shapes from functional operators, compactly encoded as small-sized matrices. To this end we introduce a novel neural architecture, called OperatorNet, which takes as input a set of linear operators representing a shape and produces its 3D embedding. We demonstrate that this approach significantly outperforms previous purely geometric methods for the same problem. Furthermore, we introduce a novel functional operator, which encodes the extrinsic or pose-dependent shape information, and thus complements purely intrinsic pose-oblivious operators, such as the classical Laplacian. Coupled with this novel operator, our reconstruction network achieves very high reconstruction accuracy, even in the presence of incomplete information about a shape, given a soft or functional map expressed in a reduced basis. Finally, we demonstrate that the multiplicative functional algebra enjoyed by these operators can be used to synthesize entirely new unseen shapes, in the context of shape interpolation and shape analogy applications.
1. Introduction
Encoding and reconstructing 3D shapes is a fundamental problem in Computer Graphics, Computer Vision and related fields. Unlike images, which enjoy a canonical representation, 3D shapes are encoded through a large variety of representations, such as point clouds, triangle meshes and volumetric data, to name a few. Perhaps even more importantly, 3D shapes may undergo a diverse set of transformations, ranging from rigid motions to complex non-rigid and articulated deformations, that impact these representations.

The representation issues have become even more prominent with the recent advent of learning-based techniques, leading to a number of solutions for learning directly on geometric 3D data [7]. This is challenging, as point clouds and meshes lack the regular grid structure exploited by convolutional architectures. In particular, devising representations that are well-adapted for both shape analysis and especially shape synthesis remains difficult. For example, several methods for shape interpolation have been proposed by designing deep neural networks, including auto-encoder architectures, and interpolating the latent vectors learned by such networks [35, 1]. Unfortunately, it is not clear that the latent vectors lie in a linear vector space, and thus linear interpolation can lead to unrealistic intermediate shapes.

In this paper, we show that 3D shapes can not only be compactly encoded as linear functional operators, using the previously proposed shape difference operators [32], but that this representation lends itself very naturally to learning, and allows us to recover the 3D shape information, using a novel neural network architecture which we call OperatorNet.

∗ denotes equal contribution.

Figure 1: Shape interpolation via OperatorNet (top) and PointNet autoencoder (bottom). Our interpolations are smoother and less distorted.
Our key observations are twofold: first, we show that since shape difference operators can be stored as canonical matrices, for a given choice of basis, they enable the use of a convolutional neural network architecture for shape recovery. Second, we demonstrate that the functional algebra that is naturally available on these operators can be used to synthesize new shapes, in the context of shape interpolation and shape analogy applications. We argue that because this algebra is well-justified theoretically, it also leads to more accurate results in practice, compared to commonly used linear interpolation in the latent space (see Figure 1).

The shape difference operators introduced in [32] have proved to be a powerful tool in shape analysis, by allowing to characterize each shape in a collection as the "difference" to some base geometry. These difference operators encode precise information about how and where each shape differs from the base, but also, due to their compact representation as small matrices, enable efficient exploration of global variability within the collection. Inspired by the former perspective, purely geometric approaches [5, 10] have been proposed for shape reconstruction from shape differences. Though theoretically well-justified, these approaches rely on solving difficult non-linear optimization problems and require strong regularization for accurate results, especially when truncated bases are used.

Our OperatorNet, on the other hand, leverages the information encoded at both the pairwise level and the collection level by using the shape collection to guide the reconstruction. It is well-known that related shapes in a collection often concentrate near a low-dimensional manifold in shape space [33, 19].
In light of this, the shape difference operators can help both to encode the geometry of the individual shapes, and to learn the constrained space of realistic shapes, which is typically ignored by purely geometric approaches. Finally, they also allow to encode differences between shapes with different discretizations by relying on functional maps, rather than, e.g., pointwise bijections.

In addition to demonstrating the representative power of the shape differences in a learning framework, we also extend the original formulation in [32], which only involves intrinsic (i.e., invariant to isometric transformations) shape differences, with a novel extrinsic difference operator that facilitates pose-dependent embedding recovery. Our formulation is both simpler and more robust compared to previous approaches, e.g. [10], and, as we show below, can more naturally be integrated in a unified learning framework.

To summarize, our contributions are as follows:

• We propose a learning-based pipeline to reconstruct 3D shapes from a set of difference operators.
• We propose a novel formulation of extrinsic shape difference, which complements the intrinsic operators formulated in [32].
• We demonstrate that by applying algebraic operations on shape differences, we can synthesize new operators and thus new shapes via OperatorNet, enabling shape manipulations such as interpolation and analogy.
2. Related Work
Shape Reconstruction
Our work is closely related to shape reconstruction from intrinsic operators, which was recently considered in [5, 10], where several advanced, purely geometric optimization techniques have been proposed that give satisfactory results in the presence of full information [5] or under strong regularization [10]. These works have also laid the theoretical foundation for shape recovery by demonstrating that shape difference operators, in principle, contain complete information necessary for recovering the shape embedding (e.g. Propositions 2 and 4 in [10]). On the other hand, these methods also highlight the practical challenges of reconstructing a shape without any knowledge of the collection or "shape space" that it belongs to. In contrast, we show that by leveraging such information via a learning-based approach, realistic 3D shapes can be recovered efficiently from their shape difference representation, and moreover that entirely new shapes can be synthesized using the algebraic structure of difference operators, e.g., for shape interpolation.
Shape Representations for Learning.
Our work is related to the recent techniques aimed at applying deep learning methods to shape analysis. One of the main challenges is defining a meaningful notion of convolution, while ensuring invariance to basic transformations, such as rotations and translations. Several techniques have been proposed based on, e.g., Geometry Images [34], volumetric [22, 38], point-based [28] and multi-view approaches [29], as well as, very recently, intrinsic techniques that adapt convolution to curved surfaces [21, 6, 27] (see also [7] for an overview), and even via toric covers [20], among many others.

Despite this tremendous progress in the last few years, defining a shape representation that is compact, lends itself naturally to learning, while being invariant to the desired class of transformations (e.g., rigid motions) and not limited to a particular topology, remains a challenge. As we show below, our representation is well-suited for learning applications, and especially for encoding and recovering geometric structure information. We note that a recent work that is closely related to ours is the characteristic shape differences proposed in [14]. That work is primarily focused on analyzing shape collections, rather than on the shape synthesis that we target.
Shape Space
Exploring the structure of shape spaces has a long and extensive research history. Classical PCA-based models, e.g. [2, 13], and more recent shape space models, adapted to specific shape classes such as humans [19] or animals [39], or parametric model collections [33], all typically leverage the fact that the space of "realistic" shapes is significantly smaller than the space of all possible embeddings. This has also recently been exploited in the context of learning-based shape synthesis applications for shape completion [17], interpolation [3] and point cloud reconstruction [1], among others. These techniques heavily leverage the recent proliferation of large data collections such as DFAUST [4] and ShapeNet [8], to name a few. At the same time, it is not clear if, for example, the commonly used linear interpolation of latent vectors is well-justified, potentially leading to unrealistic synthesized shapes. Instead, the shape difference operators that we use satisfy a well-founded multiplicative algebra, which, as we show below, can be used to create realistic synthetic shapes.
3. Preliminaries and Notations
Discretization of Shapes
Throughout this paper, we assume that a shape is given as a triangle mesh (V, F), where V = {v_1, v_2, ..., v_n} is the vertex set, and F = {(v_i, v_j, v_k) | v_i, v_j, v_k ∈ V} is the set of faces encoding the connectivity information.

Laplace-Beltrami Operator
To each shape S, we associate a discretized Laplace-Beltrami operator L := A^{-1} W, using the standard cotangent weight scheme [23, 26], where W is the cotangent weight (stiffness) matrix, and A is the diagonal lumped area (mass) matrix. Furthermore, we denote by Λ, Φ, respectively, the diagonal matrix containing the k smallest eigenvalues and the corresponding eigenvectors of S, such that W Φ = A Φ Λ. In particular, the eigenvalues stored in Λ are non-negative and can be ordered as λ_1 ≤ λ_2 ≤ ···. The columns of Φ are sorted accordingly, and are orthonormal with respect to the area matrix, i.e., Φ^T A Φ = I_{k×k}, the k × k identity matrix. It is well-known that the Laplace-Beltrami eigenbasis provides a multi-scale encoding of a shape [16], and allows to approximate the space of functions via a subspace spanned by the first few eigenvectors of Φ.

Functional Maps
The functional map framework was introduced in [24] primarily as an alternative representation of maps across shapes. In our context, given two shapes S_1, S_2 and a point-wise map T from S_2 to S_1, we can express the functional map C from S_1 to S_2 as follows:

C = Φ_2^T A_2 Π Φ_1.    (1)

Here, A_2 is the area matrix of S_2, and Π is a binary matrix satisfying Π(p, q) = 1 if T(p) = q and 0 otherwise. Note that C is a k_2 × k_1 matrix, where k_1, k_2 are the numbers of basis functions chosen on S_1 and S_2. This matrix allows to transport functions as follows: if f is a function on S_1 expressed as a vector of coefficients a, s.t. f = Φ_1 a, then Ca is the vector of coefficients of the corresponding function on S_2, expressed in the basis of Φ_2.

In general, not every functional map matrix arises from a point-wise map; a functional map might instead encode, for example, soft correspondences, which map a point to a probability density function. All of the tools that we develop below can accommodate such general maps. This is a key advantage of our approach, as it does not rely on all shapes having the same number of points, and only requires the knowledge of functional map matrices, which can be computed using existing techniques [25, 18].

Intrinsic Shape Difference Operators
Finally, to represent shapes themselves, we use the notion of shape difference operators proposed in [32]. Within our setting, they can be summarized as follows: given a base shape S, an arbitrary shape S_i and a functional map C_i between them, let K (resp. K_i) be a positive semi-definite matrix, which defines some inner product for functions on S (resp. S_i), expressed in the corresponding bases. Thus, for a pair of functions f, g on S expressed as vectors of coefficients a, b, we have ⟨f, g⟩ = a^T K b.

Note that these two inner products K, K_i are not comparable, since they are expressed in different bases. Fortunately, the functional map C_i plays the role of a basis synchronizer. Thus, a shape difference operator, which captures the difference between S and S_i, is given simply as:

D^K_i = K^+ (C_i^T K_i C_i),    (2)

where + denotes the Moore-Penrose pseudo-inverse.

The original work [32] considered two intrinsic inner products, which, using the notation above, can be expressed as K_L = Id and K_H = Λ. These inner products in turn lead to the following shape difference operators:

Area-based (L):  D^A_i = C_i^T C_i,    (3)
Conformal (H):  D^C_i = Λ^+ C_i^T Λ_i C_i.    (4)

These shape difference operators have several key properties. First, they allow to represent an arbitrary shape S_i as a pair of matrices of size k × k, independent of the number of points, by requiring only a functional map between the base shape S and S_i. Thus, the size of this representation can be controlled by choosing an appropriate value of k, which allows to gain multi-scale information about the geometry of S_i, from the point of view of S. Second, and perhaps more importantly, these matrices are invariant to rigid (and indeed any intrinsic isometry) transformations of S or S_i. Finally, previous works [10] have shown that shape differences in principle contain complete information about the intrinsic geometry of a shape. As we show below, these properties naturally enable the use of learning applications for shape recovery.

Functoriality of Shape Differences
Another useful property of the shape difference operators is functoriality, shown in [32], which we exploit in our shape synthesis applications in Section 7. Given shape differences D_i, D_j of shapes S_i and S_j with respect to a base shape S, functoriality allows to compute the difference D_ij between S_i and S_j without a functional map between them. Namely (see Prop. 4.2.4 in [9]):

D_ij = C_i D_i^+ D_j C_i^{-1}.    (5)

Intuitively, this means that shape differences naturally satisfy the multiplicative algebra D_i D_ij = D_j, up to a change of basis ensured by C_i.

Figure 2: Illustration of shape analogy.

This property can be used for shape analogies: given shapes S_A, S_B and S_C, find S_X such that S_X relates to S_C in the same way as S_B relates to S_A (see the illustration in Figure 2). This can be solved by looking for a shape S_X that satisfies C_C^+ D_CX C_C = C_A^+ D_AB C_A. In our application, we first create an appropriate D_X and then use our network to synthesize the corresponding shape.

Finally, the multiplicative property also suggests a way of interpolating in the space of shape differences. Namely, rather than using basic linear interpolation between D_i and D_j, we interpolate on the Lie algebra of the Lie group of shape differences, using the exponential map and its inverse, which leads to:

D(t) = exp((1 − t) log(D_i) + t log(D_j)),  t ∈ [0, 1].    (6)

Here exp and log are the matrix exponential and logarithm, respectively. Note that, around the identity, the linearization provided by the Lie algebra is exact, and we have observed it to produce very accurate results in general.
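To make the algebra above concrete, the following is a minimal numpy sketch (our own illustration, not the authors' code) that builds the area-based difference of Eq. (3) from a functional map and interpolates two such operators on the Lie algebra, as in Eq. (6). Since the area-based difference C^T C is symmetric positive (semi-)definite, the matrix log/exp can be computed by eigendecomposition; general, non-symmetric operators would instead require a general matrix logarithm (e.g. scipy.linalg.logm).

```python
import numpy as np

def area_based_difference(C):
    """Area-based shape difference (Eq. 3): D_A = C^T C."""
    return C.T @ C

def _spd_log(M):
    # Matrix logarithm via eigendecomposition; assumes M symmetric positive definite.
    w, V = np.linalg.eigh(M)
    return (V * np.log(w)) @ V.T

def _spd_exp(M):
    # Matrix exponential of a symmetric matrix.
    w, V = np.linalg.eigh(M)
    return (V * np.exp(w)) @ V.T

def interpolate_differences(Di, Dj, t):
    """Lie-algebra interpolation (Eq. 6): exp((1-t) log Di + t log Dj)."""
    return _spd_exp((1.0 - t) * _spd_log(Di) + t * _spd_log(Dj))
```

At t = 0 and t = 1 the interpolation recovers the endpoints exactly, and intermediate operators remain symmetric positive definite, unlike in naive linear interpolation of arbitrary parameterizations.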
4. Extrinsic Shape Difference
In our (discrete) setting, with purely intrinsic information one can at best determine the edge lengths of the mesh. Recovering the shape from its edge lengths, while possible in certain simple scenarios, nevertheless often leads to ambiguities, as highlighted in [10]. To alleviate such ambiguities, we propose to augment the existing intrinsic shape differences with a novel extrinsic shape difference operator, which in turn boosts our reconstruction.

One basic approach to combine extrinsic information with the multi-scale Laplace-Beltrami basis is to project the 3D coordinate functions onto the basis, to obtain three vectors of coefficients (one for each of the x, y, z coordinates): f = Φ^+ X, where X is the n_V × 3 matrix of vertex coordinates [16, 15]. Unfortunately, representing a shape through f, though multi-scale and compact, is not rotationally invariant, and does not provide information about intrinsic geometry. For example, interpolation of coordinate vectors can easily lead to loss of shape area.

Another option, which is more compatible with our approach and is rotationally invariant, is to encode the inner products of coordinate functions on each shape using the Gram matrix G = XX^T. Expressing G in the corresponding basis, and using Eq. (2), gives rise to a shape difference-like representation of the coordinates. Indeed, the following theorem (see proof in Appendix A) guarantees that the resulting representation contains the same information, up to rotational invariance, as simply projecting the coordinates onto the basis.

Figure 3: From left to right: original shape with 1000 vertices, and the recovered embedding from G encoded in the leading k = 10, 60, 100 and 300 eigenbasis of the original shape.

Theorem 1.
Let G = Φ^T A X X^T A Φ be the extrinsic inner product encoded in Φ. Then one can recover the projections of the coordinate functions X on the subspace spanned by Φ from G, up to a rigid transformation. In particular, when Φ is a complete full basis, the recovery of X is exact.

As an illustration of Theorem 1, we show in Figure 3 the embeddings recovered from G when the number of basis functions in Φ ranges from 10 to 300.

However, the rank of the Gram matrix G of a shape is at most 3, meaning that the majority of its eigenvalues are zero. This turns out to be an issue in applications where gaining information about the local geometry of the shape is important, for example in our shape analogies experiments. To compensate for this rank deficiency, we make the extrinsic inner product Laplacian-like:

E_D(i, j) = −E(i, j) if i ≠ j,  and  E_D(i, i) = Σ_{j ≠ i} E(i, j).    (7)

Here E(i, j) is ‖v_i − v_j‖² A(i, i) A(j, j), i.e., the squared Euclidean distance between points v_i, v_j on the shape, weighted by the respective vertex area measures. Since E_D can be regarded as the Laplacian of a complete graph, all but one of its eigenvalues are strictly positive.

Figure 4: A pair of shapes are compared. The most area (resp. extrinsic) distorted region is captured by the leading eigenfunction of the area-based (resp. extrinsic) shape difference.

It is worth noting that the Gram matrix and the squared Euclidean distance matrix are closely related and can be recovered from each other, as is commonly done in the Multi-Dimensional Scaling literature [11].

To summarize, given a base shape S, another shape S_i and a functional map C_i, we encode the extrinsic information of S_i from the point of view of S as follows:

D^E_i = (Φ^T E_D Φ)^+ (C_i^T Φ_i^T E_{D_i} Φ_i C_i).    (8)

In Figure 4, we compute D^A and D^E of the target shape with respect to the base, and color code their respective eigenfunctions associated with the largest eigenvalue on the shapes to the right. As argued in [32], these functions capture the areas of highest distortion between the shapes, with respect to the corresponding inner products. Note that the eigenfunction of D^A captures the armpit, where the local area shrinks significantly, while that of D^E captures the hand, where the pose changes are evident.

Note that in [10], the authors also propose a shape difference formulation for encoding extrinsic information, which is defined on the shape offset using surface normal information. However, their construction can lead to instabilities, and moreover, it only gives information about local distances, making it hard to recover large changes in pose.
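The extrinsic inner product of Eq. (7) is straightforward to assemble. The sketch below (our own illustration, with hypothetical helper names) builds E_D from vertex positions and lumped vertex areas; the result has zero row sums, i.e., it is the Laplacian of a complete graph, with a single zero eigenvalue.

```python
import numpy as np

def extrinsic_inner_product(X, areas):
    """Laplacian-like extrinsic inner product E_D of Eq. (7).

    X: (n, 3) vertex coordinates; areas: (n,) lumped vertex areas.
    """
    # E(i, j) = ||v_i - v_j||^2 * A(i, i) * A(j, j)
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)  # pairwise squared distances
    E = d2 * np.outer(areas, areas)
    E_D = -E                                          # off-diagonal: -E(i, j)
    np.fill_diagonal(E_D, 0.0)
    np.fill_diagonal(E_D, -E_D.sum(axis=1))           # diagonal: sum_{j != i} E(i, j)
    return E_D
```

In practice E_D would then be projected into the reduced eigenbasis, Φ^T E_D Φ, before computing the extrinsic difference of Eq. (8).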
5. Network Details
Problem Setup
Our general goal is to develop a neural network capable of recovering the coordinates of a shape, given its representation as a set of shape difference matrices. We therefore aim to solve the same problem considered in [5, 10]. However, unlike these purely geometric methods, we also leverage a collection of training shapes to learn and constrain the reconstruction to the space of realistic shapes. Thus, we assume that we are given a collection of shapes, each represented by a set of shape difference operators with respect to a fixed base shape. We also assume the presence of a point-wise map from the base shape to each of the shapes in the collection, which allows us to compute the "ground truth" embedding of each shape. We represent this embedding as three coordinate functions on the shape. Our goal then is to design a network capable of converting the input shape difference operators to the ground truth coordinate functions.

At test time, we use this network to reconstruct a target shape given only the shape difference operators with respect to the base shape. Importantly, these shape difference operators only require the knowledge of a functional map from the base shape, and can thus arise from shapes with different discretizations, or can be synthesized directly for shape analogy or interpolation applications.

Figure 5: OperatorNet architecture. The inputs of the network are shape difference matrices considered as channels; it outputs the coordinate functions of the shape. The first part (left) of the network consists of a convolutional encoder, while the second part (right) is a fully-connected decoder built with dense layers.
Architecture
To solve the problem above, we developed the OperatorNet architecture, which takes as input shape difference matrices and outputs coordinate functions. Our network has two modules: a shallow convolutional encoder and a 3-layer dense decoder, as shown in Figure 5.

The grid structure of shape differences is exploited by the encoder through the use of convolutions. Note, however, that translation invariance does not apply to these matrices.

After comparing multiple depths of encoders, we select a shallow version, as it performs the best in practice, implying that the shape difference representation already encodes meaningful information efficiently. Moreover, as shown in [10], the edge lengths of a mesh can be recovered from intrinsic shape differences through a series of least squares problems, hinting that increasing the depth of the network, and thus the non-linearity, might not be necessary with shape differences.

The decoder, on the other hand, is selected for its ability to transform the latent representation into coordinate functions for reconstruction and synthesis tasks.
Datasets
We train OperatorNet on two types of datasets: humans and animals. For human shapes, our training set consists of 9440 shapes sampled from the DFAUST dataset [4] and 8000 from the SURREAL dataset [37], which is generated with the model proposed in [19]. The DFAUST dataset contains scans of human characters subject to a variety of motions, while the SURREAL dataset injects more variability in body types.

For animals, we use the parametric model proposed in SMAL [39] to generate 1800 animals of 3 different species: lions, dogs, and horses. The meshes of the humans (resp. animals) are simplified to 1000 vertices (resp. 1769 vertices).
Input Shape Differences
We construct the input shape differences using a truncated eigenbasis of a fixed dimension on the base shape, and the full basis on the target one, in all experiments, regardless of the number of vertices on the shapes. The functional maps from the base to the targets are induced by the identity maps, since our training shapes are in 1-1 correspondence. This implies that each of the shapes is represented by three square matrices of the truncated basis dimension, representing the area-based, conformal and extrinsic shape differences respectively. The independence among the shape differences allows flexibility in selecting the combination of input shape differences; in Section 6 we compare the performance of several combinations, and present a more detailed ablation study in Appendix B.

It is worth noting that recent learning-based shape matching techniques enable efficient (functional) map estimation. In particular, we use the unsupervised matching method of [31] and evaluate OperatorNet trained with computed shape differences in Section 6.

Loss Function
OperatorNet reconstructs the coordinate functions of a given training shape. Our shape reconstruction loss operates in two steps. First, we estimate the optimal rigid transformation to align the ground truth point cloud X_gt and the reconstructed point cloud X_recon using the Kabsch algorithm [36] with ground truth correspondences. Second, we compute the mean squared error between the aligned reconstruction and the ground truth:

L(X_gt, X_recon) = (1 / n_V) Σ_{i=1}^{n_V} ‖R(X^i_recon) − X^i_gt‖².    (9)

Here R is the function that computes the optimal rigid alignment of X_recon to X_gt. We align the computed reconstruction to the ground truth embedding, so that the quality of the reconstructed point cloud is invariant to rigid transformations. This is important since the shape difference operators are invariant to rigid motions of the shape, and thus the network should not be penalized for not recovering the correct orientation. On the other hand, this loss function is differentiable, since we use a closed-form expression for R, given by the SVD, which enables back-propagation in neural network training.

Figure 6: Qualitative comparison of our method and the baselines.
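The alignment step of the loss in Eq. (9) can be sketched as follows: a minimal numpy version of the Kabsch alignment (our own illustration; in training, a differentiable implementation of the SVD would be used instead).

```python
import numpy as np

def kabsch_align(P, Q):
    """Rigidly align P to Q (both (n, 3), in correspondence) via the Kabsch algorithm."""
    cP, cQ = P.mean(axis=0), Q.mean(axis=0)
    H = (P - cP).T @ (Q - cQ)                  # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))     # correct for reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T    # optimal rotation
    return (P - cP) @ R.T + cQ

def reconstruction_loss(X_gt, X_recon):
    """Mean squared error after optimal rigid alignment (Eq. 9)."""
    aligned = kabsch_align(X_recon, X_gt)
    return np.mean(np.sum((aligned - X_gt) ** 2, axis=1))
```

With this loss, a reconstruction that differs from the ground truth only by a rigid motion incurs (near-)zero penalty, matching the rigid-motion invariance of the input operators.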
6. Evaluation
In this section, we provide both qualitative and quantitative evaluations of the results from OperatorNet, and compare them to the geometric baselines.
Evaluation Metrics
We denote by S_gt and S_recon the ground-truth and the reconstructed meshes, respectively. We let d_R = L(X_gt, X_recon), where L is the rotationally-invariant distance defined in Eq. (9) and X is the vertex set of S. Since OperatorNet is trained with the loss defined in Eq. (9), we introduce the following additional metrics for a comprehensive, unbiased evaluation and comparison: (1) d_V = |V(S_gt) − V(S_recon)| / V(S_gt), i.e., the relative error of mesh volumes; (2) d_E = mean_{(i,j)} |l^gt_ij − l^recon_ij| / l^gt_ij, where l_ij is the length of edge (i, j).

Baselines
Two major baselines are considered: (1) the intrinsic reconstruction method from [5], which we evaluate with the 'Shape-from-Laplacian' option and the full basis on both the base shape and the target shape; (2) the reconstruction method from [10], where the authors construct offset surfaces that also capture extrinsic geometry. This method also provides a purely intrinsic reconstruction version; we evaluate both cases with the same basis truncation as our input. Beyond that, we also consider nearest neighbor retrieval from the training set with respect to distances between shape difference matrices.
Test Data
We use 800 shapes from the DFAUST dataset as the test set, which contains 10 sub-collections (character + action sequence, each consisting of 80 shapes) that are isolated from the training/validation set. For the efficiency of baseline evaluation, we further sample 5 shapes via furthest point sampling with respect to the pair-wise Hausdorff distance from each sub-collection, resulting in a set of 50 shapes that covers significant variability in both styles and poses in the test set.
Qualitative Results
We demonstrate the reconstructed shapes from OperatorNet and the aforementioned baselines in Figure 6, where the red shape in each row is the ground truth target shape. The base shape in this experiment (also the base shape on which we compute shape differences) is shown in Figure 4, and is in the rest pose. The geometric baselines in general perform worse under significant pose changes from the base (see the top two rows in Figure 6), but give relatively more stable results when the difference is mainly in the shape style (see the bottom row).

Our method, on the other hand, produces consistently good reconstructions in all cases. Note also that, as expected, OperatorNet using all types of shape differences gives both the best quantitative and qualitative results. We provide more reconstruction examples in Appendix C, highlighting the generalization power of our method.

Quantitative Results
We report all the quantitative metrics defined above in Table 1. First, we observe that OperatorNet using both intrinsic and extrinsic shape differences achieves the lowest reconstruction error, while the purely extrinsic version is the second best. Second, OperatorNet trained on shape differences from computed functional maps achieves competitive performance, showing that our method is efficient even in the absence of ground truth bijective correspondences. Lastly, all versions of OperatorNet significantly outperform the baselines.

Regarding the volume and edge recovery accuracy, either the complete or the intrinsic-only version of OperatorNet achieves the second best result. We remark that the nearest neighbor search in general retrieves the right body type, and therefore the volume is well-recovered. On the other hand, since the full Laplacian is provided as input for the Shape-from-Laplacian baseline, it is expected to preserve intrinsic information.
Reconstructions of Shapes with Different Discretizations
Lastly, we show that our approach is capable of encoding differences between shapes with different discretizations. In Figure 7, we compute the functional maps from the fine meshes (top row, with 5k vertices) by projecting them to a lower resolution base mesh with 1k vertices. We then reconstruct them with OperatorNet trained on lower resolution shapes. This, on the other hand, is extremely difficult for purely geometric methods. In Appendix C we provide examples of reconstructions in the same setting using the method of [10], and reconstructions with OperatorNet trained with shapes having 2k vertices.

Table 1: Quantitative evaluation of shape reconstruction.

Method                     d_R     d_V     d_E
Op.Net (Int+Ext)           —       —       —
FuncChar [10] (Int)        65.1    0.356   0.118
FuncChar [10] (Int+Ext)    28.4    0.028   0.110
NN                         25.5    —       —
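For reference, the volume and edge-length metrics d_V and d_E reported in Table 1 can be computed as in the following sketch (our own minimal numpy version; the signed-volume helper assumes a closed, consistently oriented triangle mesh):

```python
import numpy as np

def mesh_volume(X, F):
    """Signed volume of a closed triangle mesh via the divergence theorem."""
    a, b, c = X[F[:, 0]], X[F[:, 1]], X[F[:, 2]]
    # Sum of signed tetrahedron volumes spanned by each face and the origin.
    return np.sum(np.einsum('ij,ij->i', a, np.cross(b, c))) / 6.0

def volume_metric(V_gt, V_recon):
    """Relative mesh-volume error d_V."""
    return abs(V_gt - V_recon) / V_gt

def edge_metric(X_gt, X_recon, edges):
    """Mean relative edge-length error d_E over a list of (i, j) index pairs."""
    e = np.asarray(edges)
    l_gt = np.linalg.norm(X_gt[e[:, 0]] - X_gt[e[:, 1]], axis=1)
    l_rc = np.linalg.norm(X_recon[e[:, 0]] - X_recon[e[:, 1]], axis=1)
    return np.mean(np.abs(l_gt - l_rc) / l_gt)
```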
7. Applications
In this section, we present all of our results using OperatorNet trained with all types of shape differences.

Shape Interpolation
Given two shapes, we first interpolate their shape differences using the formulation in Eq. (6), and then synthesize intermediate shapes by feeding the interpolated shape differences to OperatorNet.

We compare our method against nearest neighbor retrieval and a PointNet autoencoder. The PointNet autoencoder is trained with the encoder architecture from [28] and with our decoder. Two versions of PointNet are trained: one autoencoder with spatial transformers and one without. Since the autoencoder without spatial transformers performs better in our experiments, we select it for the comparisons. Nearest neighbor interpolation retrieves the nearest neighbor of the interpolated shape differences in the training set and uses the corresponding embedding. As expected (see the second row of Figure 9), nearest neighbor interpolation is less continuous.

As shown in Figure 1, our method produces smooth interpolations, without the significant local area distortions observed with PointNet. Similarly, in Figure 9, we observe that the interpolation via PointNet suffers from local distortion on the arms. In contrast, interpolation using OperatorNet is continuous and respects the structure and constraints of the body, suggesting that shape differences efficiently encode the shape structure. We provide further comparisons to other baselines, including [30, 3, 12], and to linear interpolation of shape differences in Appendix E.

We also train OperatorNet on the animals dataset, as described in Section 5, and show in Figure 8 an interpolation from a tiger to a horse.

Figure 8: Shape interpolation from a tiger (left) to a horse (right) using OperatorNet trained on the animals dataset.

Figure 9: Shape interpolation between two humans. Note that the PointNet autoencoder produces shapes with local area distortion, while the interpolation from nearest neighbor (NN) retrieval is not continuous.
Shape Analogy
Our second application is to construct semantically meaningful new shapes based on shape analogies. Given shapes $S_A$, $S_B$, $S_C$, our goal is to construct a new shape $S_X$ such that $S_C$ relates to $S_X$ as $S_A$ relates to $S_B$.

Following the discussion in Section 3, the functoriality of shape differences allows an explicit and mathematically meaningful way of constructing the shape difference of $S_X$, given those of $S_A$, $S_B$ and $S_C$. Namely, $D_X = D_C D_A^{+} D_B$. Then, with our OperatorNet, we reconstruct the embedding of the unknown $S_X$ by feeding $D_X$ to the network.

We compare our results to those of the PointNet autoencoder. In the latter, we reconstruct $S_X$ by decoding the latent code obtained as $l_X = l_C - l_A + l_B$, where $l_A$ is the latent code of shape $S_A$ (and similarly for $S_B$, $S_C$).

Figure 10: Transferring gender via shape analogies: $S_A$ and $S_B$ are a fixed pair of human shapes with similar poses and styles, but of different genders. We generate $S_X$, which is supposed to be a "female" version of the varying $S_C$. Our analogies are semantically meaningful, while PointNet can produce suboptimal results (see the red dotted boxes for the discrepancies).

In Figure 10, we show a set of shape analogies obtained via OperatorNet and the PointNet autoencoder. It is evident that our results are both more natural and intuitive. We also refer the reader to Appendix D for more examples of analogies.
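The analogy construction above can be sketched directly in code; the following minimal version (our own naming) uses the Moore-Penrose pseudo-inverse for $D_A^{+}$:

```python
import numpy as np

def analogy_difference(D_A, D_B, D_C):
    """Shape difference of the unknown S_X such that
    S_C : S_X ~ S_A : S_B, i.e. D_X = D_C @ pinv(D_A) @ D_B."""
    return D_C @ np.linalg.pinv(D_A) @ D_B

# sanity check: if S_C coincides with S_A, the analogy should return
# S_B itself, i.e. D_X == D_B (for invertible D_A)
rng = np.random.default_rng(0)
D_A = np.eye(4) + 0.1 * rng.standard_normal((4, 4))
D_B = np.eye(4) + 0.1 * rng.standard_normal((4, 4))
assert np.allclose(analogy_difference(D_A, D_B, D_A), D_B)
```

The resulting $D_X$ would then be fed to the trained network to synthesize the embedding of $S_X$.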
8. Conclusion & Future Work
In this paper we have introduced a novel learning-based technique for recovering shapes from their difference operators. Our key observation is that shape differences, stored as compact matrices, lend themselves naturally to learning, and allow us both to recover the underlying shape space in a collection and to encode the geometry of individual shapes. We also introduce a novel extrinsic shape difference operator and show its utility for shape reconstruction and other applications, such as shape interpolation and analogies.

Currently our approach is only well-adapted to shapes represented as triangle meshes. Thus, in the future we plan to extend this framework to both learn the optimal inner products from data and adapt our pipeline to other shape representations, such as point clouds or triangle soups.

Acknowledgements
Parts of this work were supported by the ERC Starting Grant StG-2017-758800 (EXPROTEA), KAUST OSR Award CRG-2017-3426, a gift from the Nvidia Corporation, a Vannevar Bush Faculty Fellowship, NSF grant DMS-1546206, a Google Research award, and gifts from Adobe and Autodesk. The authors thank Davide Boscaini and Etienne Corman for their help with baseline comparisons.
References

[1] Panos Achlioptas, Olga Diamanti, Ioannis Mitliagkas, and Leonidas Guibas. Learning representations and generative models for 3d point clouds. arXiv preprint arXiv:1707.02392, 2017.
[2] Dragomir Anguelov, Praveen Srinivasan, Daphne Koller, Sebastian Thrun, Jim Rodgers, and James Davis. SCAPE: Shape completion and animation of people. In ACM Transactions on Graphics (TOG), volume 24, pages 408–416. ACM, 2005.
[3] Heli Ben-Hamu, Haggai Maron, Itay Kezurer, Gal Avineri, and Yaron Lipman. Multi-chart generative surface modeling. In Proc. SIGGRAPH Asia, page 215. ACM, 2018.
[4] Federica Bogo, Javier Romero, Gerard Pons-Moll, and Michael J. Black. Dynamic FAUST: Registering human bodies in motion. In CVPR, July 2017.
[5] Davide Boscaini, Davide Eynard, Drosos Kourounis, and Michael M. Bronstein. Shape-from-operator: Recovering shapes from intrinsic operators. In Computer Graphics Forum, volume 34, pages 265–274. Wiley Online Library, 2015.
[6] Davide Boscaini, Jonathan Masci, Emanuele Rodolà, and Michael Bronstein. Learning shape correspondence with anisotropic convolutional neural networks. In Advances in Neural Information Processing Systems, pages 3189–3197, 2016.
[7] Michael M. Bronstein, Joan Bruna, Yann LeCun, Arthur Szlam, and Pierre Vandergheynst. Geometric deep learning: Going beyond Euclidean data. IEEE Signal Processing Magazine, 34(4):18–42, 2017.
[8] Angel X. Chang, Thomas A. Funkhouser, Leonidas J. Guibas, Pat Hanrahan, Qi-Xing Huang, Zimo Li, Silvio Savarese, Manolis Savva, Shuran Song, Hao Su, Jianxiong Xiao, Li Yi, and Fisher Yu. ShapeNet: An information-rich 3d model repository. CoRR, abs/1512.03012, 2015.
[9] Etienne Corman. Functional Representation of Deformable Surfaces for Geometry Processing. PhD thesis, 2016.
[10] Etienne Corman, Justin Solomon, Mirela Ben-Chen, Leonidas Guibas, and Maks Ovsjanikov. Functional characterization of intrinsic and extrinsic geometry. ACM Trans. Graph., 36(2):14:1–14:17, Mar. 2017.
[11] Trevor F. Cox and M.A.A. Cox. Multidimensional Scaling, Second Edition. Chapman and Hall/CRC, 2000.
[12] Thibault Groueix, Matthew Fisher, Vladimir G. Kim, Bryan Russell, and Mathieu Aubry. 3D-CODED: 3d correspondences by deep deformation. In ECCV, 2018.
[13] Nils Hasler, Carsten Stoll, Martin Sunkel, Bodo Rosenhahn, and H.-P. Seidel. A statistical model of human pose and body shape. In Computer Graphics Forum, volume 28, pages 337–346. Wiley Online Library, 2009.
[14] Ruqi Huang, Panos Achlioptas, Leonidas Guibas, and Maks Ovsjanikov. Limit shapes: A tool for understanding shape differences and variability in 3d model collections. Computer Graphics Forum, 2019.
[15] Artiom Kovnatsky, Michael M. Bronstein, Alexander M. Bronstein, Klaus Glashoff, and Ron Kimmel. Coupled quasi-harmonic bases. In Computer Graphics Forum, volume 32, pages 439–448. Wiley Online Library, 2013.
[16] Bruno Levy. Laplace-Beltrami eigenfunctions towards an algorithm that "understands" geometry. In IEEE International Conference on Shape Modeling and Applications (SMI'06), pages 13–13, June 2006.
[17] Or Litany, Alex Bronstein, Michael Bronstein, and Ameesh Makadia. Deformable shape completion with graph convolutional autoencoders. In Proc. CVPR, pages 1886–1895, 2018.
[18] Or Litany, Tal Remez, Emanuele Rodolà, Alex Bronstein, and Michael Bronstein. Deep functional maps: Structured prediction for dense shape correspondence. In Proc. ICCV, pages 5659–5667, 2017.
[19] Matthew Loper, Naureen Mahmood, Javier Romero, Gerard Pons-Moll, and Michael J. Black. SMPL: A skinned multi-person linear model. ACM Trans. Graph., 34(6):248:1–248:16, Oct. 2015.
[20] Haggai Maron, Meirav Galun, Noam Aigerman, Miri Trope, Nadav Dym, Ersin Yumer, Vladimir Kim, and Yaron Lipman. Convolutional neural networks on surfaces via seamless toric covers. 2017.
[21] Jonathan Masci, Davide Boscaini, Michael Bronstein, and Pierre Vandergheynst. Geodesic convolutional neural networks on Riemannian manifolds. In Proc. ICCV Workshops, pages 37–45, 2015.
[22] Daniel Maturana and Sebastian Scherer. VoxNet: A 3d convolutional neural network for real-time object recognition. In IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 922–928. IEEE, 2015.
[23] Mark Meyer, Mathieu Desbrun, Peter Schröder, and Alan H. Barr. Discrete differential-geometry operators for triangulated 2-manifolds. In Visualization and Mathematics III, pages 35–57. Springer, 2003.
[24] Maks Ovsjanikov, Mirela Ben-Chen, Justin Solomon, Adrian Butscher, and Leonidas Guibas. Functional maps: A flexible representation of maps between shapes. ACM Transactions on Graphics (TOG), 31(4):30, 2012.
[25] Maks Ovsjanikov, Etienne Corman, Michael Bronstein, Emanuele Rodolà, Mirela Ben-Chen, Leonidas Guibas, Frederic Chazal, and Alex Bronstein. Computing and processing correspondences with functional maps. In ACM SIGGRAPH 2017 Courses, 2017.
[26] Ulrich Pinkall and Konrad Polthier. Computing discrete minimal surfaces and their conjugates. Experimental Mathematics, 2(1):15–36, 1993.
[27] Adrien Poulenard and Maks Ovsjanikov. Multi-directional geodesic neural networks via equivariant convolution. ACM Trans. Graph. (Proc. SIGGRAPH Asia), 37(6):236:1–236:14, 2018.
[28] Charles Ruizhongtai Qi, Hao Su, Kaichun Mo, and Leonidas J. Guibas. PointNet: Deep learning on point sets for 3d classification and segmentation. CoRR, abs/1612.00593, 2016.
[29] Charles R. Qi, Hao Su, Matthias Nießner, Angela Dai, Mengyuan Yan, and Leonidas J. Guibas. Volumetric and multi-view CNNs for object classification on 3d data. In Proc. CVPR, pages 5648–5656, 2016.
[30] Charles Ruizhongtai Qi, Li Yi, Hao Su, and Leonidas J. Guibas. PointNet++: Deep hierarchical feature learning on point sets in a metric space. CoRR, abs/1706.02413, 2017.
[31] Jean-Michel Roufosse, Abhishek Sharma, and Maks Ovsjanikov. Unsupervised deep learning for structured shape matching. CoRR, abs/1812.03794, 2018.
[32] Raif M. Rustamov, Maks Ovsjanikov, Omri Azencot, Mirela Ben-Chen, Frédéric Chazal, and Leonidas Guibas. Map-based exploration of intrinsic shape differences and variability. ACM Transactions on Graphics (TOG), 32(4):1, 2013.
[33] Adriana Schulz, Ariel Shamir, Ilya Baran, David I. W. Levin, Pitchaya Sitthi-Amorn, and Wojciech Matusik. Retrieval on parametric shape collections. ACM Trans. Graph., 36(4), Jan. 2017.
[34] Ayan Sinha, Jing Bai, and Karthik Ramani. Deep learning 3d shape surfaces using geometry images. In European Conference on Computer Vision, pages 223–240. Springer, 2016.
[35] Ayan Sinha, Asim Unmesh, Qixing Huang, and Karthik Ramani. SurfNet: Generating 3d shape surfaces using deep residual networks. In CVPR, July 2017.
[36] Arun K. Somani, Thomas S. Huang, and Steven D. Blostein. Least-squares fitting of two 3-d point sets. IEEE Transactions on Pattern Analysis and Machine Intelligence, (5):698–700, 1987.
[37] Gül Varol, Javier Romero, Xavier Martin, Naureen Mahmood, Michael J. Black, Ivan Laptev, and Cordelia Schmid. Learning from synthetic humans. In CVPR, 2017.
[38] Peng-Shuai Wang, Yang Liu, Yu-Xiao Guo, Chun-Yu Sun, and Xin Tong. O-CNN: Octree-based convolutional neural networks for 3d shape analysis. ACM Transactions on Graphics (TOG), 36(4):72, 2017.
[39] Silvia Zuffi, Angjoo Kanazawa, David Jacobs, and Michael J. Black. 3D menagerie: Modeling the 3D shape and pose of animals. In CVPR, July 2017.
A. Proof of Theorem 1.
Proof.
Since $X$ is known to be of rank 3 and $G$ is symmetric, we have, by SVD, $G = \Phi^T A X X^T A \Phi = U \Sigma U^T$, where $U$ and $\Sigma$ contain, respectively, the top 3 singular vectors and singular values of $G$. Therefore, $\Phi^T A X R = U \sqrt{\Sigma}$, where $R$ is a $3 \times 3$ rigid transformation matrix satisfying $R^T R = I_{3 \times 3}$. In other words, we recover $\Phi^T A \tilde{X}$ from $G$, where $\tilde{X} = X R$ is equivalent to $X$ up to rigid transformations. Then, to recover the projection of $\tilde{X}$ into the space spanned by $\Phi$, we simply compute $\Phi \Phi^T A \tilde{X}$.

B. Ablation Study on Network Design
We investigate multiple architectures for OperatorNet. In Table 2 we compare the reconstruction performance over different combinations of input shape differences and different depths of encoders. We report the performance of 4 convolutional encoders, from 1 to 4 layers deep, doubling the number of neurons at every layer.

Two trends are observed in Table 2: first, for every network depth, we achieve the best performance when all three types of shape differences are used; second, for a fixed combination of input shape differences, the network performs better as it gets shallower.

Putting these two observations together, we justify our final model, which has a single-layer convolutional encoder and uses all three types of shape differences as input.
C. Shape Reconstructions
Verification of Generalization Power of OperatorNet
To demonstrate the generalization power of OperatorNet, we show in Figure 11 our reconstructions of test shapes from the SURREAL dataset. For comparison, we retrieve the shapes in the training set whose shape differences are nearest to those of the test shapes. In the figure, the top row presents the ground-truth test shapes; the middle row shows reconstructions from OperatorNet; and the bottom row shows the shapes retrieved from the training set via nearest-neighbor search in the space of shape differences.

It is evident that OperatorNet accurately reconstructs the test shapes, which deviate significantly from the shapes in the training set, suggesting that our network generalizes well to unseen data.
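The nearest-neighbor retrieval used for this comparison can be sketched as follows (array names are ours); it searches the training set in the space of flattened shape-difference matrices under the Frobenius norm:

```python
import numpy as np

def retrieve_nearest(train_diffs, train_embeddings, query_diff):
    """Return the training embedding whose shape-difference matrix is
    closest (in Frobenius norm) to that of the query shape."""
    flat = train_diffs.reshape(len(train_diffs), -1)
    dists = np.linalg.norm(flat - query_diff.ravel(), axis=1)
    return train_embeddings[int(np.argmin(dists))]

# toy check: the query equals the second training difference exactly,
# so the retrieval returns the second embedding (here, index 1)
rng = np.random.default_rng(0)
train_diffs = rng.standard_normal((3, 5, 5))
train_embeddings = np.arange(3)
assert retrieve_nearest(train_diffs, train_embeddings, train_diffs[1]) == 1
```

In the actual experiments each retrieved item would be a full vertex embedding rather than an index; the distance computation is the same.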
Reconstruction of Shapes in Different Discretizations
We show the reconstructions of shapes in a different discretization than the base shape in Figure 12, part of which (the top two rows) is demonstrated in Figure 7. Here we

Table 2: Ablation study: auto-encoder performance on the DFAUST test set, measured by the loss function defined in Eq. (9), for convolutional encoders of increasing depth (Conv. 8, Conv. 8×16, Conv. 8×16×32, Conv. 8×16×32×64) and for each combination of input shape differences (Area, Ext, Conf, A+E, A+C, E+C, A+E+C).
further train an OperatorNet with finer labels (2k vertices, compared to the 1k vertices used in the original version) and show the reconstructions in the third row of Figure 12. We emphasize that the use of coarse labels is for a fair comparison to the geometric baselines for reconstructing embeddings from shape differences. As shown in the third row, OperatorNet reconstructs the shapes well at a higher resolution, which is not possible for the geometric approaches.

Reconstructing shapes with different triangulations is extremely difficult for geometric approaches: we demonstrate the reconstructions via the geometric approach of [10] in the bottom row. The outputs all remain close to the source shape (i.e., the base shape), which suggests that the algorithm struggles to find the right direction in which to deform the source towards the target.

Figure 11: Top row: ground-truth embeddings; middle row: reconstructions via OperatorNet; bottom row: shapes from the training set whose shape differences are closest to those of the test shapes in the top row.
D. Shape Analogies
In addition to Figure 10, we present more gender analogies in Figure 13. Note that though in some cases PointNet also delivers reasonable results (e.g., the ones in the top row), the results of OperatorNet are in general more natural and semantically meaningful (see, e.g., the discrepancies highlighted in the red dotted boxes).

We also present a set of shape analogies that transfer pose (top row) and style (bottom row) across human shapes in Figure 14. We observe that our results (the fourth column from the left) are both more natural and intuitive, while PointNet (the right-most column) produces less satisfactory results with, e.g., local area distortions (see the red dotted boxes).

Lastly, we show analogies among animals in Figure 15, where we present both pose transfer (top row) and style transfer (bottom row), and compare to the results of PointNet.

Figure 12: Top row: input shapes with a different number of vertices (5k) than the base shape (1k); second row: reconstructions of the original OperatorNet; third row: reconstructions of OperatorNet trained with higher-resolution labels (2k vertices); bottom row: reconstructions via the geometric approach [10].

Figure 13: Gender analogies via OperatorNet and PointNet. Note that though in some cases PointNet also delivers reasonable results (e.g., the ones in the top row), the results of OperatorNet are more natural and semantically meaningful (see, e.g., the discrepancies highlighted in the red dotted boxes).

Figure 14: Human shape analogies via OperatorNet and the PointNet autoencoder (see the red dotted boxes for the discrepancies).
E. Shape Interpolation
Linear Interpolation vs. Multiplicative Interpolation
We note that, since the shape differences are represented by matrices, it is also possible to interpolate them linearly, i.e., $D(t) = (1 - t) D_1 + t D_2$. However, as we argue in Section 3, the multiplicative property of shape differences suggests that it is more natural to interpolate the difference operators following Eq. (6). To illustrate this point, we show in Figure 16 interpolated sequences under the two schemes, the multiplicative one in the first row and the linear one in the second row. It is visually evident that the former leads to a more continuous and evenly deformed sequence. Moreover, as a quantitative verification, we compute the distances between consecutive shapes in both sequences and plot their distributions in the bottom panel of Figure 16.

Figure 15: Top row: transferring the pose of $S_B$, from $S_C$ to $S_X$. Bottom row: transferring the animal type of $S_B$, from $S_C$ to $S_X$. PointNet does not maintain the correct pose (bottom row) and does not transfer details such as open mouths correctly.

Baseline Comparison
To make our comparison more complete, we further compare our method to the autoencoder proposed in 3D-CODED [12], the multi-chart GAN proposed in [3], and a PointNet++ [30] based autoencoder.

Figure 16: Reconstructions from shape differences interpolated using the multiplicative scheme (first row) and the linear scheme (second row). The bottom panel plots the distances between consecutive reconstructed embeddings for both sequences. The multiplicative scheme clearly delivers a smoother deformation sequence.

For the 3D-CODED [12] method, we first reconstruct the source and target shapes using their pre-trained model and linearly interpolate the produced latent representations. In [3], on the other hand, a GAN is trained to generate realistic human shapes. In particular, we follow the interpolation scheme described in [3]: we first pick two randomly generated latent vectors $z_1$, $z_2$, which, via the GAN, give rise to two shapes $G(z_1)$, $G(z_2)$. The interpolation between the two shapes is then achieved as $G(z(t))$, where $z(t) = (1 - t) z_1 + t z_2$. We randomly generate 1000 shapes using their trained model and pick $G(z_i)$, $i = 1, 2$