Learning Manifold Patch-Based Representations of Man-Made Shapes
DMITRIY SMIRNOV,
Massachusetts Institute of Technology
MIKHAIL BESSMELTSEV,
Université de Montréal
JUSTIN SOLOMON,
Massachusetts Institute of Technology
Choosing the right shape representation for geometry is crucial for making 3D models compatible with existing applications. Focusing on piecewise-smooth man-made shapes, we propose a new representation that is usable in conventional CAD modeling pipelines and can also be learned by deep neural networks. We demonstrate the benefits of our representation by applying it to the task of sketch-based modeling. Given a raster image, our system infers a set of parametric surfaces that realize the input in 3D. To capture the piecewise-smooth geometry of man-made shapes, we learn a special shape representation: a deformable parametric template composed of Coons patches. Naïvely training such a system, however, would suffer from non-manifold artifacts of the parametric shapes as well as from a lack of data. To address this, we introduce loss functions that bias the network to output non-self-intersecting shapes and implement them as part of a fully self-supervised system, automatically generating both shape templates and synthetic training data. To test the efficacy of our system, we develop a testbed for sketch-based modeling and show results on a gallery of synthetic and real artist sketches. As additional applications, we also demonstrate shape interpolation and provide comparison to related work.

CCS Concepts: •
Computing methodologies → Parametric curve and surface models; Neural networks.

Additional Key Words and Phrases: Sketch-based Modeling, Deep Learning
Recent advances in deep learning have resulted in systems capable of producing 3D geometry in a variety of formats. While state-of-the-art methods that output point clouds, triangle meshes, voxel grids, and implicitly defined surfaces can yield detailed results, these representations are dense, high-dimensional, and not easily compatible with existing CAD modeling pipelines. In this work, we focus on developing a 3D representation that is parsimonious, geometrically interpretable, and easily editable with standard tools while at the same time being compatible with deep learning. Our choice of representation enables a shape modeling system that leverages the ability of deep neural networks to process incomplete, ambiguous input data and produces useful, consistent 3D output.

Fig. 1. Given a bitmap sketch of a man-made shape, our method automatically infers a complete manifold parametric 3D model, ready to be edited, rendered, or converted to a mesh. Compared to conventional methods, our resolution-independent, parsimonious shape representation allows us to faithfully reconstruct sharp features (wing and tail edges) as well as smooth regions.

To demonstrate our representation on a model problem, we present a deep learning-based system to infer a complete man-made 3D shape from one or more bitmap inputs. Our system infers a network of parametric surfaces that realize the drawing in 3D. The component surfaces, parameterized by their control points, are linked in a manifold fashion and allow for easy modification in conventional shape editing software as well as conversion to a manifold mesh. Our primary technical contributions involve the development of machinery for learning parametric 3D surfaces in a fashion that is efficiently compatible with modern deep learning pipelines and effective for a challenging 3D modeling task. Our algorithm automatically infers shape templates for different categories and incorporates a number of loss functions that operate directly on the geometry rather than in the parametric domain or on a grid sampling of surrounding space. Extending learning methodologies from images and data points to more exotic modalities like networks of surface patches is a central theme of modern graphics, vision, and learning research, and we anticipate broad application of these technical developments as fundamental tools in CAD workflows.

In order to test our novel system, we choose sketch-based modeling as a model problem and target application. Converting rough, incomplete 2D input into a clean, complete 3D shape is extremely ill-posed, requiring hallucination of missing parts and interpretation of noisy signal. To cope with these ambiguities, most systems rely on hand-designed shape priors. This approach severely limits the applications of those methods: each shape category requires its own expert-designed prior, and many shape categories do not admit obvious means of regularizing the reconstruction process. As an alternative, a few recent papers explore the possibility of learning the shapes from data, implicitly inferring the relevant shape priors [Delanoy et al. 2018; Lun et al. 2017; Wang et al.], but their output models often lack the resolution and sharp features necessary for high-quality 3D modeling.

In more detail, most sketch-based modeling algorithms target natural shapes like humans and animals [Bessmeltsev et al. 2015; Entem et al. 2015; Igarashi et al. 1999], which are typically smooth. To aid shape reconstruction, these systems regularize their objective functions to promote smoothness of the reconstructed shape; representations like generalized cylinders are chosen to optimize in the space of smooth surfaces [Bessmeltsev et al. 2015; Entem et al. 2015]. This, however, does not apply to the focus of our work: man-made shapes.
These objects, like planes or espresso machines, are only piecewise smooth and hence do not satisfy the assumptions of many sketch-based modeling systems.

In industrial design, man-made shapes are typically modeled using collections of smooth parametric patches, such as NURBS surfaces, with patch boundaries forming the sharp features. To learn such shapes effectively, we leverage this structure by using a special shape representation, a deformable parametric template [Jain et al. 1998]. This template is a manifold surface composed of patches, where each patch is parameterized by its control points; example patches include Bézier patches [Farin 2002] and Coons patches [Coons 1967] (Fig. 6(a)). This representation enables us to control the smoothness of each patch while allowing the model to introduce sharp edges between patches where necessary.

Compared to traditional representations, deformable parametric templates have numerous benefits for our task. They are intuitive to edit with conventional software, are resolution-independent, and can be meshed to arbitrary accuracy. Furthermore, since typically only boundary control points are needed, our surface representation has relatively few parameters to learn and store. Finally, this structure admits closed-form expressions for normals and other geometric features, which can be used to construct loss functions that improve reconstruction quality (§3.2).

More importantly, beyond defining the connectivity of the final shape, our deformable template acts as a strong initial guess that drives the learning towards a better local minimum. Compared to a generic template, this prealigned, category-specific template improves reconstruction of small details and sharp features of the model.

The core of our system is a CNN-based architecture to infer the coordinates of control points of a deformable template, algorithmically generated for a given shape category by a novel method.
A naïve attempt to develop and train such networks faces three major challenges: the difficulty of detecting non-manifold surfaces, structural variation within a shape category, and the lack of data. We address these challenges as follows:

• We introduce several loss functions that encourage our patch-based output to form a manifold mesh without topological artifacts or self-intersections.

• Deformable templates are a natural choice for objects with fixed structure, such as cups or guitars. However, some categories of man-made shapes exhibit structural variation. To address this, for each category we algorithmically generate a varying deformable template, which allows us to capture structural variation using a variable number of parts (Sec. 3.1.2), which we demonstrate on modular turbines on airplanes.

• Supervised methods mapping from sketches to 3D models require a database of sketch–model pairs, and, to date, there are no such large-scale repositories. We introduce a synthetic sketch augmentation pipeline that uses insights from the artistic literature to simulate possible variations observed in natural drawings (§4.1). Although our model is trained on synthetic sketches, it generalizes to natural sketches (Fig. 18).
Contributions.
Our key technical contributions include learning a new geometric representation, a novel method to automatically generate a template for a given collection of shapes, and new loss terms preventing non-manifold surfaces. We present a system for predicting parametric manifold surfaces of models of man-made 3D shapes using deep learning. Our method is fully self-supervised; while we predict patch parameters, none of our data is labeled with ground-truth patch decompositions, and our templates can be generated in a completely automatic manner. We validate by showing applications to sketch-based modeling, with a gallery of results on both synthetic and natural sketches from various artists, as well as interpolation to generate novel 3D models.
Our work introduces a new 3D representation as a significant step towards bridging the gap between modern progress in deep learning and long-standing problems in CAD modeling. To give a rough idea of the landscape of available methods, we briefly summarize related work in deep learning and sketch-based modeling.
Learning to reconstruct 3D geometry from various input modalities has recently enjoyed significant research interest. Typical forms of input are images [Choy et al. 2016; Delanoy et al. 2018; Gao et al. 2019; Häne et al. 2019; Wang et al.; Wu et al. 2017; Yan et al. 2016] and point clouds [Groueix et al. 2018; Park et al. 2019; Williams et al. 2019]. When designing a network for this task, two considerations affect the architecture: the loss function and the geometric representation.
Loss Functions.
One promising and popular direction employs a differentiable renderer and measures a 2D image loss between a rendering of the inferred 3D model and the input image, often called 2D–3D consistency or silhouette loss [Kato et al. 2018; Rezende et al. 2016; Tulsiani et al. 2018, 2017c; Wu et al. 2017, 2016a; Yan et al. 2016]. A notable example is the work by Wu et al. [2017], which learns a mapping from a photograph to a normal map, a depth map, and a silhouette, as well as a mapping from these outputs to a voxelization. They use a differentiable renderer and measure inconsistencies in 2D. 2D losses are powerful in computer vision. Hand-drawn sketches, however, cannot be interpreted as perfect projections of 3D objects: they are imprecise and often inconsistent [Bessmeltsev et al. 2016]. Another approach uses 3D loss functions, measuring discrepancies between the predicted and target 3D shapes directly, often via Chamfer or a regularized Wasserstein distance [Gao et al. 2019; Groueix et al. 2018; Liu et al. 2010; Mandikal et al. 2018; Park et al. 2019; Williams et al. 2019], or—in the case of highly structured representations such as voxel grids—cross-entropy [Häne et al. 2019]. We build on this work, adapting the Chamfer distance to patch-based geometric representations and extending the loss function with new regularizers (§3.2).
Shape representation.
As noted by Park et al. [2019], geometric representations in deep learning can broadly be divided into three classes: voxel-based representations, point-based representations, and mesh-based representations.

The most popular approach is to use voxels, directly reusing successful methods for 2D images [Choy et al. 2016; Delanoy et al. 2018; Tulsiani et al. 2018; Wang et al.; Wang et al. 2018a; Wu et al. 2017, 2018; Yan et al. 2016; Zhang et al. 2018; Zhirong Wu et al. 2015]. The main limitation of voxel-based methods is low resolution due to memory limitations. Octree-based approaches mitigate this problem [Häne et al. 2019; Wang et al.], increasing the output resolution, but even this density is insufficient to produce visually convincing surfaces. Furthermore, voxelized approaches cannot directly represent sharp features, which are key for man-made shapes.

Fig. 2. Editing a 3D model produced by our method. Because we output 3D geometry as a collection of consistent, well-placed NURBS patches, user edits can be made in conventional CAD software by simply moving control points. Here, we are able to refine the trunk of a car model with just a few clicks.

Point-based approaches represent 3D geometry as a point cloud [Fan et al. 2017; Lun et al. 2017; Mandikal et al. 2018; Tatarchenko et al. 2016; Yang et al. 2018], sidestepping the memory issues. Those representations, however, do not capture connectivity; hence, they cannot guarantee production of manifold surfaces.

Some recent methods represent shapes using deformable meshes [Bagautdinov et al. 2018; Baque et al. 2018; Kanazawa et al. 2018; Litany et al. 2018; Wang et al. 2019]. We take inspiration from this approach to reconstruct a surface by deforming a template, but our parametric template representation allows us to more easily enforce piecewise smoothness and test for self-intersections (§3.2). These properties are difficult to measure on meshes in a differentiable manner. Compared to a generic template shape, such as a sphere, our category-specific templates improve reconstruction quality and enable complex reconstruction constraints, e.g., symmetry. We further compare to deformable mesh representations in Sec. 4.5. Other mesh-based methods either use a precomputed parameterization to a domain on which it is straightforward to apply CNN-based architectures [Haim et al. 2019; Maron et al. 2017; Sinha et al. 2016] or learn a parameterization directly [Ben-Hamu et al. 2018; Groueix et al. 2018].
Even though these methods are not specifically designed for sketch-based modeling, for completeness, we compare our results to one of the more popular methods, AtlasNet [Groueix et al. 2018] (Fig. 23).

Most importantly, our man-made shape representation is native to modern CAD software, such as Autodesk Fusion 360, Rhino, and Solidworks, and it can be directly exported and edited in this software, as demonstrated in Figure 2. The key to this flexibility is the type of parametric patches we use, bilinearly blended Coons patches, which belong to the family of NURBS surfaces and can be trivially converted to a NURBS representation [Piegl and Tiller 1996], the standard surface type in CAD. Other common shape representations, such as meshes or point clouds, cannot be easily converted into NURBS format: algorithmically fitting NURBS surfaces is nontrivial and is an active area of research [Krishnamurthy and Levoy 1996; Yumer and Kara 2012].

Finally, a few works explore less common representations, such as signed distance functions [Mescheder et al. 2019], implicit fields [Chen and Zhang 2019], implicit surfaces [Genova et al. 2019], shape programs [Tian et al. 2019], splines [Gao et al. 2019], volumetric primitives [Tulsiani et al. 2017a; Zou et al. 2017], and elements of a learned latent space [Achlioptas et al. 2017; Wu et al. 2016b]. These papers demonstrate impressive reconstruction results but either do not aim to produce an expressive complete 3D model [Gao et al. 2019; Tian et al. 2019; Tulsiani et al. 2017a; Zou et al. 2017] or are not tuned to CAD applications [Achlioptas et al. 2017; Chen and Zhang 2019; Genova et al. 2019; Mescheder et al. 2019; Wu et al. 2016b]. It is unclear how these representations can be successfully used for generating editable CAD shape representations.

A few deep learning algorithms address sketch-based modeling [Delanoy et al. 2018; Huang et al. 2017; Li et al. 2018; Lun et al. 2017; Nishida et al.
2016; Wang et al.]. Nishida et al. [2016] and Huang et al. [2017] train networks to predict procedural model parameters that yield detailed shapes from a sketch. These methods produce complex, high-resolution models, but only for the classes of shapes that can be procedurally generated, such as trees or buildings. Lun et al. [2017] use a CNN-based encoder-decoder architecture to predict multi-view depth and normal maps, later converted to point clouds. Li et al. [2018] improve on their results by first predicting a flow field from an annotated sketch of an organic smooth shape, later converted to a depth map. In contrast, we output a deformable parametric template, which can be directly, without post-processing, converted to a manifold mesh. Wang, Wang, Qian, and Fang [Wang et al.] learn from unlabeled databases of sketches and 3D models with no correspondence between them. They train two networks: the first is a GAN with an autoencoder-based discriminator aimed to embed both natural sketches and renders into a latent space with matching distributions; the second is a CNN mapping the latent vector into a voxelization, trained on renders only. Another inspiration for our research is the work of Delanoy et al. [2018], which reconstructs a 3D object, represented as a voxelization, given sketches drawn from multiple views. We compare our results to [Delanoy et al. 2018; Lun et al. 2017] in Fig. 21.

Reconstructing 3D geometry from sketches has a long history in computer graphics. A complete survey of sketch-based modeling is beyond the scope of this paper; an interested reader may refer to the recent paper by Delanoy et al. [2018] or surveys by Ding and Liu [2016] and Olsen et al. [2009]. Here, we mention the work most relevant to our approach.

Many sketch-based 3D shape modeling systems are incremental, i.e., they allow users to model shapes by progressively adding new strokes, updating the 3D shape after each action.
Such systems may be designed as single-view interfaces, where the user is often required to manually annotate each stroke [Chen et al. 2013; Cherlin et al. 2005; Gingold et al. 2009; Shtof et al. 2013], or they may allow strokes to be added in multiple views [Igarashi et al. 1999; Nealen et al. 2007; Tai et al. 2004]. These systems can cope with considerable geometric complexity, but their dependence on the ordering of the strokes forces artists to deviate from standard approaches to sketching. In contrast, our machine learning method allows the system to interpret complete sketches, eliminating training for artists to use our system and enabling 3D reconstruction of legacy sketches. Similarly, while Xu et al. [2014] present a single-view 3D curve network reconstruction system for man-made shapes that can produce impressive sharp results, they process specialized design sketches consisting of cross-sections, output only a curve network, and rely on user annotations. Our system produces complete 3D shapes from natural sketches with no extra annotation.

A variety of systems interpret complete 2D sketches with no extra information. This species of input is extremely ambiguous thanks to hidden surfaces and noisy sketch curves, and hence reconstruction algorithms rely on strong 3D shape priors. These priors are typically manually created. For example, priors for humanoid characters, animals, and natural shapes promote smooth, round, and symmetrical shapes [Bessmeltsev et al. 2015; Entem et al. 2015; Igarashi et al. 1999], while garments are typically regularized to be (piecewise-)developable [Jung et al. 2015; Li et al. 2017, 2018; Robson et al. 2011; Turquin et al. 2004; Zhu et al. 2013]; man-made shapes are often approximated as combinations of geometric primitives [Shao et al. 2016] or as unions of nearly-flat faces [Yang et al.
2013]. Our work focuses on man-made shapes, which have characteristic sharp edges and are only piecewise smooth rather than developable. We use a learned deformable patch template to promote shapes with this structure (§3.1). Moreover, introducing specific expert-designed priors can be challenging: man-made shapes are varied, diverse, and complex (Fig. 1, 9–18). Instead, we automatically learn a category-specific shape prior from data.

Most sketch-based modeling interfaces process vector input, which consists of a set of clean curves [Bessmeltsev et al. 2015, 2016; Entem et al. 2015; Jung et al. 2015; Li et al. 2017, 2018; Xu et al. 2014]. This approach is acceptable for tablet-based interfaces, but it forces users to deviate from their preferred drawing media: paper-and-pencil sketches still remain a preferred means of capturing shape. While they can be vectorized and cleaned using modern methods [Bessmeltsev and Solomon 2019; Simo-Serra et al. 2018], preprocessing can introduce unnecessary distortions and errors, leading to suboptimal reconstruction. In contrast, our system directly processes bitmap sketches.
We engineer a deep learning pipeline that outputs a parametrically defined 3D surface. We describe the geometric representation of the output surfaces (§3.1), define the loss terms that we optimize (§3.2), and specify the deep CNN architecture and training procedure (§3.3).
We would like to encode 3D surfaces with a compact and expressive representation. To capture the details of man-made shapes, our representation must be capable of containing smooth regions as well as sharp creases and corners. Given these requirements, we represent our surfaces as collections of parametric primitives, where each primitive is a Coons patch [Coons 1967].

A Coons patch is a parametric surface patch in three dimensions specified by four boundary curves sharing endpoints. We choose each boundary curve to be a cubic Bézier curve c(γ), specified by four control points p₀, p₁, p₂, p₃ ∈ ℝ³, two of which, p₀ and p₃, are connected to adjacent curves. Thus, our patches are parameterized by 12 control points in total.

A single Bézier curve c : [0, 1] → ℝ³ is defined as

c(γ) = p₀(1 − γ)³ + 3p₁γ(1 − γ)² + 3p₂γ²(1 − γ) + p₃γ³,   (1)

and a Coons patch P : [0, 1] × [0, 1] → ℝ³ is defined as

P(s, t) = (1 − t)c₁(s) + t c₃(1 − s) + s c₂(t) + (1 − s)c₄(1 − t)
  − (c₁(0)(1 − s)(1 − t) + c₁(1)s(1 − t) + c₃(1)(1 − s)t + c₃(0)st).   (2)

We use templates to specify the connectivity of a collection of Coons patches. A template consists of the minimal number of control points necessary to define the Coons patches for the entire surface; control points for adjacent patches sharing boundary curves or corners are reused rather than duplicated. For instance, we can define a template with cube topology based on a quad mesh with six faces; the resulting template contains 12 shared curves and 32 control points.

We allow the edge of one patch to be contained within the edge of another without subdividing either patch by using junction curves. A junction curve c is constrained to lie along a parent curve d and is thus parameterized by s₀, t₀ ∈ [0, 1], such that c(0) = d(s₀) and c(1) = d(t₀). We must be careful when defining junction curves so that each endpoint of the junction curve is well-defined in terms of a single parent curve. We address this in detail below.

A template provides hard topological constraints for our surfaces as well as an initialization of their geometry and, optionally, a means for geometric regularization.
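The curve and patch definitions in Eqs. (1)–(2) translate directly into code. Below is a minimal NumPy sketch, assuming boundary curves c₁…c₄ oriented as in Eq. (2): c₁ and c₃ run along the s direction, c₂ and c₄ along t, with shared corner endpoints.

```python
import numpy as np

def bezier(p, g):
    """Evaluate a cubic Bezier curve (Eq. 1) with control points p (4x3) at g."""
    p0, p1, p2, p3 = p
    return ((1 - g) ** 3 * p0 + 3 * g * (1 - g) ** 2 * p1
            + 3 * g ** 2 * (1 - g) * p2 + g ** 3 * p3)

def coons(c1, c2, c3, c4, s, t):
    """Evaluate a bilinearly blended Coons patch (Eq. 2) at (s, t).

    c1..c4 are the four boundary curves as callables of one parameter,
    with c1 and c3 running along s and c2 and c4 along t.
    """
    # bilinearly blended ruled surfaces between opposite boundary curves
    ruled = ((1 - t) * c1(s) + t * c3(1 - s)
             + s * c2(t) + (1 - s) * c4(1 - t))
    # corner correction term so the patch interpolates all four boundaries
    corners = (c1(0) * (1 - s) * (1 - t) + c1(1) * s * (1 - t)
               + c3(1) * (1 - s) * t + c3(0) * s * t)
    return ruled - corners
```

With four straight boundary "curves" (control points placed evenly along each edge, i.e., degree-elevated lines), the patch reproduces the flat quad exactly, which is a convenient sanity check on the orientation conventions.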
Templates are crucial in ensuring that our predicted patches have consistent topology—an approach without templates would result in unstructured patch collections, with patches that do not align at boundaries or form a watertight, manifold surface.

While we demonstrate that our method works using a generic sphere template, we can optionally define distinct templates for different shape categories to incorporate category-specific geometric priors. These templates capture only coarse geometric features and approximate scale. We outline a strategy for obtaining templates below.

Algorithmic construction of templates.
We design a simple system to construct a template automatically given as input a collection of cuboids. Such a collection of cuboids can be computed automatically for a shape category, e.g., given a segmentation or using self-supervised methods such as [Smirnov et al. 2020; Sun et al. 2019; Tulsiani et al. 2017b], or easily produced manually using standard CAD software. Our algorithm converts any collection of cuboids into a template compatible with our method. In our experiments, we show templates algorithmically computed from pre-segmented shapes—for a given shape, we obtain a collection of cuboids by taking the bounding box around each connected component of each segmentation class.

Fig. 3. A summary of our agglomerative algorithm for automatic template generation. Given any collection of cuboids (a), we first split the quad faces and remove interior and overlapping faces to obtain a valid quad mesh (b), and iteratively merge adjacent faces to obtain the final template (c).

While a cuboid decomposition may be a good approximation of a 3D model, the cuboids may overlap and thus cannot be used directly as a template. We first snap our cuboids to an integer lattice and then refine the decomposition by splitting each cuboid face at every coordinate of the grid. We remove overlapping and interior faces to obtain a quad mesh. While the resulting quad mesh can be used directly as a template, it typically consists of a large number of faces, and thus we process it further.

We simplify our quad mesh by merging adjacent quads, ensuring that junction curves are well-defined and that there are no circular definitions. We do this with a greedy agglomerative algorithm, iterating over each quad in order of descending area and merging it with an adjacent quad as long as the merge does not result in any ill-defined curves. To keep track of how junction curves are defined, we use a quad dependency graph. The graph contains a node for each quad and a directed edge from node A to node B if a curve of quad B is a junction curve whose parent is a side of A. This structure allows us to determine whether a merge is impermissible: if the resulting dependency graph contains a cycle, or some node is the child of two parents that do not share a graph edge, we do not merge. We continue iterating over quads until no permissible merges remain.
Then, the order in which we must define junctions is simply a topological ordering of the dependency graph. We show an example cuboid input, intermediate construction, and final output of this algorithm in Figure 3.

Given cuboid decompositions of multiple shapes in a category, we find the median model in the category with respect to Chamfer distance. Since, in the datasets used for our experiments, models within a shape category are generally aligned and normalized, the median provides a rough approximation of the typical geometry.

Structural variation using templates.
For category-specific templates, we use the fact that template patches are consistently placed on semantically meaningful components of the shape to account for structural variation during training. For instance, in the airplanes shape category, certain models contain turbines while others do not.
Fig. 4. Structural variation. When using a template, since patches are mapped consistently across inputs, we can choose to toggle modular components by simply showing or hiding certain patches. Here, we demonstrate the same airplane model with and without turbines. Both configurations produce manifold meshes.
Fig. 5. Our geometry representation is composed of Coons patches (a) that are organized into a deformable template (b).

Fig. 6. The templates used in our experiments. From top to bottom, left to right: bottle, knife, guitar, car, airplane, coffee mug, gun, bathtub, 24-patch sphere, 54-patch sphere.
When constructing the airplane template, we note which patches come from cuboids corresponding to turbines and, during training, only sample from the turbine patches for models that contain turbines. This allows us to train on the entire airplane shape category, effectively using two distinct templates. Additionally, at test time, we can toggle turbines on or off for any given input, as shown in Figure 4.
In our training procedure, we fit a collection of Coons patches {Pᵢ} to a target mesh M by optimizing a differentiable loss function. Below, we describe each term of our loss—a main reconstruction loss analogous to Chamfer distance (§3.2.1), a normal alignment loss (§3.2.2), a regularizer to inhibit self-intersections (§3.2.3), a patch flatness regularizer (§3.2.4), and two template-based priors (§3.2.5 and §3.2.6).

Given two measurable shapes A, B ⊂ ℝ³ and point sets X and Y sampled from A and B, respectively, the directed Chamfer distance between X and Y is

Ch_dir(X, Y) = (1/|X|) Σ_{x∈X} min_{y∈Y} d(x, y),   (3)

where d(x, y) is the Euclidean distance between x and y. The symmetric Chamfer distance is

Ch(X, Y) = Ch_dir(X, Y) + Ch_dir(Y, X).   (4)

Chamfer distance is differentiable and therefore a popular loss function in deep learning pipelines that optimize shapes (§2.1). It suffers from several disadvantages, however. In particular, the distribution under which X and Y are sampled from A and B has a significant impact on the Chamfer distance; sampling in the parametric domain does not capture the area measure of the surface. In our setting, sampling uniformly from Coons patches is difficult, while sampling uniformly from the parametric domain results in oversampling around regions with high curvature.

To address this sampling issue, following Smirnov et al. [2020], we first define the variational directed Chamfer distance, starting from (3):

Ch_dir(X, Y) = (1/|X|) Σ_{x∈X} min_{y∈Y} d(x, y)   (5)
  ≈ E_{x∼U_A}[ inf_{y∈B} d(x, y) ]   (6)
  ≈ (1/Vol(A)) ∫_A inf_{y∈B} d(x, y) dx  ≝ Ch_dir^var(A, B),   (7)

where U_A is the uniform distribution on A.
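For concreteness, Eqs. (3)–(4) on finite point sets amount to the following direct NumPy translation (the plain point-set distance, not the area-weighted loss used later for training):

```python
import numpy as np

def chamfer(X, Y):
    """Symmetric Chamfer distance (Eqs. 3-4) between point sets X, Y (n x 3)."""
    # pairwise squared Euclidean distances via broadcasting
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    dir_xy = np.sqrt(d2.min(axis=1)).mean()   # Ch_dir(X, Y)
    dir_yx = np.sqrt(d2.min(axis=0)).mean()   # Ch_dir(Y, X)
    return dir_xy + dir_yx
```

For example, two single-point sets at unit distance give a symmetric Chamfer distance of 2, one unit for each direction.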
The variational symmetric Chamfer distance Ch^var(A, B) is defined analogously.

We leverage the fact that, while it is difficult to sample uniformly from our parametric patches, we are able to sample uniformly from their parametric domain (i.e., the unit square) in a straightforward fashion. Thus, we perform a change of variables:

Ch_dir^var(P, M)
  = (1/Vol(P)) ∫_P inf_{y∈M} d(x, y) dx   (9)
  = (1/Vol(P)) ∬_□ inf_{y∈M} d(P(s, t), y) |J(s, t)| ds dt   (10)
  = (Vol(□)/Vol(P)) · (1/Vol(□)) ∬_□ inf_{y∈M} d(P(s, t), y) |J(s, t)| ds dt   (11)
  = (Vol(□)/Vol(P)) E_{(s,t)∼U_□}[ inf_{y∈M} d(P(s, t), y) |J(s, t)| ]   (12)
  = E_{(s,t)∼U_□}[ inf_{y∈M} d(P(s, t), y) |J(s, t)| ] / E_{(s,t)∼U_□}[ |J(s, t)| ],   (13)

where □ = [0, 1] × [0, 1] and J(s, t) is the Jacobian of P(s, t). In practice, we approximate this value via Monte Carlo integration:

Ch_dir^var(P, M) ≈ [(1/|U_□|) Σ_{(s,t)∈U_□} min_{y∈M} d(P(s, t), y) |J(s, t)|] / [(1/|U_□|) Σ_{(s,t)∈U_□} |J(s, t)|]   (14)
  = [Σ_{(s,t)∈U_□} min_{y∈M} d(P(s, t), y) |J(s, t)|] / [Σ_{(s,t)∈U_□} |J(s, t)|],   (15)

where U_□ is a set of points uniformly sampled from the unit square. Since we can precompute uniformly sampled random points from the target mesh, we do not need to use area weights to compute Ch_dir^var(M, P). Thus, our area-weighted Chamfer distance is

L_Ch(∪ᵢPᵢ, M) = [Σᵢ Σ_{(s,t)∈U_□} min_{y∈M} d(Pᵢ(s, t), y) |Jᵢ(s, t)|] / [Σᵢ Σ_{(s,t)∈U_□} |Jᵢ(s, t)|]
  + (1/|M|) Σ_{x∈M} min_{y∈∪ᵢPᵢ} d(x, y).   (16)

We use symbolic evaluation software to compute the expression for Jᵢ(s, t) for Coons patch i given its control points in closed form; this formula is computed once and compiled into our code.
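The Monte Carlo estimator in Eq. (15) can be sketched as follows, where `patch_fn` evaluates P(s, t) and `jac_fn` returns the area element |J(s, t)|. Both callables are hypothetical placeholders; in our pipeline |J| comes from a precompiled symbolic expression rather than a user-supplied function.

```python
import numpy as np

def chamfer_var_dir(patch_fn, jac_fn, target_pts, n=512, seed=0):
    """Monte Carlo estimate of the variational directed Chamfer distance
    (Eq. 15): nearest-target distances weighted by |J(s, t)|."""
    rng = np.random.default_rng(seed)
    st = rng.random((n, 2))                              # uniform samples in the unit square
    pts = np.array([patch_fn(s, t) for s, t in st])      # points on the patch
    jac = np.array([jac_fn(s, t) for s, t in st])        # area weights |J(s, t)|
    # Euclidean distance from each sample to its nearest target point
    d2 = ((pts[:, None, :] - target_pts[None, :, :]) ** 2).sum(-1)
    min_d = np.sqrt(d2.min(axis=1))
    return (min_d * jac).sum() / jac.sum()
```

The Jacobian weights in numerator and denominator turn a uniform sample of the parametric domain into an area-weighted average over the surface, which is exactly the correction motivating Eqs. (9)–(13).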
While the Chamfer distance loss term encourages our predicted patches to be close to the ground-truth mesh with respect to Euclidean distance, it contains no explicit notion of curvature or normal alignment. This results in surfaces whose curvature differs significantly from that of the ground-truth models (see §4.4, Figure 20 (a)). To address this, we add a normal alignment loss term.

This loss term is computed analogously to Ch_dir(∪P_i, M), except that instead of Euclidean distance, we compute the normal distance, defined as
\[
d_N(x, y) = \| n_x - n_y \|, \tag{17}
\]
where n_x is the normal vector at point x. For each point y sampled from our predicted surface, we compare n_y to n_x, where x ∈ M is closest to y under Euclidean distance; symmetrically, for each x′ ∈ M, we compare n_{x′} to n_{y′}, where y′ ∈ ∪P_i is closest to x′. We precompute the normal vectors for all points sampled from our target meshes, and we again use symbolic differentiation to compute the expression for the normal vector of a Coons patch at P(u, v). In analogy to the variational Chamfer loss above, we have
\[
\mathcal{L}_{\mathrm{normal}}(\cup P_i, M) = \frac{\sum_i \sum_{(u,v) \in U_\square} d_N(\mathrm{NN}(P_i(u, v), M), P_i(u, v))\, |J_i(u, v)|}{\sum_i \sum_{(u,v) \in U_\square} |J_i(u, v)|} + \frac{1}{|M|} \sum_{x \in M} d_N(x, \mathrm{NN}(x, \cup P_i)), \tag{18}
\]
where NN(x, Y) is the nearest neighbor to x in Y under Euclidean distance.

We introduce a collision detection loss to penalize pairwise patch intersections. We define this loss as
\[
\mathcal{L}_{\mathrm{coll}}(\{P_i\}) = \sum_{i \neq j} \exp\!\left(-\min\big(d(\mathcal{T}_i, \mathcal{P}_j),\, d(\mathcal{T}_j, \mathcal{P}_i)\big)/\varepsilon\right), \tag{19}
\]
where 𝒯_i is a triangulation of patch P_i, and 𝒫_i is a set of points sampled from patch P_i. We triangulate a patch during training by taking the image of a fixed triangulation of a regular grid in patch parameter space.
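A simplified sketch of the collision penalty (19), ours rather than the paper's: the point-to-triangulation distance d(𝒯_i, 𝒫_j) is replaced by a point-set-to-point-set minimum distance, which keeps the structure of the loss while avoiding triangle-distance code (the function name and the ε value are illustrative):

```python
import numpy as np

def collision_loss(patch_samples, eps=1e-4):
    """Sum over ordered patch pairs of exp(-min_distance / eps), cf. Eq. (19).
    Each entry of patch_samples is an (n_i, 3) array of points sampled from
    one patch; the minimum distance between two point clouds stands in for
    the point-to-triangulation distance d(T_i, P_j)."""
    loss = 0.0
    for i, Pi in enumerate(patch_samples):
        for j, Pj in enumerate(patch_samples):
            if i != j:
                dmin = np.linalg.norm(
                    Pi[:, None, :] - Pj[None, :, :], axis=-1).min()
                loss += np.exp(-dmin / eps)
    return loss
```

Disjoint patches contribute almost nothing to the sum, while touching or interpenetrating patches each contribute close to one per ordered pair, mirroring the near-zero/one behavior described for Eq. (19).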
With a small ε (a fixed small value in our experiments), this expression smoothly interpolates between near-zero when two patches do not intersect and one when they do, up to the resolution of the grid used to compute the triangulation. For a pair of adjacent patches, or patches that share a junction, we truncate one patch by one grid row at the adjacency before evaluating the collision loss.

The loss functions defined above ensure that our output is a manifold surface that matches the target geometry. However, we also prefer that our Coons patches align to smooth regions of the geometry and that sharp creases fall on patch boundaries. To this end, we define a patch flatness regularizer that favors flat Coons patches, discouraging excessively high curvature. The patch flatness regularizer encourages each Coons patch map P : [0, 1] × [0, 1] → ℝ³ to be close to a linear map. For each Coons patch, we sample random points U_□ in parameter space, compute their image P(U_□), and fit a linear function using linear least-squares. Thus, we have P̂(U_□) = A U_□ + b ≈ P(U_□) for some A, b. We define the patch flatness loss as
\[
\mathcal{L}_{\mathrm{flat}}(\{P_i\}) = \frac{\sum_i \sum_{(u,v) \in U_\square} \| \hat{P}_i(u, v) - P_i(u, v) \|\, |J_i(u, v)|}{\sum_i \sum_{(u,v) \in U_\square} |J_i(u, v)|}. \tag{20}
\]
For shape categories where a category-specific template is available, we not only use the template geometry to initialize the network output but also regularize the output geometry using normals defined by the template. Along with the patch flatness regularizer, this encourages favorable positioning of patch seams and prevents patches from unnecessarily sliding over high-curvature regions. We define the template normals loss as
\[
\mathcal{L}_{\mathrm{template}}(\{P_i\}, \{T_i\}) = \frac{\sum_i \sum_{(u,v) \in U_\square} \| n_{P_i}(u, v) - n_{T_i} \|\, |J_i(u, v)|}{\sum_i \sum_{(u,v) \in U_\square} |J_i(u, v)|}, \tag{21}
\]
where n_{T_i} is the normal vector of the i-th template patch; since template patches are flat, the normal vector is constant over a patch.

Man-made shapes frequently exhibit global bilateral symmetries. Enforcing symmetry during reconstruction may be problematic for previous shape representations, such as deformable meshes or implicit surfaces. In contrast, our representation allows for a straightforward implementation of symmetry. Having computed the symmetry planes of the initial template, we may enforce symmetric positions of the corresponding control points via an additional loss term:
\[
\mathcal{L}_{\mathrm{sym}}(\cup P_i) = \frac{1}{|S|} \sum_{(i,j) \in S} \|(P^i_x - a,\, P^i_y,\, P^i_z) - (a - P^j_x,\, P^j_y,\, P^j_z)\|, \tag{22}
\]
where S contains pairs of indices of symmetric control points and P^i = (P^i_x, P^i_y, P^i_z) is the i-th control point. Here, we define the symmetry loss for the symmetry plane x = a, but the definition for other axes of symmetry is analogous. We employ this principle to enforce symmetric reconstruction of airplanes and cars.

Fig. 7. An overview of our deep learning pipeline. We encode an image and get back a series of parameters defining a collection of Coons patches.
We then compute six loss values based on the predicted patches and the ground-truth 3D model as well as a template.
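The least-squares fit inside the patch flatness regularizer (§3.2.4) is a small computation per patch. A sketch of ours, with the Jacobian weighting of Eq. (20) omitted for brevity:

```python
import numpy as np

def flatness_residual(P, n=256, rng=0):
    """Per-patch flatness term in the spirit of Eq. (20): fit the best
    affine map (u, v) -> A [u, v] + b in the least-squares sense and
    return the mean residual distance ||P_hat - P||."""
    uv = np.random.default_rng(rng).random((n, 2))
    pts = P(uv)
    U = np.hstack([uv, np.ones((n, 1))])            # rows are [u, v, 1]
    coef, *_ = np.linalg.lstsq(U, pts, rcond=None)  # (3, 3): stacks A and b
    return np.linalg.norm(U @ coef - pts, axis=-1).mean()
```

A planar patch yields a residual of essentially zero, while any curved patch yields a strictly positive value, which is exactly the behavior the regularizer penalizes.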
The final loss that we optimize is
\[
\mathcal{L}(\{P_i\}, M) = \mathcal{L}_{\mathrm{Ch}}(\cup P_i, M) + \alpha_{\mathrm{normal}}\, \mathcal{L}_{\mathrm{normal}}(\cup P_i, M) + \alpha_{\mathrm{flat}}\, \mathcal{L}_{\mathrm{flat}}(\{P_i\}) + \alpha_{\mathrm{coll}}\, \mathcal{L}_{\mathrm{coll}}(\{P_i\}) + \alpha_{\mathrm{template}}\, \mathcal{L}_{\mathrm{template}}(\{P_i\}, \{T_i\}) + \alpha_{\mathrm{sym}}\, \mathcal{L}_{\mathrm{sym}}(\cup P_i). \tag{23}
\]
For models scaled to fit in a unit sphere, we use α_flat = 2 and fixed values for α_normal, α_coll, α_template, and α_sym.
Our network takes as input 128 × 128 raster images and outputs parameters defining the predicted Coons patches. We use an encoder-decoder architecture consisting of a ResNet-18 [He et al. 2016] followed by three fully-connected hidden layers with 1024, 512, and 256 units, respectively, and an output layer with size equal to the appropriate output dimension. We initialize the weights of the final layer to zero, with bias equal to the parameters of the template, thereby setting the starting geometry to that of the template. To accept multi-view input for the tests in §4.2, we encode each input image using the ResNet encoder and perform max pooling over the latent codes. We use ReLU nonlinearities and batch normalization after each layer except for the last. We train each network on a single Tesla V100 GPU, using Adam [Kingma and Ba 2014] and batch size 8, with learning rate 0.00001 when using generic sphere templates and 0.0001 for category-specific templates. We train all categories for 24 hours. At each iteration, we sample 7,000 points from the predicted and target shapes. Additionally, we perform train-time data augmentation by applying random crops, rotations, and horizontal flips to the input images. Our entire pipeline is illustrated in Figure 7.
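The template initialization can be read as a property of the final linear layer: with zero weights and the template parameters as bias, every input initially decodes to the template exactly. A minimal sketch (shapes and function names are ours):

```python
import numpy as np

def init_template_layer(template_params, hidden_dim=256):
    """Final fully-connected layer initialized as described above:
    zero weights and a bias equal to the flattened template parameters."""
    W = np.zeros((hidden_dim, template_params.size))
    b = template_params.astype(float).copy()
    return W, b

def decode(h, W, b):
    """Forward pass of the final layer: h @ W + b."""
    return h @ W + b
```

Before any training step, decode(h, W, b) equals the template parameters for every latent code h, so optimization starts from the template geometry rather than from arbitrary patch positions.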
We demonstrate the efficacy of our method by applying it to the task of sketch-based modeling. We introduce a synthetic data generation pipeline for automatically creating realistic sketch data from 3D models. We then use our pipeline to train a network that takes a natural sketch image and converts it to a patch-based 3D representation. We show 3D reconstruction results both on synthetic sketches
from our dataset as well as on natural human-drawn sketches, and we also perform an ablation study demonstrating the necessity of each term in our objective function. Finally, we compare our results to existing methods for sketch-based as well as single-view 3D reconstruction.

Fig. 8. Our data generation and augmentation pipeline. Starting with a 3D model (a), we use Arnold Renderer in Autodesk Maya to generate its contours (b), which we vectorize using the method of Bessmeltsev and Solomon [2019] and stochastically modify (c). We then use the pencil drawing generation model of Simo-Serra et al. [2018] to generate the final image (d).
While there exist annotated datasets of 3D models and corresponding hand-drawn sketches [Gryaditskaya et al. 2019], such data are unavailable at the scale necessary for deep learning. Thus, we instead generate synthetic training data from 3D models. Our system creates sketch-like images that capture a model from several views and contain the typical ambiguities and inaccuracies present in human-drawn sketches.

Our first step is to generate the 2D contours of the 3D model that an artist would capture in a sketch. Guided by the study by Cole et al. [2012], we render occluding contours and sharp edges using the Arnold Toon Shader in Autodesk Maya. We render each model from a fixed number of distinct camera angles, manually chosen per shape category to best capture representative views.

Although the contour images capture the main features of the 3D model, they lack some of the ambiguities present in rough hand-drawn sketches [Liu et al. 2018], so we augment our contour images with features such as broken lines. To this end, we first vectorize the contour images using the method of Bessmeltsev and Solomon [2019]. Then, for each vectorized image, we augment the set of contours. With probability 0.3, we split a random stroke into two at a uniformly random position; we do this no more than 10 times for a single image. Additionally, we truncate each stroke at its endpoints with probability 0.2. Finally, we introduce a realistic sketch-like texture to our contours while also adding noise and ambiguity: we rasterize each augmented vectorized contour image using several different stroke widths and pass the rasterized images through the pencil drawing generation model of Simo-Serra et al. [2018]. We illustrate our entire data generation and augmentation pipeline in Figure 8.

In the end, for each 3D model, we obtain a series of realistic, synthetically generated sketch images.
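The stroke-level augmentation can be summarized in a few lines. In the sketch below (ours, not the paper's code), strokes are (n, 2) polylines; each split duplicates the cut vertex, and our "truncation" simply drops an endpoint vertex, which is a simplification of the endpoint truncation described above:

```python
import numpy as np

def augment_strokes(strokes, p_split=0.3, p_trunc=0.2, max_splits=10, rng=0):
    """Stochastic sketch augmentation: up to max_splits times, with
    probability p_split, split a random stroke in two at a random interior
    vertex; then truncate strokes at an endpoint with probability p_trunc."""
    rng = np.random.default_rng(rng)
    strokes = [s.copy() for s in strokes]
    for _ in range(max_splits):
        if rng.random() < p_split:
            i = int(rng.integers(len(strokes)))
            s = strokes[i]
            if len(s) >= 4:  # only split strokes long enough to cut
                cut = int(rng.integers(1, len(s) - 1))
                strokes[i : i + 1] = [s[: cut + 1], s[cut:]]
    out = []
    for s in strokes:
        if rng.random() < p_trunc and len(s) > 2:
            s = s[1:] if rng.random() < 0.5 else s[:-1]
        out.append(s)
    return out
```

Each successful split adds one stroke and duplicates exactly one vertex, which makes the augmentation easy to sanity-check on synthetic polylines.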
In our experiments, we train models on the airplane, bathtub, guitar, bottle, car, mug, gun, and knife categories of the ShapeNet Core (v2) dataset [Chang et al. 2015]. We choose these categories because they largely contain
models with consistent structure, making them well-suited for our representation. Prior to processing, we convert the ShapeNet models to watertight meshes using the method of Huang et al. [2018] and normalize them to fit into an origin-centered unit sphere. We also manually remove some mislabeled models from the dataset.

Fig. 9. Results on synthetic sketches of airplanes. From left to right: input sketch, 3D model with sphere template (54 patches), 3D model with category-specific template (76 patches). For more results, please refer to Figure 1.
We pick a random 10%–90% test–train split for each shape category and evaluate our method on synthetic sketches from our test dataset in Figures 1, 9, 10, 11, 12, 13, 14, and 15. For each category, we show results from both a model using a generic 54-patch sphere template and one using a category-specific template. The templates for airplanes, guitars, guns, knives, and cars are generated fully automatically using the semantic segmentations of Yi et al. [2016]. For mugs, we start with an automatically-generated template and manually add a hole in the handle as well as a void in the mug interior. To demonstrate our method with a template consisting of multiple distinct parts, for cars, we use the segmentation during training, computing the Chamfer and normal alignment losses for wheel and body patches separately. Finally, to demonstrate the use case of our system when segmentations are not available, we construct the bottle and bathtub templates manually, simply by placing two and five cuboids, respectively, and then running our template processing algorithm.

With a generic sphere template, our method produces a compact piecewise-smooth representation of surfaces of comparable quality to more conventional deformable meshes. Our algorithmic construction of category-specific templates, however, enables higher-quality reconstruction of sharp features and details.

In Figure 17, we show how our system is able to utilize multiple views of the same object in order to refine its prediction.
Fig. 10. Results on synthetic sketches of bottles. From top to bottom: input sketch, 3D model with sphere template (54 patches), 3D model with category-specific template (14 patches).

Fig. 11. Results on synthetic sketches of bathtubs. From left to right: input sketch, 3D model with sphere template (54 patches), 3D model with category-specific template (14 patches).
We also test our method on real sketches drawn by four artists using pencil and paper as well as an iPad with an Apple Pencil (Figure 18). Each artist was shown a sample 3D model rendered from each of our viewpoints and was asked to sketch an object of the same category from one of the viewpoints. The artists were never shown the contours or synthetic sketches used in our
training procedure. The 3D results that we recover are similar to those on the synthetic sketches. This demonstrates that our dataset is reflective of the choices that humans make when sketching 3D objects.

Fig. 12. Results on synthetic sketches of guitars. From top to bottom: input sketch, 3D model with sphere template (54 patches), 3D model with category-specific template (22 patches).

Fig. 13. Results on synthetic sketches of guns. From left to right: input sketch, 3D model with sphere template (54 patches), 3D model with category-specific template (20 patches).
Fig. 14. Results on synthetic sketches of knives. From top to bottom: input sketch, 3D model with sphere template (54 patches), 3D model with category-specific template (14 patches).

Fig. 15. Results on synthetic sketches of cars. From left to right: input sketch, 3D model with sphere template (54 patches), 3D model with category-specific template (43 patches).
The representation learned by our method is naturally well-suited for interpolating between 3D models. Because each model is composed of a small number of patches, each of which is placed consistently across different models, we can linearly interpolate the parameters that define the patches (e.g., the vertex positions) to generate models "between" those output by our network. We are also able to perform interpolation in the latent space learned by our deep model; we take the output of our first 1024-dimensional hidden fully-connected layer to be the latent space.
Fig. 16. Results on synthetic sketches of mugs. From top to bottom: input sketch, 3D model with sphere template (54 patches), 3D model with category-specific template (32 patches).

Fig. 17. We demonstrate our method's ability to incorporate details from different views of a model into its final prediction. We show our output when given a single view of an airplane as well as the output when given an additional view. The combined model incorporates elements not visible in the original view.

Fig. 18. Results on real human-drawn sketches of airplanes.
While the resulting interpolation is similar to that in patch space, each interpolant better resembles a realistic model due to the priors learned by our network. We demonstrate both patch-space and latent-space interpolation between two car models in Figure 19.
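Because corresponding patches occupy the same slots in every model's parameter vector, patch-space interpolation is literally a lerp of control points. A sketch (ours; function name illustrative):

```python
import numpy as np

def interpolate_patch_params(theta_a, theta_b, num=5):
    """Linear interpolation in patch-parameter space: blend the flattened
    control-point vectors of two models.  Consistent patch placement across
    models is what makes the intermediate vectors decode to plausible
    in-between shapes."""
    ts = np.linspace(0.0, 1.0, num)
    return [(1.0 - t) * theta_a + t * theta_b for t in ts]
```

Latent-space interpolation follows the same recipe applied to 1024-dimensional latent codes instead of control-point vectors, with the decoder mapping each blended code to a shape.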
We perform an ablation study of our method. We demonstrate on an airplane model the effect of training without each term in our loss function, as well as the difference between a category-specific template, a 54-patch sphere template, and a lower-resolution 24-patch template. The results are shown in Figure 20.

The ablation study demonstrates the contribution of each component of our system to the final result. Training without the collision detection loss results in predictions containing pairwise or
Fig. 19. Linear interpolation in the learned latent space (above) and in patch parameter space (below) between two car models. The consistent patch placement and the low-dimensional, geometrically meaningful nature of our representation make it possible to interpolate directly in patch parameter space. We obtain even better interpolants, however, when interpolating in the 1024-dimensional latent space learned by our model; each model in the latent space interpolation appears to be a valid car.
self-intersections. Omitting the normal loss causes the 3D surface to suffer in smoothness. The patch flatness and template normals losses encourage patch seams to align to sharp features. While both sphere templates capture the geometry, using more patches allows us to capture greater detail, and using a non-generic template further improves the model.

Fig. 20. An ablation study of our algorithm, training the network (a) without the normal alignment loss, (b) without the collision detection loss, (c) without the patch flatness loss, (d) without the template normals loss, (e) without the symmetry loss, as well as using 24-patch (g) and 54-patch (h) sphere templates, compared to the final result (i).
In Figure 21, we compare our method to the sketch-based 3D reconstruction methods of Lun et al. [2017] and Delanoy et al. [2018]. Our comparisons are generated using the type of input used to train those two methods, rather than attempting to re-train their models on our input.

Although we train on a different dataset, the visual quality and fidelity of our predictions is comparable to the output of [Lun et al. 2017] and [Delanoy et al. 2018]. Moreover, our method offers some distinct advantages. In particular, we output a 3D representation that sparsely captures smooth and sharp features, independent of resolution. In contrast, Delanoy et al. [2018] produce a 64³ voxel grid—a dense representation at a fixed resolution, which cannot be edited directly and offers no topological guarantees. In Figure 22, we show results of their system evaluated on contours from our dataset. These inputs were not processed with the pencil sketch model, to more closely resemble the data used to train their system. We show their results (orange) on two inputs alongside our results (blue). These results largely demonstrate that our task of reconstructing sketches with a prior on class (airplanes) rather than geometric structure (cylinders and cuboids) is misaligned with theirs: since our training data is not well-approximated by CSG models, their method is unable to extract meaningful output.

Fig. 21. Compared to the previous approaches, [Delanoy et al. 2018] (a) and [Lun et al. 2017] (c), our model (b and d) captures qualitative aspects of the input images despite having been trained on data generated from different 3D models and rendered using a distinct pipeline. See Figure 22 for models produced by the method of Delanoy et al. [2018] on our data. Furthermore, unlike the voxel-based [Delanoy et al. 2018] or smooth mesh-based [Lun et al. 2017] approaches, our models do not depend on resolution and can represent sharp and smooth regions explicitly.

Fig. 22.
Comparison to [Delanoy et al. 2018] on inputs from our dataset.Their predictions (generated by the authors) are in orange, and ours are inblue. This experiment demonstrates that their method does not generalizeto arbitrary single-view sketches.
Although the method of Lun et al. [2017] ultimately produces a mesh, it does so only after a computationally expensive post-processing and fine-tuning procedure, since a forward pass through their network returns a labeled point cloud from which the mesh is extracted. Our method directly outputs the parameters for surface patches, with no further optimization or post-processing. Additionally, the final mesh from their technique contains more components (triangles) than our output representation (patches), making it less useful for editing. Finally, their fine-tuning approach is fundamentally incompatible with the goal of parsing human-drawn sketches, since they rely on propagating changes to the 3D mesh back to the raster image; the inherent ambiguity and noise of our input precludes this procedure.

In Figure 23, we compare our method to AtlasNet [Groueix et al. 2018]. Since AtlasNet does not operate on sketch-based input, we retrain our model with the renderings used for AtlasNet. We use the generic 54-face sphere template for a fair comparison. While our 3D reconstructions capture the same amount of detail, they do not suffer from the topological defects of AtlasNet's representation. In particular, AtlasNet's reconstruction contains many patch intersections as well as holes in the surface; extracting a watertight mesh would require significant post-processing. Additionally, each patch in our representation is parameterized sparsely by control points on its boundary. This is in contrast to AtlasNet's patches, which come
from a learned latent space and, therefore, must be sampled using a deep decoder network and cannot be easily edited.

Fig. 23. 3D reconstructions using AtlasNet [Groueix et al. 2018] (b) and our method (c) given a single rendering as input (a). Compared to AtlasNet, we produce a result without topological defects (holes and overlaps). Additionally, each of our patch primitives is easily editable and has a low-dimensional, interpretable parameterization.

In Pixel2Mesh, Wang et al. [2018b] output a triangle mesh given a rendering as input. We train models using our method for the car category, using the same renders and an identical test–train split as Pixel2Mesh. For a fair comparison, we use our generic 54-face sphere template; Pixel2Mesh also initializes its output with a sphere mesh. While the final output of Pixel2Mesh is a mesh containing 2,466 vertices, corresponding to 7,398 degrees of freedom, we output 54 patches, corresponding to 816 degrees of freedom, making our representation better suited for editability and interpretability (Figure 2). As shown in Figure 24, the low dimensionality of our 3D models is not at the expense of expressiveness.

We compare to Pixel2Mesh quantitatively in Table 1. We select 2,500 random test set views and compute Chamfer distance using 5,000 sampled points. We rescale the Pixel2Mesh meshes to be the same size as our meshes for the comparison. While we obtain comparable Chamfer distance values, our representation is significantly more compact, more editable, and less prone to non-manifold artifacts.

Category | CD (P2M) | CD (ours) | DOF (P2M) | DOF (ours)
airplane | 0.022    | 0.025     | 7398      | 816
car      | 0.018    | 0.022     | 7398      | 816
Table 1. Quantitative comparison to Pixel2Mesh [Wang et al. 2018b]. Forthe airplane and car ShapeNet categories, we report Chamfer distance (CD)and degrees of freedom in the representation (DOF). Although we obtaincomparable Chamfer distance, we do so using a representation that is anorder of magnitude more compact and without non-manifold artifacts.
As more and more 3D data become readily available, the need to process, modify, and generate 3D models in a usable fashion also increases. While the quality of results produced by deep learning systems continues to improve, it is necessary to think carefully about their output format, particularly with respect to existing applications and use cases. By carefully designing representations together with compatible learning algorithms, we can truly harness these data to simplify and automate workflows in design, modeling, and manufacturing.

While many difficult problems remain on the path toward this goal, our system represents a significant step toward practical 3D
Fig. 24. Comparison to Pixel2Mesh [Wang et al. 2018b] on four test set images from each of the car and airplane categories. From left to right in each column: input image, Wang et al. [2018b], ours trained on the generic 54-patch sphere template. While we capture a similar degree of geometric detail in our 3D models, the dimensionality of our patch-based representation is an order of magnitude smaller than the mesh-based representation of Pixel2Mesh, and our results do not suffer from non-manifold artifacts.

modeling assisted by deep learning. Our use of a sparse patch-based representation is closer to what is used in artistic and engineering practice, and we accompany this representation with new geometric regularizers that greatly improve the reconstruction process. Unlike meshes or voxel occupancy functions, this representation can easily be edited and tuned after 3D reconstruction, and it captures a trade-off between smoothness and sharp edges reasonable for man-made shapes. Furthermore, our synthetic sketch data generation pipeline fills a gap in the datasets needed to train modern machine learning systems for this task.

Our work suggests several avenues for future research. Currently, our technique uses pre-trained networks to generate sketch training data; inspired by recent generative adversarial networks (GANs), we could couple the training of these different pieces to alleviate dependence on matched sketch–3D model pairs. We could also explore coupling with other representations, leveraging the rich literature in computer-aided geometric design (CAGD) to identify other structures amenable to learning with relatively few parameters. Of particular interest are multiresolution representations (e.g., subdivision surfaces), which might enable the system to learn both high-level smooth structure and geometric details like filigree independently.
It may also be beneficial to incorporate additional modalities, such as photographs, to further regularize our learned output.

Other extensions of our work might be oriented toward the end user. Capturing and learning from the sequence of strokes might be fruitful for disambiguating depth information in 3D reconstruction. Furthermore, we should close the loop between the learning system and the artist, allowing the artist to edit the 3D model or the sketch and have the changes propagate to the other side.

Perhaps the most important challenge remaining from our work—and others, such as [Kanazawa et al. 2018; Smirnov et al. 2020; Wang et al. 2019]—involves inference of the topology of a shape. While our current per-class templates support structural variability and modular parts, scaling this toward a completely learned topology is nontrivial. Although this limitation is reasonable for the classes of shapes we consider—and likely for parts of shapes, as explored in [Mo et al. 2019]—reconstruction of a sketch of a generic full shape will require algorithms that automatically add and connect patches in a flexible and adaptive fashion.

Even without the improvements above, our system remains an effective means of 3D shape recovery. It can be used as-is to extract an initial 3D model that can be tuned by an artist or engineer. Moreover, our architecture and loss functions can be incorporated as building blocks into larger pipelines connecting artistic imagery to the 3D world.

ACKNOWLEDGMENTS
We acknowledge the generous support of Army Research Office grant W911NF-12-R-0011, of National Science Foundation grant IIS-1838071, from an Amazon Research Award, from the MIT-IBM Watson AI Laboratory, from the Toyota-CSAIL Joint Research Center, from the Skoltech-MIT Next Generation Program, and of a gift from Adobe Systems. This work was also supported by the National Science Foundation Graduate Research Fellowship under Grant No. 1122374.
We acknowledge the support of the Natural Sciences andEngineering Research Council of Canada (NSERC) grant RGPIN-2019-05097 (“Creating Virtual Shapes via Intuitive Input") and fromthe Fonds de recherche du Québec - Nature et technologies (FRQNT)grant 2020-NC-270087.
REFERENCES
Panos Achlioptas, Olga Diamanti, Ioannis Mitliagkas, and Leonidas J. Guibas. 2017. Learning Representations and Generative Models for 3D Point Clouds. arXiv preprint arXiv:1707.02392 (2017).
Timur Bagautdinov, Chenglei Wu, Jason Saragih, Pascal Fua, and Yaser Sheikh. 2018. Modeling Facial Geometry Using Compositional VAEs. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
Pierre Baque, Edoardo Remelli, François Fleuret, and Pascal Fua. 2018. Geodesic Convolutional Shape Optimization. arXiv preprint arXiv:1802.04016 (2018).
Heli Ben-Hamu, Haggai Maron, Itay Kezurer, Gal Avineri, and Yaron Lipman. 2018. Multi-chart Generative Surface Modeling. ACM Trans. Graph. 37, 6, Article 215 (Dec. 2018), 15 pages. https://doi.org/10.1145/3272127.3275052
Mikhail Bessmeltsev, Will Chang, Nicholas Vining, Alla Sheffer, and Karan Singh. 2015. Modeling Character Canvases from Cartoon Drawings. ACM Trans. Graph. 34, 5, Article 162 (Nov. 2015), 16 pages. https://doi.org/10.1145/2801134
Mikhail Bessmeltsev and Justin Solomon. 2019. Vectorizing Line Drawings via Polyvector Fields. ACM Transactions on Graphics 38, 1 (2019).
Mikhail Bessmeltsev, Nicholas Vining, and Alla Sheffer. 2016. Gesture3D: Posing 3D Characters via Gesture Drawings. ACM Transactions on Graphics 35, 6 (2016), 165:1–165:13. https://doi.org/10.1145/2980179.2980240
Angel X. Chang, Thomas Funkhouser, Leonidas Guibas, Pat Hanrahan, Qixing Huang, Zimo Li, Silvio Savarese, Manolis Savva, Shuran Song, Hao Su, Jianxiong Xiao, Li Yi, and Fisher Yu. 2015. ShapeNet: An Information-Rich 3D Model Repository. Technical Report arXiv:1512.03012 [cs.GR]. Stanford University — Princeton University — Toyota Technological Institute at Chicago.
Tao Chen, Zhe Zhu, Ariel Shamir, Shi-Min Hu, and Daniel Cohen-Or. 2013. 3-Sweep. ACM Transactions on Graphics 32, 6 (2013), 1–10. https://doi.org/10.1145/2508363.2508378
Zhiqin Chen and Hao Zhang. 2019. Learning Implicit Fields for Generative Shape Modeling. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 5939–5948.
Joseph Jacob Cherlin, Faramarz Samavati, Mario Costa Sousa, and Joaquim A. Jorge. 2005. Sketch-based Modeling with Few Strokes. In Proceedings of the 21st Spring Conference on Computer Graphics (SCCG '05). 137. https://doi.org/10.1145/1090122.1090145
Christopher B. Choy, Danfei Xu, JunYoung Gwak, Kevin Chen, and Silvio Savarese. 2016. 3D-R2N2: A Unified Approach for Single and Multi-view 3D Object Reconstruction. In Lecture Notes in Computer Science.
Forrester Cole et al. 2012. How Well Do Line Drawings Depict Shape? Commun. ACM 55, 1 (2012), 107. https://doi.org/10.1145/2063176.2063202
S. A. Coons. 1967. Surfaces for Computer-Aided Design of Space Forms. Technical Report. Cambridge, MA, USA.
Johanna Delanoy, Mathieu Aubry, Phillip Isola, Alexei A. Efros, and Adrien Bousseau. 2018. 3D Sketching Using Multi-view Deep Volumetric Prediction. Proceedings of the ACM on Computer Graphics and Interactive Techniques 1, 1 (2018), 1–22.
Chao Ding and Ligang Liu. 2016. A Survey of Sketch Based Modeling Systems. Front. Comput. Sci. 10, 6 (Dec. 2016), 985–999. https://doi.org/10.1007/s11704-016-5422-9
Even Entem, Loic Barthe, Marie-Paule Cani, Frederic Cordier, and Michiel van de Panne. 2015. Modeling 3D Animals from a Side-view Sketch. Comput. Graph. 46, C (Feb. 2015), 221–230. https://doi.org/10.1016/j.cag.2014.09.037
Haoqiang Fan, Hao Su, and Leonidas Guibas. 2017. DeepPointSet: A Point Set Generation Network for 3D Object Reconstruction from a Single Image. In CVPR.
Gerald Farin. 2002. Curves and Surfaces for CAGD: A Practical Guide (5th ed.). Morgan Kaufmann Publishers Inc., San Francisco, CA, USA.
Jun Gao, Chengcheng Tang, Vignesh Ganapathi-Subramanian, Jiahui Huang, Hao Su, and Leonidas J. Guibas. 2019. DeepSpline: Data-Driven Reconstruction of Parametric Curves and Surfaces. (2019), 1–13. arXiv:1901.03781. http://arxiv.org/abs/1901.03781
Kyle Genova, Forrester Cole, Daniel Vlasic, Aaron Sarna, William T. Freeman, and Thomas Funkhouser. 2019. Learning Shape Templates with Structured Implicit Functions. In Proceedings of the IEEE International Conference on Computer Vision. 7154–7164.
Yotam Gingold, Takeo Igarashi, and Denis Zorin. 2009. Structured Annotations for 2D-to-3D Modeling. ACM Transactions on Graphics 28, 5 (2009), 1. https://doi.org/10.1145/1618452.1618494
Thibault Groueix, Matthew Fisher, Vladimir G. Kim, Bryan C. Russell, and Mathieu Aubry. 2018. AtlasNet: A Papier-Mâché Approach to Learning 3D Surface Generation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 216–224. https://doi.org/10.1109/CVPR.2018.00030
Yulia Gryaditskaya, Mark Sypesteyn, Jan Willem Hoftijzer, Sylvia Pont, Frédo Durand, and Adrien Bousseau. 2019. OpenSketch: A Richly-Annotated Dataset of Product Design Sketches. ACM Transactions on Graphics (Proc. SIGGRAPH Asia) 38 (Nov. 2019).
Niv Haim, Nimrod Segol, Heli Ben-Hamu, Haggai Maron, and Yaron Lipman. 2019. Surface Networks via General Covers. In Proceedings of the IEEE International Conference on Computer Vision. 632–641.
Christian Häne, Shubham Tulsiani, and Jitendra Malik. 2019. Hierarchical Surface Prediction. TPAMI (2019).
Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. 2016. Deep Residual Learning for Image Recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 770–778.
H. Huang, E. Kalogerakis, E. Yumer, and R. Mech. 2017. Shape Synthesis from Sketches via Procedural Models and Convolutional Networks. IEEE Transactions on Visualization and Computer Graphics 23, 8 (Aug. 2017), 2003–2013. https://doi.org/10.1109/TVCG.2016.2597830
Jingwei Huang, Hao Su, and Leonidas Guibas. 2018. Robust Watertight Manifold Surface Generation Method for ShapeNet Models. arXiv preprint arXiv:1802.01698 (2018).
Takeo Igarashi, Satoshi Matsuoka, and Hidehiko Tanaka. 1999. Teddy: A Sketching Interface for 3D Freeform Design. In Proceedings of the 26th Annual Conference on Computer Graphics and Interactive Techniques (SIGGRAPH '99). ACM Press/Addison-Wesley Publishing Co., New York, NY, USA, 409–416. https://doi.org/10.1145/311535.311602
Anil K. Jain, Yu Zhong, and Marie-Pierre Dubuisson-Jolly. 1998. Deformable Template Models: A Review. Signal Process. 71, 2 (Dec. 1998), 109–129. https://doi.org/10.1016/S0165-1684(98)00139-X
A. Jung, S. Hahmann, D. Rohmer, A. Begault, L. Boissieux, and M. P. Cani. 2015. Sketching Folds: Developable Surfaces from Non-Planar Silhouettes. ACM Transactions on Graphics 34, 5 (2015), 12. https://doi.org/10.1145/2749458
Angjoo Kanazawa, Shubham Tulsiani, Alexei A. Efros, and Jitendra Malik. 2018. Learning Category-Specific Mesh Reconstruction from Image Collections. In ECCV.
ECCV .Hiroharu Kato, Yoshitaka Ushiku, and Tatsuya Harada. 2018. Neural 3D Mesh Renderer.In
The IEEE Conference on Computer Vision and Pattern Recognition (CVPR) .Diederik P Kingma and Jimmy Ba. 2014. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014).Venkat Krishnamurthy and Marc Levoy. 1996. Fitting Smooth Surfaces to Dense PolygonMeshes. In
Proceedings of the 23rd Annual Conference on Computer Graphics andInteractive Techniques (SIGGRAPH âĂŹ96) . Association for Computing Machinery,New York, NY, USA, 313âĂŞ324. https://doi.org/10.1145/237170.237270Changjian Li, Hao Pan, Yang Liu, Xin Tong, Alla Sheffer, and Wenping Wang. 2018.Robust Flow-Guided Neural Prediction for Sketch-Based Freeform Surface Modeling.
ACM Trans. Graph.
37, 6, Article Article 238 (Dec. 2018), 12 pages. https://doi.org/10.1145/3272127.3275051Changjian Li, Hao Pan, Xin Tong, Alla Sheffer, and Wenping Wang. 2017. BendSketch :Modeling Freeform Surfaces Through 2D Sketching.
ACM Trans. Graph
36, 4 (2017).Minchen Li, Alla Sheffer, Eitan Grinspun, and Nicholas Vining. 2018. Foldsketch:enriching garments with physically reproducible folds. {ACM} Trans. Graph.
37, 4(2018), 133:1—-133:13.
Or Litany, Alex Bronstein, Michael Bronstein, and Ameesh Makadia. 2018. DeformableShape Completion with Graph Convolutional Autoencoders.
CVPR (2018).Chenxi Liu, Enrique Rosales, and Alla Sheffer. 2018. StrokeAggregator: ConsolidatingRaw Sketches into Artist-Intended Curve Drawings.
ACM Transaction on Graphics
37, 4 (2018). https://doi.org/10.1145/3197517.3201314M. Liu, O. Tuzel, A. Veeraraghavan, and R. Chellappa. 2010. Fast directional chamfermatching. In . 1696–1703. https://doi.org/10.1109/CVPR.2010.5539837Zhaoliang Lun, Matheus Gadelha, Evangelos Kalogerakis, Subhransu Maji, and RuiWang. 2017. 3D Shape Reconstruction from Sketches via Multi-view ConvolutionalNetworks. In .Priyanka Mandikal, K L Navaneet, Mayank Agarwal, and R Venkatesh Babu. 2018.3D-LMNet: Latent Embedding Matching for Accurate and Diverse 3D Point CloudReconstruction from a Single Image. In
Proceedings of the British Machine VisionConference (BMVC) .Haggai Maron, Meirav Galun, Noam Aigerman, Miri Trope, Nadav Dym, Ersin Yumer,Vladimir G. Kim, and Yaron Lipman. 2017. Convolutional Neural Networks onSurfaces via Seamless Toric Covers.
ACM Trans. Graph.
36, 4, Article 71 (July 2017),10 pages. https://doi.org/10.1145/3072959.3073616Lars Mescheder, Michael Oechsle, Michael Niemeyer, Sebastian Nowozin, and AndreasGeiger. 2019. Occupancy Networks: Learning 3D Reconstruction in Function Space.In
Proceedings IEEE Conf. on Computer Vision and Pattern Recognition (CVPR) .Kaichun Mo, Shilin Zhu, Angel Chang, Li Yi, Subarna Tripathi, Leonidas Guibas, and HaoSu. 2019. PartNet: A Large-scale Benchmark for Fine-grained and Hierarchical Part-level 3D Object Understanding. In
Proceedings of the IEEE Conference on ComputerVision and Pattern Recognition (CVPR) .Andrew Nealen, Takeo Igarashi, Olga Sorkine, and Marc Alexa. 2007. FiberMesh:Designing Freeform Surfaces with 3D Curves.
ACM Transactions on Graphics(Proceedings of ACM SIGGRAPH)
26, 3 (2007), article no. 41.Gen Nishida, Ignacio Garcia-Dorado, Daniel G. Aliaga, Bedrich Benes, and AdrienBousseau. 2016. Interactive Sketching of Urban Procedural Models.
ACM Trans.Graph.
35, 4 (2016).Luke Olsen, Faramarz F. Samavati, Mario Costa Sousa, and Joaquim A. Jorge. 2009.Sketch-based modeling: A survey.
Computers & Graphics
33, 1 (feb 2009), 85–103.https://doi.org/10.1016/j.cag.2008.09.013Jeong Joon Park, Peter Florence, Julian Straub, Richard Newcombe, and Steven Love-grove. 2019. DeepSDF: Learning Continuous Signed Distance Functions for ShapeRepresentation. In
The IEEE Conference on Computer Vision and Pattern Recognition(CVPR) .Les Piegl and Wayne Tiller. 1996.
The NURBS Book (second ed.). Springer-Verlag, NewYork, NY, USA.Danilo Jimenez Rezende, SM Ali Eslami, Shakir Mohamed, Peter Battaglia, Max Jader-berg, and Nicolas Heess. 2016. Unsupervised learning of 3d structure from images.In
Advances in neural information processing systems . 4996–5004.C. Robson, R. Maharik, A. Sheffer, and N. Carr. 2011. Context-Aware Garment Modelingfrom Sketches.
Computers and Graphics (Proc. SMI 2011) (2011), 604–613.Tianjia Shao, Dongping Li, Yuliang Rong, Changxi Zheng, and Kun Zhou. 2016. DynamicFurniture Modeling Through Assembly Instructions.
ACM Transactions on Graphics(SIGGRAPH Asia 2016)
35, 6 (2016).A. Shtof, A. Agathos, Y. Gingold, A. Shamir, and D. Cohen-Or. 2013. GeosemanticSnapping for Sketch-Based Modeling.
Computer Graphics Forum
32, 2, pt. 2 (may2013), 245–253. https://doi.org/10.1111/cgf.12044Edgar Simo-Serra, Satoshi Iizuka, and Hiroshi Ishikawa. 2018. Mastering Sketching:Adversarial Augmentation for Structured Prediction.
Transactions on Graphics(Presented at SIGGRAPH)
37, 1 (2018).Ayan Sinha, Jing Bai, and Karthik Ramani. 2016. Deep Learning 3D Shape SurfacesUsing Geometry Images. In
ECCV .Dmitriy Smirnov, Matthew Fisher, Vladimir G. Kim, Richard Zhang, and Justin Solomon.2020. Deep Parametric Shape Predictions using Distance Fields. In
Conference onComputer Vision and Pattern Recognition (CVPR) .Chunyu Sun, Qianfang Zou, Xin Tong, and Yang Liu. 2019. Learning Adaptive Hierar-chical Cuboid Abstractions of 3D Shape Collections.
ACM Transactions on Graphics(SIGGRAPH Asia)
38, 6 (2019).Chiew-Lan Tai, Hongxin Zhang, and Jacky Chun-Kin Fong. 2004. Prototype Modelingfrom Sketched Silhouettes based on Convolution Surfaces.
Computer Graphics Forum (2004). https://doi.org/10.1111/j.1467-8659.2004.00006.xM. Tatarchenko, A. Dosovitskiy, and T. Brox. 2016. Multi-view 3D Models from SingleImages with a Convolutional Network. In
European Conference on Computer Vision(ECCV) .Yonglong Tian, Andrew Luo, Xingyuan Sun, Kevin Ellis, William T. Freeman, Joshua B.Tenenbaum, and Jiajun Wu. 2019. Learning to Infer and Execute 3D Shape Programs.In
International Conference on Learning Representations .Shubham Tulsiani, Alexei A Efros, and Jitendra Malik. 2018. Multi-view consistencyas supervisory signal for learning shape and pose prediction. In
Proceedings of theIEEE conference on computer vision and pattern recognition . 2897–2905. Shubham Tulsiani, Hao Su, Leonidas J. Guibas, Alexei A. Efros, and Jitendra Malik. 2017a.Learning Shape Abstractions by Assembling Volumetric Primitives. In
ComputerVision and Pattern Regognition (CVPR) .Shubham Tulsiani, Hao Su, Leonidas J. Guibas, Alexei A. Efros, and Jitendra Malik. 2017b.Learning Shape Abstractions by Assembling Volumetric Primitives. In
ComputerVision and Pattern Regognition (CVPR) .Shubham Tulsiani, Tinghui Zhou, Alexei A. Efros, and Jitendra Malik. 2017c. Multi-view Supervision for Single-view Reconstruction via Differentiable Ray Consistency.(2017). https://doi.org/10.1109/CVPR.2017.30 arXiv:1704.06254Emmanuel Turquin, Marie-Paule Cani, and John F. Hughes. 2004. Sketching Garmentsfor Virtual Characters. In
Proceedings of the First Eurographics Conference on Sketch-Based Interfaces and Modeling (SBM’04) . Eurographics Association, Aire-la-Ville,Switzerland, Switzerland, 175–182. https://doi.org/10.2312/SBM/SBM04/175-182Lingjing Wang, Jifei Wang, Cheng Qian, and Yi Fang. Unsupervised learning of 3Dmodel reconstruction from hand-drawn sketches. In
MM 2018 - Proceedings of the2018 ACM Multimedia Conference (MM 2018 - Proceedings of the 2018 ACM MultimediaConference) . Association for Computing Machinery, Inc, 1820–1828. https://doi.org/10.1145/3240508.3240699 26th ACM Multimedia conference, MM 2018 ; Conferencedate: 22-10-2018 Through 26-10-2018.Nanyang Wang, Yinda Zhang, Zhuwen Li, Yanwei Fu, Wei Liu, and Yu-Gang Jiang.2018b. Pixel2Mesh: Generating 3D Mesh Models from Single RGB Images. In
ECCV .Peng-Shuai Wang, Yang Liu, Yu-Xiao Guo, Chun-Yu Sun, and Xin Tong. 2017. O-CNN: Octree-based Convolutional Neural Networks for 3D Shape Analysis.
ACMTransactions on Graphics (SIGGRAPH)
36, 4 (2017).Shaoxiong Wang, Jiajun Wu, Xingyuan Sun, Wenzhen Yuan, William T Freeman,Joshua B Tenenbaum, and Edward H Adelson. 2018a. 3D Shape Perception fromMonocular Vision, Touch, and Shape Priors. In
IEEE/RSJ International Conference onIntelligent Robots and Systems (IROS) .Weiyue Wang, Duygu Ceylan, Radomir Mech, and Ulrich Neumann. 2019. 3DN: 3DDeformation Network. In
CVPR .Francis Williams, Teseo Schneider, Claudio Silva, Denis Zorin, Joan Bruna, and DanielePanozzo. 2019. Deep geometric prior for surface reconstruction. In
Proceedings ofthe IEEE Conference on Computer Vision and Pattern Recognition . 10130–10139.Jiajun Wu, Yifan Wang, Tianfan Xue, Xingyuan Sun, William T Freeman, and Joshua BTenenbaum. 2017. MarrNet: 3D Shape Reconstruction via 2.5D Sketches. In
AdvancesIn Neural Information Processing Systems .Jiajun Wu, Tianfan Xue, Joseph J Lim, Yuandong Tian, Joshua B Tenenbaum, AntonioTorralba, and William T Freeman. 2016a. Single Image 3D Interpreter Network. In
European Conference on Computer Vision (ECCV) .Jiajun Wu, Chengkai Zhang, Tianfan Xue, William T Freeman, and Joshua B Tenenbaum.2016b. Learning a probabilistic latent space of object shapes via 3d generative-adversarial modeling. In
Advances in Neural Information Processing Systems . 82–90.Jiajun Wu, Chengkai Zhang, Xiuming Zhang, Zhoutong Zhang, William T Freeman,and Joshua B Tenenbaum. 2018. Learning 3D Shape Priors for Shape Completionand Reconstruction. In
European Conference on Computer Vision (ECCV) .Baoxuan Xu, William Chang, Alla Sheffer, Adrien Bousseau, James McCrae, and KaranSingh. 2014. True2Form.
ACM Transactions on Graphics
33, 4 (2014), 1–13. https://doi.org/10.1145/2601097.2601128Xinchen Yan, Jimei Yang, Ersin Yumer, Yijie Guo, and Honglak Lee. 2016. Perspec-tive transformer nets: Learning single-view 3d object reconstruction without 3dsupervision. In
Advances in neural information processing systems . 1696–1704.Linjie Yang, Jianzhuang Liu, and Xiaoou Tang. 2013. Complex 3D general objectreconstruction from line drawings.
Proceedings of the IEEE International Conferenceon Computer Vision (2013), 1433–1440. https://doi.org/10.1109/ICCV.2013.181Yaoqing Yang, Chen Feng, Yiru Shen, and Dong Tian. 2018. FoldingNet: Point CloudAuto-Encoder via Deep Grid Deformation. In
The IEEE Conference on ComputerVision and Pattern Recognition (CVPR) .Li Yi, Vladimir G. Kim, Duygu Ceylan, I-Chao Shen, Mengyan Yan, Hao Su, CewuLu, Qixing Huang, Alla Sheffer, and Leonidas Guibas. 2016. A Scalable ActiveFramework for Region Annotation in 3D Shape Collections.
SIGGRAPH Asia (2016).Mehmet Ersin Yumer and Levent Burak Kara. 2012. Surface creation on unstructuredpoint sets using neural networks.
Computer-Aided Design
44, 7 (2012), 644 – 656.https://doi.org/10.1016/j.cad.2012.03.002Xiuming Zhang, Zhoutong Zhang, Chengkai Zhang, Joshua B Tenenbaum, William TFreeman, and Jiajun Wu. 2018. Learning to Reconstruct Shapes from Unseen Classes.In
Advances in Neural Information Processing Systems (NeurIPS) .Zhirong Wu, S. Song, A. Khosla, Fisher Yu, Linguang Zhang, Xiaoou Tang, and J.Xiao. 2015. 3D ShapeNets: A deep representation for volumetric shapes. In . 1912–1920.https://doi.org/10.1109/CVPR.2015.7298801L. Zhu, T. Igarashi, and J. Mitani. 2013. Soft Folding.
Computer Graph-ics Forum
32, 7 (2013), 167–176. https://doi.org/10.1111/cgf.12224arXiv:https://onlinelibrary.wiley.com/doi/pdf/10.1111/cgf.12224Chuhang Zou, Ersin Yumer, Jimei Yang, Duygu Ceylan, and Derek Hoiem. 2017. 3d-prnn: Generating shape primitives with recurrent neural networks. In