Sketch2CAD: Sequential CAD Modeling by Sketching in Context
CHANGJIAN LI,
University College London
HAO PAN,
Microsoft Research Asia
ADRIEN BOUSSEAU,
Inria, Université Côte d’Azur
NILOY J. MITRA,
University College London and Adobe Research
Fig. 1. Industrial designers commonly decompose complex shapes into box-like primitives, which they refine by drawing cuts and roundings, or by adding and subtracting smaller parts [Eissen and Steur 2008, 2011] (a, ©Koos Eissen and Roselien Steur). Users of Sketch2CAD follow similar sketching steps (b), which our system interprets as parametric modeling operations (c) to automatically output a precise, compact, and editable CAD model (d). Panels: (a) inspirational sketch; (b) sketching sequence using Sketch2CAD; (c) inferred CAD instructions (e.g., AddPolyhedron, BevelCorner, AddSweepShape, SubtractPolyhedron); (d) CAD model.
We present a sketch-based CAD modeling system, where users create objects incrementally by sketching the desired shape edits, which our system automatically translates to CAD operations. Our approach is motivated by the close similarities between the steps industrial designers follow to draw 3D shapes, and the operations CAD modeling systems offer to create similar shapes. To overcome the strong ambiguity of parsing 2D sketches, we observe that in a sketching sequence, each step makes sense and can be interpreted in the context of what has been drawn before. In our system, this context corresponds to a partial CAD model, inferred in the previous steps, which we feed along with the input sketch to a deep neural network in charge of interpreting how the model should be modified by that sketch. Our deep network architecture then recognizes the intended CAD operation and segments the sketch accordingly, such that a subsequent optimization estimates the parameters of the operation that best fit the segmented sketch strokes. Since there exist no datasets of paired sketching and CAD modeling sequences, we train our system by generating synthetic sequences of CAD operations that we render as line drawings. We present a proof-of-concept realization of our algorithm supporting four frequently used CAD operations. Using our system, participants are able to quickly model a large and diverse set of objects, demonstrating Sketch2CAD to be an alternate way of interacting with current CAD modeling systems.
Authors’ addresses: Changjian Li, University College London, 66-72 Gower Street, London, [email protected]; Hao Pan, Microsoft Research Asia, No.5 Danling Rd, Beijing, [email protected]; Adrien Bousseau, Inria, Université Côte d’Azur, 2004 route des lucioles, Valbonne, [email protected]; Niloy J. Mitra, University College London, 66-72 Gower Street, London, and Adobe Research, [email protected].
© 2020 Copyright held by the owner/author(s). Publication rights licensed to ACM. 0730-0301/2020/12-ART164. https://doi.org/10.1145/3414685.3417807
CCS Concepts: • Computing methodologies → Shape modeling.
Additional Key Words and Phrases: sketch, CAD modeling, procedural modeling, convolutional neural network
ACM Reference Format:
Changjian Li, Hao Pan, Adrien Bousseau, and Niloy J. Mitra. 2020. Sketch2CAD: Sequential CAD Modeling by Sketching in Context. ACM Trans. Graph. 39, 6, Article 164 (December 2020). https://doi.org/10.1145/3414685.3417807
Sketching and 3D modeling are two major steps of industrial design. Sketching is typically done first, as it allows designers to express their vision quickly and approximately [Eissen and Steur 2008]. Design sketches are then converted to 3D models for downstream engineering and manufacturing, using CAD tools that offer high precision and editability [Pipes 2007]. However, design sketching and CAD modeling are often performed by different experts with different skill sets, making design iterations cumbersome, expensive, and time consuming.

While a number of methods have been proposed to create 3D models by sketching, existing solutions often lack the precision and editability of CAD modeling. On the one hand, interactive systems interpret user strokes as custom modeling operations rather than generic CAD [Bae et al. 2008; Igarashi et al. 1999; Nishida et al. 2016; Zeleznik et al. 1996]. On the other hand, methods that interpret complete sketches are limited to specific drawing techniques [Xu et al. 2014] and classes of shapes [Lun et al. 2017], and output curve networks or triangular meshes rather than editable models. As stressed by a recent survey [Bonnici et al. 2019], to be widely used by the design industry, “sketch-based modeling systems should integrate seamlessly with existing workflow practices”.

Fig. 2. Similarity between sketching and CAD modeling workflows. (a) As illustrated by this sequence from the OpenSketch dataset [Gryaditskaya et al. 2019], industrial designers construct their drawings by starting from a simple shape (here a box) that they refine by adding or subtracting sub-parts (wheels, beveled edges, hole). (b) Modern CAD software such as SketchUP [Trimble 2019] rely on very similar operations to model 3D shapes.
Our key observation is that despite their apparent differences, design sketching and CAD modeling actually involve very similar workflows, yet expressed in different languages. Industrial designers often start their sketches by drawing the overall shape as an assemblage of boxes and cylinders called scaffolds, which they then refine by drawing roundings, sub-parts, and small details (Fig. 2a). Similarly, CAD modelers often start with simple geometric primitives that they refine to build up complex models by progressively applying geometric operations (e.g., extrude, bevel, smooth) (Fig. 2b). Based on this observation, we propose Sketch2CAD as a learning-based interactive modeling system that translates sketching operations into their corresponding CAD modeling operations. Users of our system thus express their ideas using similar sketching steps as they would do on paper, yet obtain as output a regular CAD model, along with a trace of the sequential operations, ready to be fabricated or further edited with existing CAD software. Our system can be seen as a translator that interprets user-drawn strokes in the context of the current modeling session, and maps them into a sequence of predefined CAD operations, along with their associated parameters. The system empowers users to create regular CAD models without having to navigate complex CAD system interfaces.

We first propose a common parameterization of popular sketching and CAD modeling operations (e.g., extrude, bevel, add, subtract, sweep). For each operation, our parameterization encodes the different components of the CAD shape, which correspond to different strokes in the sketch. For instance, a bevel operation is composed of two parallel curves that define the new profile of the corner on which it applies.
Importantly, our parameterization also encodes the faces of the current 3D model that should be modified by the operation, since CAD operations are typically applied in sequence to progressively achieve complex shapes.

The main challenge is then to recover, for every step of a sketching session, the intended modeling operation and the associated parameters. This is a highly ambiguous task, not only because the strokes are often imprecise, but also because similar strokes might have different meanings depending on the context in which they are drawn. We propose a three-stage pipeline that progressively reduces this ambiguity to produce regular CAD objects. The first stage classifies the sketch among possible CAD operations (extrude, bevel, add, subtract, sweep). In addition to the user sketch, the classifier takes as input depth and normal maps of the current 3D model, which provide strong contextual cues about the intended operation. The second stage segments the user sketch and contextual maps into parts, specific to the target CAD operation. For instance, the sketch of a bevel operation is segmented into its two profile curves, while the contextual maps are segmented to form a mask of the face on which the bevel operation needs to be applied. Finally, the third stage instantiates the CAD operation by fitting parametric curves or shapes on the segmented strokes and projecting these strokes, optionally regularized, on the selected faces of the 3D model.

From a technical standpoint, we realize our classification and segmentation stages with deep convolutional networks. In addition to the design of CAD-specific segmentation networks, a contribution of our work resides in a large training dataset of CAD-like objects that we synthetically generated by sampling sequences of CAD operations. We took special care in balancing this dataset such that the most complex operations appear more frequently, and that parameters of all operations are sampled uniformly.
Furthermore, we also balanced the length of the operation sequences, such that our system can recognize CAD operations at any stage of a modeling session.

In contrast to prior learning-based methods that were trained on particular domains [Huang et al. 2016; Nishida et al. 2016] or selected object classes [Lun et al. 2017], a key strength of our approach is that it recognizes a set of existing CAD operations that can be applied in arbitrary order, allowing the creation of a diverse range of human-made objects. In addition, the parametric nature of each such operation results in shapes that are highly precise and regular despite very approximate input strokes. Figure 1 shows a typical modeling session using Sketch2CAD. Finally, since our training data is entirely synthetic, we believe that the same approach can be extended to support other operations.

While our sketch-based modeling system does not provide the same level of comprehensive modeling as modern CAD software like SketchUP [Trimble 2019] and TinkerCAD [Autodesk 2019], it demonstrates an alternate way of interacting with existing CAD systems without requiring repeated command selection and switching. Our interface can be particularly attractive to product designers or novice users who are more fluent with sketching than with CAD modeling interfaces. By allowing non-experts to quickly produce complete CAD protocols (see Sec.
8.1), our tool holds the potential to facilitate more direct collaboration between novices and experts.

In summary, our main contributions are:
• formalizing a set of common CAD operations and their corresponding sketches, allowing an automatic translation between the two domains;
• developing a pipeline of deep neural networks capable of recognizing and segmenting CAD operations from sketches drawn over 3D shapes, and producing precise, regular 3D geometry by fitting CAD parameters on the predictions;
• designing a large dataset of synthetic CAD models, along with their step-by-step construction sequences; and, as a culmination of these taken together,
• presenting Sketch2CAD as a novel sketch-based modeling system that unifies the sequential workflows of product design sketching and CAD modeling.

Code, training data, and the Sketch2CAD system are available on the project page for research use.

Our work aims to bridge the gap between sketch-based and CAD modeling.
CAD modeling.
Computer-Aided Design has long been adopted by the industry to create precise and high-quality 3D models suitable for physical simulation, lighting simulation, and downstream manufacturing [Autodesk 2019a,b; Robert McNeel & Associates 2019; Trimble 2019]. However, the high precision offered by CAD comes at the price of complex interfaces that let users select appropriate geometric operations and tune their parameters. Various approaches have been considered to reduce this user burden, from automatic alignment of existing CAD models on scanned point clouds [Avetisyan et al. 2019], to educational visualizations of modeling sequences [Denning et al. 2011]. We contribute to this effort by instantiating CAD operations through the sequential interpretation of hand-drawn sketches.

Closer to our work are methods aiming at converting raw 3D meshes into editable CAD models, which can be formulated as a form of program synthesis [Du et al. 2018; Sharma et al. 2018; Tian et al. 2019]. On the one hand, leveraging the sequential nature of sketching and CAD modeling makes our problem better posed than the conversion of complete objects that these methods target. On the other hand, we take as input approximate sketch lines rather than precise 3D models, which induces additional ambiguity. Ellis et al. [2018] also applied program synthesis to convert sketches to graphics programs, but focused on 2D diagrams and as such did not consider depth recovery.
Sketch-based modeling.
Existing work on sketch-based modeling can be broadly classified into offline and online methods. Offline methods aim at interpreting complete drawings, either automatically or with user assistance. Early algorithms detect geometric constraints between curves, such as parallelism, orthogonality and symmetry, and solve for the 3D curve network that best satisfies these constraints [Cordier et al. 2013; Lipson and Shpitalni 1996; Naya et al. 2002; Wang et al. 2009; Xu et al. 2014]. The main limitation of these methods is that they require clean drawings as input to detect and enforce relevant constraints. In addition, the curve networks they produce are not directly usable by downstream 3D modeling and simulation software. These limitations are partly addressed by interactive tools that allow users to align geometric primitives over the drawing [Gingold et al. 2009; Shtof et al. 2013]. While the parametric nature of these primitives brings robustness to approximate inputs, users of these systems need to provide a number of annotations to achieve precise alignment and relative positioning. We differ from the above methods by exploiting the common sequential nature of sketching and CAD modeling, which allows us to automatically recognize parametric CAD operations as soon as they are drawn rather than during a subsequent annotation process.

In contrast to the above optimization-based approaches, recent work has explored the potential of deep learning to automatically reconstruct 3D objects from one or several sketches [Delanoy et al. 2018; Li et al. 2018; Lun et al. 2017; Su et al. 2018]. Since these methods build strong shape priors from training data, they are limited to specific classes of objects [Delanoy et al. 2018; Lun et al. 2017; Su et al. 2018] or types of surfaces [Li et al. 2018]. Besides, all these methods predict depth maps or voxel grids that are then converted to triangular meshes, which, in contrast to CAD models, greatly limits the precision and editability of the resulting 3D models. While we also build on powerful deep convolutional networks, a strength of our approach is to predict CAD operations rather than complete objects. Combining these operations allows users to produce a wide variety of parametric shapes, which are precise and editable by construction, and allows for generalization across different CAD models.

The sequential workflow we offer makes our method closer in spirit to online sketch-based modeling systems, where users create complex 3D shapes incrementally by alternating between 2D sketching and 3D navigation. Because of the difficulty of recovering 3D shapes from 2D strokes, a number of systems focus on specific modeling operations, such as inflation of smooth shapes [Igarashi et al. 1999; Nealen et al. 2007] or creation of sparse networks of 3D curves [Bae et al. 2008; Schmidt et al. 2009]. Closer to our work are methods that enable the creation of CAD models representing man-made shapes. Rivers et al. [2010] resolve 2D-to-3D ambiguity by asking users to draw the shape parts in three orthographic views, as common in CAD software. We instead let users draw in a single perspective view, as common in product design sketching. The seminal SKETCH system by Zeleznik et al. [1996] and GiDES++ by Jorge et al. [2003] include some of our CAD operations. However, users of SKETCH specify these operations using a custom vocabulary of sketching gestures, while users of GiDES++ need to decompose object parts into individual strokes interpreted one by one using a set of hand-crafted rules. Our originality is to automatically recognize the CAD operations and recover their parameters from freehand
Fig. 3. Sketch2CAD at inference time. Given an existing shape and input sketch strokes (shown in orange) for the current operation, we first obtain the maps of sketch and local context (i.e., depth and normal), which are fed to the operator classification and segmentation networks. The classified operator type, sweep in this example, is used to select the output base face and curve segmentation maps, based on which the parameters defining the operator are fitted, via an optimization, to recover the sketched operation instance. The recovered operator is then applied to the existing shape to produce the updated model; meanwhile, the operation is pushed into the protocol list.

sketches, which allows users to directly draw complete parts of the shapes they wish to obtain, without requiring to learn a set of new gestures. We achieve this recognition using deep neural networks trained on synthetic CAD modeling sequences. While Huang et al. [2016] and Nishida et al. [2016] explored a similar usage of deep learning for procedural modeling, they target shapes with fixed numbers of parameters, created using a fixed order of operations. For example, Nishida et al. assume that to create a building, users start by sketching the building mass, then the roof, the facades, and finally the windows. In contrast, a major challenge we face is to recognize generic CAD operations with varying numbers of parameters, sketched in any order. We achieve this goal by accounting for the context under which the sketch is drawn. Furthermore, while Huang et al. and Nishida et al.
use a regression network to predict the parameters of their shapes, we found this strategy to fail on the more ambiguous problem we target, and instead use a classification network to segment the sketched strokes into CAD-specific components on which we subsequently fit geometric primitives.

We draw inspiration from several earlier systems that explored the possibility to sketch novel shapes in the context of an existing scene, represented as photographs or 3D models [De Paoli and Singh 2015; Favreau et al. 2015; Lau et al. 2010; Li et al. 2017; Paczkowski et al. 2011; Xu et al. 2019; Zheng et al. 2016]. However, these methods use the existing context to either guide user sketching or to deduce geometric constraints for lifting the sketch to 3D. Our originality is to leverage context within a sequential modeling workflow, where the existing scene informs the recognition of the intended CAD operation, which aims at modifying that scene.
Suppose we work with solid CAD models M = { M ⊂ R³ | M = cl(int M) }, i.e., each M equals the closure of its interior; in the implementation, we represent the models by their boundaries as triangle meshes. We define a set of CAD modeling operators O = { O(θ, ·) : M → M }, where each applied operator M′ = O(θ, M) changes the input geometry M to a new shape M′ and is defined by a parameter vector θ that specifies both the parts of M to be modified and the corresponding modification parameters. Note that different operations can have different numbers and types of parameters. In sketch-based CAD modeling, our primary goal is to interpret a sketch drawn over an existing shape as the corresponding operator, with proper parameters, that changes the shape to match the 2D sketch. The overall process is illustrated in Fig. 3. Formally, given the current shape M and the 2D sketch curves S = { s_i } with known viewpoint v, we strive for the mapping Φ(M, S, v) = O(θ, ·) such that the image of O(θ, M) closely matches S when viewed according to v. We use orthogonal 3D-to-2D projection in our approach. In the following discussion, whenever possible, we omit the parameters of an operator for brevity.

Due to the diversity and infinite variation of operators, neither brute-force exhaustive enumeration of all operators and parameters nor traditional stochastic or energy-based optimizations can efficiently solve the inverse problem. Instead, we approach this problem using deep learning. In particular, we train a two-stage neural network that models the mapping Φ, where the first stage predicts the operator type and the second stage segments the sketch and context maps into regions, on which the specific parameters for the operator are fitted to instantiate the operation. The key technical challenges are how to design the machine learning models and training tasks such that the inverse mapping is feasible, learned by the neural networks, and reliably generalized to real modeling interaction.

We present the definition and parameterization of specific operators in Sec. 4, the neural networks and their usage for the inverse mapping in Sec. 5, and how to train the networks for reliable generalization in Sec. 6.

In the current system we support the following four operations: face extrusion, beveling a corner, addition/subtraction of a right polyhedron, and sweeping a cylindrical shape (see Fig. 4 for illustrations). We choose these four operators because they are widely used both in sketching workflows and CAD modeling, and can already be interleaved to generate complex shapes; nonetheless, our system can be easily extended to incorporate more operators as needed. To fully describe an operator O(θ, ·), we define its parameters θ, its
Extrude — param: base face f, offset d. action: move f along its normal direction by d. sketch: edges of the moved f and the extended side edges.
Bevel — param: base face f, corner c with an opposite corner c′, profile curve l on f. action: turn c and c′ into rounded corners specified by l. sketch: l and its offset by vector cc′.
Add/Subtract — param: base face f, prism base curve c, profile length d, add/subtract option o = ±. action: build a prism with base c and profile edge of length d in the normal direction of f, then find the union (o = +) / difference (o = −) of the base shape and the prism. sketch: edges of the prism.
Sweep — param: base face f, base/offset circles c1, c2, profile curve c_p, add/subtract option o = ±. action: build a swept shape by rolling the profile curve c_p along c1, c2, then find the union (o = +) / difference (o = −) of the base shape and the primitive. sketch: circles and profiles of the swept shape.
Fig. 4. Operators supported in Sketch2CAD. In each inset, the parameters defining the operator are annotated and the corresponding sketches are shown over the existing shape, while the result of applying the respective operation is shown as the updated shape.

applied action M′ = O(θ, M) for given M, and the corresponding sketches S that a user draws to specify it.

Extrude. Extrusion is the simple offset of a planar face of the 3D shape along the face normal direction. As shown in Fig. 4, the parameters defining the operator are the face f to be offset and the distance d along the face normal vector for the extrusion, where d can be positive (offsetting outward) or negative (offsetting inward).

Bevel. Bevel, also known as rounding [Eissen and Steur 2011] in sketching or fillet in CAD modeling, turns a sharp crease of an object into a smooth and rounded connection. As shown in Fig. 4, the operator is defined on the crease connecting two corners c, c′, with c residing on the base face f; the sharp edge cc′ is then turned into a smooth connection with profile curve l that rounds c on f. The sketches corresponding to such an operator are the profile curve l on f and its parallel obtained by translation by cc′.

Add/Subtract. The addition or subtraction operator places a primitive shape (a prism) over a base shape and computes the union or difference of the two shapes. The parameters designate the base face f on which to place the primitive, and define the primitive shape by specifying its base curve c as one of triangle, quadrilateral, pentagon, or hexagon, as well as its profile curve that is always parallel to the base face normal, with length d. In addition, the option o of union (for addition) or difference (for subtraction) between the base shape and the primitive is specified. The corresponding sketches simply depict the primitive shape by highlighting its feature curves.

Sweep. Sweeping a curved profile line along two circular rails is another commonly used operation in CAD modeling, which also appears frequently in industrial design sketching in the form of horizontal ellipses joined by a vertical section (see Fig. 4). Similar to add/subtract, the swept shape is combined with the base shape by either union or subtraction; therefore, the sweep operation can be seen as a special add/subtract where the primitive shape is a swept cylindrical shape. The parameters defining the sweep operation consist of the base and offset circles, and the profile curve whose two ends lie on the two circles. There is also the union/difference option to specify the combination with the base shape. The corresponding sketches simply show the swept shape through its two circular ends and a pair of profile curves. The add/subtract and sweep operators are denoted as Add/SubtractPolyhedron and Add/SubtractSweepShape, respectively, in operation sequences (see Figs. 1, 3 and 6) for distinction.
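The four parameterizations above can be written down as simple records. The following is a hypothetical Python rendering (the type and field names are ours, not the system's actual data structures):

```python
from dataclasses import dataclass
from typing import List, Tuple

Point3 = Tuple[float, float, float]

@dataclass
class Extrude:
    base_face: int        # id of the planar face f to offset
    offset: float         # signed distance d along the face normal

@dataclass
class Bevel:
    base_face: int        # face f carrying corner c
    corner: int           # corner c; its opposite corner c' is implied
    profile: List[Point3] # profile curve l drawn on f

@dataclass
class AddSubPolyhedron:
    base_face: int            # face f the prism is placed on
    base_curve: List[Point3]  # 3- to 6-sided polygonal base c
    length: float             # profile length d along the face normal
    subtract: bool = False    # o = - (difference) vs. o = + (union)

@dataclass
class AddSubSweepShape:
    base_face: int
    base_circle: Tuple[Point3, float]    # (center, radius) of c1
    offset_circle: Tuple[Point3, float]  # (center, radius) of c2
    profile: List[Point3]                # profile curve c_p joining the circles
    subtract: bool = False
```

Note how each record encodes both the part of the current model to modify (the base face) and the modification parameters, matching the parameter vector θ above.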
Extension to more operators. One can follow the above examples to define new operators. In general, the parameters of an operator should be minimal but complete in defining its actions without ambiguity. The corresponding sketches should be concise and capture the important features of the operation. All these designs impact the machine learning models used for recovering the operator instance from sketches, as discussed later in Sec. 5.
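One lightweight way to keep such an operator set open-ended is a registry mapping operator names to apply functions. This is a hypothetical sketch of the idea, not a mechanism the paper prescribes:

```python
# Hypothetical extensible operator registry: each operator registers an
# apply(params, model) function; new operators plug in without touching
# the dispatch code.
OPERATORS = {}

def register(name):
    def wrap(fn):
        OPERATORS[name] = fn
        return fn
    return wrap

@register("Extrude")
def apply_extrude(params, model):
    # Placeholder action: a real implementation would offset face
    # params["f"] of the mesh by params["d"] along its normal.
    return model + [("Extrude", params["f"], params["d"])]

def apply_operator(model, op_type, params):
    """Dispatch M' = O(theta, M) by operator name."""
    return OPERATORS[op_type](params, model)
```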
Protocols for CAD modeling. A protocol file is a serialization of the modeling steps. It consists of the full set of parameters specifying the operators that are applied in sequence to obtain the final shape. A protocol can be saved, loaded, edited, and reused for more complex modeling tasks. Illustrations of protocols as the sequences of operations they contain are shown in Figs. 1, 3, and 6. More protocol texts for generating the models shown in Fig. 12 can be found in the supplemental material.
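A protocol is thus just an ordered list of operator records. A minimal hypothetical serialization in JSON (the field names are ours), in the spirit of the sequences shown in Figs. 1 and 3:

```python
import json

# A hypothetical protocol: each entry names an operator type and its
# parameters; replaying the list in order reproduces the final shape.
protocol = [
    {"type": "AddPolyhedron",      "params": {"plane": 3,  "length": 0.4}},
    {"type": "BevelCorner",        "params": {"plane": 3,  "corner": 2}},
    {"type": "SubtractPolyhedron", "params": {"plane": 12, "length": 0.1}},
]

def save_protocol(path, steps):
    """Serialize the modeling steps so they can be edited and reused."""
    with open(path, "w") as f:
        json.dump(steps, f, indent=2)

def load_protocol(path):
    with open(path) as f:
        return json.load(f)
```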
Implementation of operator actions. In our current implementation, we represent the 3D solid models by their boundaries as triangle meshes, but always maintain a set of planar polygonal faces made of adjacent, coplanar triangles, found by flooding across mesh edges with a tight dihedral-angle threshold. When applying any of the operators defined above, which require planar bases, the base face is selected as one of the planar polygons, and the action is carried out by computing the appropriate Boolean operation between the sketched primitive and the base mesh, using CGAL [The CGAL Project 2020]. While for extrude, add/subtract, and sweep the sketched primitives are clear, for bevel, we construct a prism whose base face is defined by connecting c and l and whose profile edge is cc′ (Fig. 4), and subtract it from the base shape. In the future, when extending to operators applied on curved faces, we consider upgrading our underlying geometry representation to more flexible ones, e.g. NURBS (Sec. 8.4).

After sketching an operation in the modeling session, three steps are taken to interpret the current sketch S: the recognition of the operator type O by a classification network, the extraction of the individual regions from the input maps by the segmentation network for the operator type O, and the recovery of the parameters θ defining the specific instance O(θ, ·) by counting and curve fitting. Finally, the regressed operator is applied to the existing geometry to carry out the modeling intention of the user.

For all networks, the input is the concatenation of three maps, all of spatial size 256×256: the sketch map S, with S(x, y) = 1 for sketch stroke pixels and 0 otherwise, and maps D, N encoding depth and normal, obtained by rendering the existing geometry along the sketched viewpoint v. The viewing frustum for generating the maps is twice the size of the sketch bounding box, to ensure the user input is well covered. For the depth map, D(x, y) ∈ [0, 1] is the normalized depth value for a foreground pixel (x, y), and D(x, y) = 0 for background pixels. N(x, y) is the normal vector that is first transformed into the 3D camera space and then shifted by a constant offset for a foreground pixel, and N(x, y) = (0, 0, 0) for background pixels.

The classification network is a CNN with alternating layers of convolution and pooling that finally outputs the probabilities for the operator types that the input sketch represents; see supplemental material for the detailed structure.
The training loss is the weighted cross entropy:

L_cls(S, D, N) = −w_O log(P_O),    (1)

where O is the ground-truth operator type for the input training sample, the class weights (w_{O′})_{O′∈O} are computed by normalizing the inverse type-frequency vector (1/N_{O′})_{O′∈O}, with N_{O′} the number of training samples of type O′, and P_O is the predicted probability of the operator being of type O. We use weights for different operation types to avoid potential statistical bias caused by their contrasting frequencies in the training set, as discussed in Sec. 6.

Rather than directly regressing the parameters, we solve the regression in two steps: first, we use deep neural networks to segment the sketch and context maps into regions corresponding to the defining structures of the operators, and second, we fit operation parameters to the detected regions using counting, searching, and optimization procedures. The benefits of such a two-step regression are that the networks are only required to learn single-modality segmentation tasks, which is considerably more tractable than brute-force regression of diverse operator parameters, and that the parameter fitting is robust to inaccuracies of network predictions. Further, this design choice facilitates generalization across different operations. In contrast, trying to directly regress the various parameters of an operation faces several difficulties: recovering the extrusion and offset distances from 2D images has an inherent scale ambiguity; the number of base polygon sides of the add/subtract operation varies and needs complex network structures to accommodate; and the regression of curved strokes requires a fixed Bezier or spline parameterization, while in our case we can choose suitable representations for the curve fitting.
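The inverse-frequency class weighting of Eq. (1) can be sketched in a few lines of NumPy; function names here are ours:

```python
import numpy as np

def class_weights(counts):
    """Normalize the inverse type-frequency vector (1/N_O') into weights,
    so rarer operator types contribute more to the loss."""
    inv = 1.0 / np.asarray(counts, dtype=float)
    return inv / inv.sum()

def weighted_cross_entropy(probs, gt_type, weights):
    """L_cls = -w_O log(P_O) for the ground-truth operator type O."""
    return -weights[gt_type] * np.log(probs[gt_type])

# Example: four operator types with imbalanced training counts;
# the rarer types receive larger weights.
w = class_weights([1000, 250, 500, 250])
loss = weighted_cross_entropy(np.array([0.1, 0.7, 0.1, 0.1]), 1, w)
```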
The general network structure.
Each of the segmentation networks is a U-Net that outputs two maps through two decoder branches: the probability map F of the base face, and the curve segmentation map C, both of spatial size 256 × 256 and channel width 1; details of the network structures are provided in the supplemental material. To train the network, the loss function has the following general form:

L_reg(S, D, N) = ∥F − F̃∥² + (1/|M̃|) ∥M̃ ⊙ (C − C̃)∥²,

where maps with a tilde are ground truth or precomputed maps, i.e., F̃ is the ground truth base face map with F̃(x, y) = 1 for pixels of the base face and 0 otherwise, C̃ is the ground truth stroke map, and M̃ is the corresponding stroke pixel mask. ⊙ is the component-wise product, and |·| sums the map pixel values.

Given the predicted face map, we find the base face f by counting. To be specific, we first binarize the face map F by the threshold 0.5, then render the face ID map of the existing geometry Id(x, y) ∈ {f_i}, and finally find the face f = f∗ with the highest accumulated probability, computed as f∗ = arg max_{f_i} Σ_{Id(x,y)=f_i} F(x, y).

Different operators have their specific curve segmentation maps C and C̃. The principle for designing the curve maps is that the input-output pair should be a learnable mapping without strong ambiguity. Next, we present the details of the regression for each operator.

Extrude regression.
We specify the ground truth extrusion curve segmentation map as C̃(x, y) = 1 for offset curve pixels and C̃(x, y) = 0 for profile curve pixels. Given the predicted curve map C, we find the map of the offset curve as C_o(x, y) := (C(x, y) > 0.5) ∧ (M̃(x, y) = 1), and the map of profile curves as (C(x, y) ≤ 0.5) ∧ (M̃(x, y) = 1).

Having classified the pixels, we find the extrusion distance d by line search. In particular, the edges of the base face, denoted as ∂f, are extruded along the normal direction n_f by d to match C_o. The line search has a fine step size σ within the range [−0.5, 0.5], whereas the initial shapes have unit diagonal bounding box length. Note that by including the negative search range, we allow pushing the base face inside the model as well. We define the matching distance of the extruded face edges and the offset curve map as dist(d) := Σ_{p∈∂f} min_{C_o(q)=1} ∥π_v(p + d·n_f) − q∥, where p samples the face edges uniformly by arc length, q ranges over image pixels, and π_v : R³ → R² is the projection function of the current view. The line-searched d with minimum dist(d) is the regressed extrusion distance.

ACM Trans. Graph., Vol. 39, No. 6, Article 164. Publication date: December 2020.

Fig. 5. Handling ambiguity between add versus subtract. The ambiguity of distinguishing base and offset curves for the add/subtract operator. (a) Cases without ambiguity, as only one of the red and blue curves intersects with the base face. (b) An ambiguous case that can be add or subtract, with the base curve being either red or blue. Instead of segmenting the base and offset curves, we regress the two curves as ordered along the face normal direction (red first, blue second), thus removing ambiguity.

Bevel regression.
The ground truth curve segmentation map for the bevel network encodes the two curves l and l′ (see Fig. 4) as C̃(x, y) = 1 for pixels of l and C̃(x, y) = 0 for pixels of l′. Correspondingly, we find the predicted base face curve map as C_l(x, y) := (C(x, y) > 0.5) ∧ (M̃(x, y) = 1), and the map of l′ as (C(x, y) ≤ 0.5) ∧ (M̃(x, y) = 1).

Assuming the profile curve is drawn in one stroke, we find the stroke corresponding to l by counting. Let s∗ be the stroke with the highest accumulated probability: s∗ := arg max_{s_i ∈ S} Σ_{p∈s_i} C_l(p), where p uniformly samples s_i in the screen space. We then fit a cubic Bezier curve as l to match s∗ back projected onto the plane of f. Given f and l, we determine the corner c as the shared vertex of the two edges of f that intersect with l. Once we have c, the opposite corner c′ is found easily.

Add/subtract regression.
For the add/subtract operator, there is ambiguity in outputting the base and offset curves directly, as illustrated in Fig. 5. To remove this ambiguity, we instead regress the two curves as ordered along the face normal direction, named the start and end curves, respectively. The ground truth curve map is given by C̃(x, y) = 0 for start curve pixels, C̃(x, y) = 0.5 for profile curve pixels, and C̃(x, y) = 1 for end curve pixels. Given the predicted curve map C, we find the start curve map C_s(x, y) := (C(x, y) ≤ 0.25) ∧ (M̃(x, y) = 1), the profile curve map C_p(x, y) := (0.25 < C(x, y) ≤ 0.75) ∧ (M̃(x, y) = 1), and the end curve map C_e(x, y) := (C(x, y) > 0.75) ∧ (M̃(x, y) = 1).

The add/subtract operation has more complex parameters than extrude or bevel (see Fig. 4), the recovery of which also involves more steps: we first classify the strokes according to the predicted curve map, then fit 2D curves to the strokes and determine the add/subtract option, and finally back project the base curve to the 3D base face and recover the prism length by line search.

Again, we classify strokes by pixel counting. For a stroke s_i, let its likelihood of being the start curve be L_s(s_i) := Σ_{p∈s_i} C_s(p), where p samples s_i uniformly; similarly, we have L_p(s_i) and L_e(s_i) for the likelihoods of being profile and end curves, respectively. The curve type of s_i is the one with the largest likelihood.

We assume each of the profile curves is drawn with one stroke; therefore the number of profile strokes gives the N-gon of the prism base. We then find the end points of the profile curves, grouped into the start set and the end set, which are used as the initial guesses for fitting N-gons to the start and end strokes through the iterative closest point method, respectively.

To determine the add/subtract option o, we check the intersections of the fitted polygons with the base face f. If the start polygon P_s intersects f, we have o = + (addition); otherwise, if the end polygon P_e intersects f, we have o = − (subtraction). If neither of the two intersects f, the sketch is regarded as erroneous with no matching operator instance, and the user is alerted of this failure. For the ambiguous case shown in Fig. 5(b), the above procedure implies the default addition option, and the user can switch it manually if needed (Sec. 7).

Finally, the 2D base polygon, i.e., P_s for addition and P_e for subtraction, is back projected onto the plane of f to obtain the 3D prism base polygon, and the prism length d is obtained by line searching the 3D base polygon along the normal direction to match the pixels of the offset end. The line search process replicates that for extrusion.

Sweep regression.
Since the sweep operator is a special case of the add/subtract operator, the curve maps are the same for both operators. The fitting of parameters is much like the add/subtract operator as well, with minor differences in curve fitting. Note that we restrict sweep operations to circular cross sections.

The start and end curves are fitted as ellipses to the corresponding curve maps. After determining the base curve and the add/subtract option, the base ellipse is back projected to the base face plane as the base circle c (see Fig. 4). The offset d, defined as the distance between the two circle centers, is again found by line search as done for extrusion, except that here the base circle center is matched to the offset curve center. The offset circle is then obtained by back projecting the offset ellipse to the base face plane translated by distance d.

To recover the profile curve, we first determine the 3D plane it lies on. Each of the 2D profile strokes has an intersection point with each of the two ellipses. The intersection points are lifted to 3D following the ellipse-circle back projection. The centers of the two circles and either of the two intersection points together determine the 3D plane in which the profile curve resides. We fit a cubic Bezier curve inside this plane to the profile stroke points to obtain the profile curve. Finally, if we detect that the profile curve is nearly linear and that the two circles have similar radii, we rectify the swept shape to be a cylinder.

To train the networks for robust performance on real sketching interactions, we need a large-scale data set that covers the possible variations. Thanks to the procedural nature of CAD modeling, we can generate the training data by synthesizing diverse procedures (Fig. 6). The training data generation therefore consists of two steps, the modeling sequence generation and the sketch image rendering, as discussed below.
Fig. 6.
Sketch2CAD at training time.
We synthetically generated 10k protocols of diverse lengths for procedurally generating 40k training shapes. For each protocol, we execute it up to the last operation, for which the sketch curves are built and overlaid on the built shape. The sketch curves and existing shape are rendered from proper viewpoints to generate the input sketch and local context maps, as well as the ground truth face and curve segmentation maps, which are used to train the operator classifier and the corresponding segmentation network.
Sequence generation.
Given a set of CAD modeling operations {O_i}, we generate training data that allows the network to learn to infer the corresponding operations from 2D sketches robustly, while avoiding the prohibitive enumeration of the infinite space of all possible 3D models and configurations of operations. Our key observation for achieving this goal is that while in theory one part of a 3D model can potentially be connected with every other part of the model, it is the local context that influences the part geometry the most and therefore provides the dominant cue for interpreting its 2D sketch properly. Based on this observation, we only need to extensively enumerate the local combinations of different operations producing diverse model variations to train the network. Thus in practice, for each operation O, in addition to its own parametric variations, we search for a sequence {O_1, · · ·, O_m} of random operations that are applied before the operation, i.e., O ◦ O_m ◦ · · · ◦ O_1, to simulate the local context variations. Indeed, we find that with 0 ≤ m ≤ 3, very complex combinations and shapes can be generated; some examples are shown in Fig. 7.

To balance complexity, for each sequence length m + 1 ∈ {1, 2, 3, 4}, we generate 10k protocols, thus 40k shapes in total. For sequences of each length, the last operator O has a fixed frequency for the different types, i.e., 1 : 1 : 4 : 2 for extrude, bevel, add/subtract, and sweep; the ratios are chosen to account for the different complexities of the four operators, allocating more samples to add/subtract and sweep, which have more degrees of freedom. In addition, we generate 10k protocols with the same distribution for testing the networks. Note that since we weight the different operation types by their inverse frequencies in the dataset (Eq. 1) for training the classification network, such a non-uniform distribution does not cause bias for operation recognition.

Fig. 7. Procedurally generated training set. Sample synthesized shapes and next-step sketches created by randomized combinations of operations. Within only four steps, very complex shapes can already be created.

While always starting from a base box shape, we randomize each operation instance in a synthesized sequence to cover sufficient geometric variations while avoiding degeneracy. This includes, for example, selecting a random planar face of the existing shape as the base face, applying offsets sampled from a large range, generating base polygons or circles whose centers are positioned randomly inside the base face region, and perturbing polygons and profile curves without self-intersection.
Sketch rendering.
We design the rendering process to mimic how real sketch drawings look. To render the corresponding sketch and context maps of an operator in the generated sequence, we randomly sample informative views around the base face of the operator, with view directions forming a bounded range of angles with the face normal. The viewing frustum is centered around the sketch curves, and further scaled by a random factor to create different zooming effects. We filter out the views where, for the extrusion operation, the offset curve is occluded by more than 20%, or, for the other operations, the base curve is occluded by more than 20%, as such viewpoints are unnatural in real sketching. The 3D curves of an applied operator instance are first projected onto the 2D camera space, then perturbed at the endpoints randomly by Gaussian noise, and finally smoothed a little for regularization, which reproduces the style of rough freeform sketching. Fig. 8.
Sketch2CAD UI.
A screenshot of our prototype implementation. The tool features freeform and interactive sketching over a 3D shape in the central canvas, the editing of recovered operation parameters on the left, and the illustration of the operation sequence on the right.
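The endpoint perturbation and smoothing used when rendering synthetic strokes (Sec. 6) can be sketched as follows; blending the endpoint displacements linearly into the stroke interior and the Laplacian smoothing weights are our assumptions, not details from the paper:

```python
import numpy as np

def perturb_stroke(points, sigma, rng, smooth_iters=2):
    """Perturb a projected 2D stroke polyline to mimic freehand drawing:
    displace its two endpoints by Gaussian noise (blended linearly into
    the interior), then lightly smooth the result for regularization.
    `sigma` is the noise scale, e.g. a small fraction of the image diagonal.
    """
    pts = np.asarray(points, dtype=float).copy()
    n = len(pts)
    # Random displacements for the two endpoints, interpolated along the
    # stroke so that the perturbation is largest at the ends.
    d0 = rng.normal(0.0, sigma, 2)
    d1 = rng.normal(0.0, sigma, 2)
    t = np.linspace(1.0, 0.0, n)[:, None]
    pts += t * d0 + (1.0 - t) * d1
    # Light Laplacian smoothing of the interior points.
    for _ in range(smooth_iters):
        pts[1:-1] = 0.5 * pts[1:-1] + 0.25 * (pts[:-2] + pts[2:])
    return pts
```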
Fig. 9.
Stroke regularization and auto-completion in Sketch2CAD. (a)-(c): the corners of the sketched quadrangle are detected to be close to right angles, and automatically regularized to form a rectangular base polygon. (c)-(d): auto-completion by reflecting the primitive against the horizontal cross section of the base shape.
We build a prototype modeling tool to demonstrate our approach. The tool features interactive modeling with a user interface that allows sketching in 2D and instant feedback viewed freely in 3D. A screenshot of our tool is shown in Fig. 8. Please refer to the supplemental video for real-time sketching and modeling sessions. Besides sketching, the tool allows the user to save, load, replay, and edit the sequence of operations stored in protocol files, thus fully demonstrating the power of the procedural CAD modeling paradigm. To assist the easy sketching of precise CAD models, our system also implements techniques such as the regularization of sketched curves, the tuning of operation parameters, and auto-completion by replicating sketched primitives through symmetry, as detailed next.
Regularization.
In addition to the inherent regularization enabled by casting sketches into predefined operations at the procedural language level, our interface applies curve-level regularization, like snapping and rectification, to assist user sketching, as is commonly found in CAD modeling software. The general idea of snapping is to detect key points, e.g., centers, edge middle points, and corners of the base face, and align the corners and centers of the sketched primitive shape with them whenever the point pair comes within a (default) small distance. The general idea of rectification is to detect approximate parallelism between sketched edges and base face edges, as well as approximate equality of the corner angles/edge lengths of sketched N-gons, and to enforce the parallelism and equality by constructing parallel edges and regular N-gons analytically.

In particular, snapping happens when the distance between the nearest key point pair is within 10% of the diameter of the base face. Rectification happens when the differences of edge angles from zero, or of corner angles from (N − 2) · 180°/N, are within 20°, for parallelism and corner equalization, respectively. It also happens when the differences of side lengths are within 20% of the average length, for side length equalization. An example of regularization for rectangular prism addition is shown in Fig. 9. The user can switch off the auto-regularization to sketch arbitrary shapes.

Fig. 10. Tuning the parameters of operations in Sketch2CAD. (a)-(b): a cylinder is sketched onto the base box, but only the cylinder is kept. (b)-(c): a swept shape is added. (c)-(d): the offset distance between the two circles of the swept shape, as well as the top circle radius, are enlarged by tuning their parameter values.

Fig. 11. User gallery. We asked 6 participants to reproduce 3 reference shapes, shown in green. All participants completed these modeling tasks in 5 to 20 minutes and achieved a close match to the reference.
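Returning to the regularization thresholds quoted above, the snapping and rectification tests can be sketched as follows; the tolerances are as stated in the text, and all function names are hypothetical:

```python
import math

def should_rectify_angles(corner_angles_deg, tol_deg=20.0):
    """Test whether a sketched N-gon's corner angles are all close enough
    to those of a regular N-gon, i.e. (N - 2) * 180 / N degrees, to be
    rectified into a regular polygon."""
    n = len(corner_angles_deg)
    target = (n - 2) * 180.0 / n
    return all(abs(a - target) <= tol_deg for a in corner_angles_deg)

def should_snap(p, q, base_face_diameter, tol=0.10):
    """Snap a sketched key point p to a base-face key point q when the
    two are within 10% of the base-face diameter."""
    dist = math.hypot(p[0] - q[0], p[1] - q[1])
    return dist <= tol * base_face_diameter
```

The side-length equalization test would be analogous: compare each side to the average length and rectify when all deviations stay within 20%.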
Tuning operation parameters.
An advantage of inferring CAD operations is that users can edit the recovered parameters of a sketched operation. We support three types of adjustments: creation of base shapes, resolution of ambiguous results, and fine tuning for geometric precision. First, users can select Boolean operations between the existing shape and the sketched primitive, which is useful for quickly creating a base shape different from the plain box that the system starts with. Second, users can switch between the union and difference options for add/subtract and sweep, which have inherently ambiguous cases that require user specification (Sec. 5). Third, users can fine-tune the geometric parameters, e.g., offset distances, circle radii, etc. In particular, for a swept shape, when tuning the distance between the circles or the radius of a circle, we adjust the control points of the profile cubic Bezier curve in proportion, defined by the distance from a control point to the fixed base circle or rotational axis, to preserve the overall shape of the swept geometry as much as possible. An example of editing the parameters of sequential operations is shown in Fig. 10.
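The proportional control-point adjustment for a tuned sweep offset can be sketched as follows; the (axial, radial) coordinate frame with the fixed base circle at axial 0 is an assumed convention, chosen to make the proportional rule explicit:

```python
import numpy as np

def retune_profile(ctrl_pts, old_offset, new_offset):
    """Adjust the control points of a profile cubic Bezier when the sweep
    offset changes. Each control point moves in proportion to its axial
    distance from the fixed base circle, so the overall profile shape is
    preserved. Points are (axial, radial) with the base circle at axial 0.
    """
    pts = np.asarray(ctrl_pts, dtype=float).copy()
    pts[:, 0] *= new_offset / old_offset  # scale axial distances in proportion
    return pts

# Doubling the offset doubles each control point's axial coordinate,
# while the control point sitting on the base circle stays fixed.
```

Tuning a circle radius would follow the same pattern, scaling the radial coordinate relative to the rotational axis instead.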
Auto-completion by symmetry.
Symmetry is prevalent in CAD models and can greatly ease user interaction. In our tool, the user can take advantage of symmetry by reflecting a sketched primitive shape [Peng et al. 2018] and its Boolean operation through selected cross section planes of the oriented bounding box of the base shape, thus avoiding the need to repeat the sketch multiple times manually. An example of auto-completion by symmetry is shown in Fig. 9. More examples are shown in Figs. 1 and 12.

Fig. 12. Result gallery. Various modeling sequences created during design sessions using Sketch2CAD. The corresponding protocol steps are shown in the supplemental material. Please also refer to the supplementary video.

Fig. 13. Selected user modeling steps. Users envision different paths and variations of operations for reaching similar targets. The freehand, inaccurate sketches are robustly translated into the intended operations.
With our tool, we have sketched several models of different complexities. Examples are shown in Figs. 1 and 12, with operation sequences ranging from 2 to 11 steps, constructing CAD models from simple bolts and nuts to the sophisticated mixer and cameras. User evaluation also confirms the ease of sketching CAD models with our approach (Sec. 8.1). We also validate the important design choices in our framework through ablation tests in Sec. 8.2, and discuss limitations and future work in Sec. 8.4. Interactive modeling sessions, complete user evaluation data, model mesh files, and sample protocol files can be found in the supplemental material.
Runtime.
Tested on a desktop PC with an Intel Core i9-9900 3.1GHz CPU and an NVidia RTX 2070 Super GPU, the network inference is nearly instantaneous, taking around 0.07s. Most time is spent on the line search, ranging from 0.01s to 2s, as this step involves repeated computation of distances between pixels and stroke points, although it can be largely parallelized. To apply the recovered operator, a Boolean mesh operation typically takes around 0.02s.
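For illustration, the extrusion line search of Sec. 5, which dominates the runtime, can be vectorized over all candidate distances at once; the search range and step size below are assumed values, and `project` stands in for the camera projection π_v:

```python
import numpy as np

def line_search_extrusion(edge_pts, n_f, offset_px, project,
                          lo=-0.5, hi=0.5, step=0.005):
    """Line-search the extrusion distance d minimizing
    dist(d) = sum_p min_q || pi_v(p + d * n_f) - q ||,
    vectorized over candidate distances d, edge samples p, and
    offset-curve pixels q.

    edge_pts  : (P, 3) samples of the base-face edges (uniform arc length)
    n_f       : (3,) base-face normal
    offset_px : (Q, 2) pixel coordinates where C_o = 1
    project   : maps (..., 3) 3D points to (..., 2) image points
    """
    ds = np.arange(lo, hi + step, step)                  # (K,) candidates
    moved = edge_pts[None] + ds[:, None, None] * n_f     # (K, P, 3)
    proj = project(moved)                                # (K, P, 2)
    # Pairwise distances to all offset-curve pixels: min over q, sum over p.
    diff = proj[:, :, None, :] - offset_px[None, None]   # (K, P, Q, 2)
    dist = np.linalg.norm(diff, axis=-1).min(axis=2).sum(axis=1)
    return ds[int(np.argmin(dist))]
```

Evaluating all candidates in one batched computation is what makes the step amenable to the parallelization mentioned above.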
We have evaluated the ease of use of our system by asking 6 novices to create the same 3 reference shapes. (Due to the requirement for a GPU at inference time and restrictions on lab access, we could not test the system with a wider set of users.) The target shapes, pre-modeled by an expert user, were presented to participants as a static image (Figure 11, green). Nevertheless, we do allow the participants to try and explore variations, so that novel and interesting deviations from the references can be expected. All participants had little to no experience in sketching or in CAD modeling, and were given a tutorial and a short practice session to get familiar with our system.

Figure 11 shows all models created by the participants, along with their times to completion. On the one hand, all participants managed to quickly produce models that closely match the reference shapes, demonstrating the ability of our system to make CAD modeling accessible to non-professionals. On the other hand, several participants also decided to deviate from the reference, for instance by adding a second part to the lens of the camera (P2), or by modeling a curved handle for the hammer (P5). We see this unexpected behavior as a consequence of the joy and artistic freedom offered by freehand sketching.

Figure 13 provides a selection of intermediate sketching steps performed by the participants, which shows that our system is capable of interpreting a wide variety of strokes representing similar shapes. Note that all participants used a mouse to draw the input strokes, which our system nevertheless translates into regularized CAD operations. Several participants commented that they appreciated the ability of our system to produce regular shapes from approximate strokes, and that they preferred to keep the modeling flow going rather than revise what they had drawn. Although given the option, none of the users turned off the stroke regularizer in Sketch2CAD. Participants gave an average rating of 4.75 on a 5-point Likert scale when asked whether the sketches were properly translated to CAD operations, and an average rating of 5 for the ease of conception of the modeling sequences. Complete user feedback and comments on the ease of use of both the sequential modeling paradigm and our prototype tool can be found in the supplemental material.
We validate two key components of our framework by ablation tests evaluated on the segmentation tasks for all operations:

(1) Using local context versus using sketch only. The comparison is shown in Table 1, where the 'no context' configuration uses only the sketch map as network input. It is clear that without the local context maps of depth and normal of the existing shapes, the segmentation of both the base face and the sketch curves becomes very difficult, with the base face IoU frequently under 10%. In real user sketching, the networks without context are barely usable (Fig. 14).

(2) Using shapes composed by multiple operations for network training, versus using primitive shapes only. The 'primitive' configuration shown in Table 1 trains the regression networks on another set of 40k synthesized shapes and sketches, which however only contains sequences of length 1 (Sec. 6). The 'primitive' networks are then evaluated on the same 10k testing dataset of different sequence lengths (Sec. 6) and compared with our results. It is clear that the primitive networks, which do not see sufficiently complex combinations of operations, cannot match the accuracy of our results, with base face IoU lower by more than 10%. Real tests by user sketching show the difference as well (Fig. 14).

In addition, we note that for the operator classification task, since the four operations have quite different sketch patterns, the two ablated configurations achieve performance comparable to our full network, i.e., 99.80% for no context, 93.68% for primitive, and 99.79% for ours, since the depth and normal maps do not play an essential role in operation classification.

Fig. 14. Comparing the ablation configurations by example. The real bevel sketch is shown on the left. The 'no context' network fails to produce any output above the map threshold (Sec. 5). The 'primitive' network predicts a good curve map, but cannot distinguish the two front-facing polygons in the base face map, which leads to the wrong base face being detected by counting (Sec. 5). Our full network gives almost perfect base face and curve segmentation maps.

Table 1. Ablation tests on using context as input and using operator composition to generate training data. Our full network, using context as input and trained on synthesized shapes composed by multiple operators, has the best accuracy for all segmentation tasks.

Operator   Config       Face IoU (%)   Curve Acc. (%)
Extrude    no context   59.31          98.20
           primitive    70.27          87.78
           ours
Bevel      no context    3.23          50.24
           primitive    73.27          93.88
           ours
Add/Sub    no context   10.27          66.21
           primitive    63.81          87.32
           ours
Sweep      no context    7.34          48.90
           primitive    68.72          90.10
           ours
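For reference, the face IoU metric reported in Tables 1 and 2 can be computed as follows (binarizing the predicted probability map at the 0.5 threshold used in Sec. 5):

```python
import numpy as np

def face_iou(pred, gt, thresh=0.5):
    """Intersection-over-union between a predicted base-face probability
    map (binarized at `thresh`) and the ground-truth base-face mask."""
    p = pred > thresh
    g = gt.astype(bool)
    union = np.logical_or(p, g).sum()
    return np.logical_and(p, g).sum() / union if union else 1.0
```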
We test the robustness of the network predictions under increasing levels of sketch irregularity. While it is difficult to collect large amounts of real user sketches with different levels of irregularity, we simulate the variations by adding perturbations to clean sketches, as done for the synthetic training data generation (Sec. 6). To be specific, we add stronger stroke perturbations than in the training data generation configuration, i.e., 1.4% of the rendered image diagonal length for level 1 (ours), 2.8% for level 2, and 4.1% for level 3 (see Fig. 15), and evaluate how the pretrained model works under such out-of-distribution settings. The statistics are reported in Table 2 and example sketches are shown in Fig. 15. Quantitatively, as the noise increases, the segmentation networks produce more inaccurate results, and the same observation holds for the classification network (over 99% accuracy at level 1 (ours), 85.06% at level 2, and 53.20% at level 3). Qualitatively, while the segmentation network was trained with a low level of noise, it produces high-quality segmentation maps for moderate noise (level 2). While high noise (level 3) degrades the segmentation maps, the subsequent parameter fitting still yields a reasonable shape.

Table 2. Quantitative robustness test of network predictions. Both the face IoU and the curve regression accuracy drop noticeably as the noise increases.

Operator   Config           Face IoU (%)   Curve Acc. (%)
Extrude    level 1 (ours)
           level 2          89.59          96.28
           level 3          82.58          91.69
Bevel      level 1 (ours)
           level 2          85.90          94.89
           level 3          75.76          90.27
Add/Sub    level 1 (ours)
           level 2          77.09          92.02
           level 3          73.88          86.08
Sweep      level 1 (ours)
           level 2          77.31          94.44
           level 3          75.47          92.78
In its current form, Sketch2CAD does not support drawing primitives on curved faces (e.g., on the curved face of a cylinder). One possibility would be to use NURBS as the modeling primitives, where stitching faces can be curved NURBS patches stopping at trim lines. This would, however, require an extension of the underlying geometry engine used in our implementation. Another limitation involves drawing small features (e.g., knobs, or screw threads). While we do support zooming in our interface, having a library of small leaf-level part features to instantiate would be useful, rather than building them up from scratch. Finally, we expect a certain amount of sketching ability from the user. Porting our code to a tablet interface could further lower this entry bar.

Our training data generation only considers geometric feasibility rather than semantics, e.g., not all combinations of the operations are functionally meaningful. On the other hand, in the scenario of CAD modeling, there are strong semantics about the desired forms and functions of the different parts and their composing operations for common man-made objects. In the future, we plan to take this factor into consideration and train our networks on more realistic data that respect real-world model distributions, e.g., by utilizing datasets with semantic annotations like PartNet [Mo et al. 2019]. It will also be interesting to train our network directly on CAD modeling trace data, when available, to capture typical sequences of operations and learn auto-complete routines (cf. [Peng et al. 2018]) directly from user data. Finally, while in this work we explored sketch-to-CAD, we can easily use the generated models, possibly after 3D-based editing and manipulation, to go back to the sketch domain and thus enable powerful edits to sketching. Figure 16 shows an early example of such a possible workflow.

Fig. 15. Sketch perturbation examples and the corresponding network predictions and fitting results. Each row shows an example of a specific stroke perturbation level, while the columns show the strokes, the network outputs, and the final result after parameter fitting.

Fig. 16. CAD-based sketching. The CAD models generated in our system can subsequently be procedurally edited (subdivided and smoothed in this example) and the resultant mesh used to go back to a 'sketch' using NPR rendering. This allows the user to perform operations that are much easier in the CAD domain and then transition back to sketching, possibly with camera view changes. This can be useful during the ideation and prototyping phases of product design.
The visceral and approximate nature of freehand sketching is often considered to be in contradiction with the tediousness and rigidity of 3D modeling. Yet, we observed that industrial design sketching and CAD modeling follow very similar workflows, where practitioners create complex shapes as a sequence of simple sketching (resp. modeling) operations. By identifying and parameterizing common operations in the two domains, and training a deep neural network to recognize and segment these operations, we offer an interactive system capable of turning approximate sketches of human-made objects into regular CAD models, as illustrated by our evaluation with novices as well as the diversity of shapes we created with our approach.
ACKNOWLEDGMENTS
The authors would like to thank the reviewers for their valuable and detailed suggestions, the user evaluation participants, and Nathan Carr, Yuxiao Guo, and Zhiming Cui for the valuable discussions. The work of Niloy was supported by an ERC Grant (SmartGeometry 335373), a Google Faculty Award, and gifts from Adobe; the work of Adrien was supported by ERC Starting Grant D3 (ERC-2016-STG 714221) and research and software donations from Adobe. Finally, Changjian Li wants to thank, in particular, the endless and invaluable love and support from Huahua Guo over the tough time due to COVID-19.
REFERENCES
Autodesk. 2019a. Maya.
TinkerCAD.
In IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
Seok-Hyung Bae, Ravin Balakrishnan, and Karan Singh. 2008. ILoveSketch: As-natural-as-possible sketching system for creating 3D curve models. In Proc. ACM UIST. ACM, 151–160.
Alexandra Bonnici, Alican Akman, Gabriel Calleja, Kenneth P. Camilleri, Patrick Fehling, Alfredo Ferreira, Florian Hermuth, Johann Habakuk Israel, Tom Landwehr, Juncheng Liu, et al. 2019. Sketch-based interaction and modeling: where do we stand? AIEDAM (2019), 1–19.
Frederic Cordier, Hyewon Seo, Mahmoud Melkemi, and Nickolas S. Sapidis. 2013. Inferring Mirror Symmetric 3D Shapes from Sketches. Computer Aided Design 45, 2 (Feb. 2013), 301–311.
Chris De Paoli and Karan Singh. 2015. SecondSkin: Sketch-Based Construction of Layered 3D Models. ACM Transactions on Graphics (Proc. SIGGRAPH) 34, 4, Article 126 (July 2015), 10 pages.
Johanna Delanoy, Mathieu Aubry, Phillip Isola, Alexei A. Efros, and Adrien Bousseau. 2018. 3D Sketching using Multi-View Deep Volumetric Prediction. Proceedings of the ACM on Computer Graphics and Interactive Techniques 1, 1 (2018), 21.
Jonathan D. Denning, William B. Kerr, and Fabio Pellacini. 2011. MeshFlow: Interactive Visualization of Mesh Construction Sequences. ACM Trans. Graph. 30, 4, Article 66 (July 2011), 8 pages.
Tao Du, Jeevana Priya Inala, Yewen Pu, Andrew Spielberg, Adriana Schulz, Daniela Rus, Armando Solar-Lezama, and Wojciech Matusik. 2018. InverseCSG: Automatic Conversion of 3D Models to CSG Trees. ACM Transactions on Graphics (Proc. SIGGRAPH Asia) 37, 6 (2018).
Koos Eissen and Roselien Steur. 2008. Sketching: Drawing Techniques for Product Designers. Bis Publishers.
Koos Eissen and Roselien Steur. 2011. Sketching: The Basics. BIS Publishers.
Kevin Ellis, Daniel Ritchie, Armando Solar-Lezama, and Josh Tenenbaum. 2018. Learning to infer graphics programs from hand-drawn images. In Advances in Neural Information Processing Systems. 6059–6068.
Jean-Dominique Favreau, Florent Lafarge, and Adrien Bousseau. 2015. Line Drawing Interpretation in a Multi-View Context. In Proceedings of the Conference on Computer Vision and Pattern Recognition. IEEE.
Yotam Gingold, Takeo Igarashi, and Denis Zorin. 2009. Structured Annotations for 2D-to-3D Modeling. ACM Transactions on Graphics (Proc. SIGGRAPH Asia) 28, 5 (2009).
Yulia Gryaditskaya, Mark Sypesteyn, Jan Willem Hoftijzer, Sylvia Pont, Frédo Durand, and Adrien Bousseau. 2019. OpenSketch: A Richly-Annotated Dataset of Product Design Sketches. ACM Trans. Graph. (SIGGRAPH Asia) 38, 6 (November 2019).
Haibin Huang, Evangelos Kalogerakis, Ersin Yumer, and Radomir Mech. 2016. Shape Synthesis from Sketches via Procedural Models and Convolutional Networks. IEEE Transactions on Visualization and Computer Graphics (TVCG) 22, 10 (2016).
Takeo Igarashi, Satoshi Matsuoka, and Hidehiko Tanaka. 1999. Teddy: A Sketching Interface for 3D Freeform Design.
SIGGRAPH (1999).Joaquim A Jorge, Nelson F Silva, and Tiago D Cardoso. 2003. GIDeS++: A RapidPrototyping Tool for Mould Design. In
Proceedings of the Rapid Product DevelopmentEvent RDP .Manfred Lau, Greg Saul, Jun Mitani, and Takeo Igarashi. 2010. Modeling-in-context:user design of complementary objects with a single photo. In
Proc. Sketch-BasedInterfaces and Modeling .Changjian Li, Hao Pan, Yang Liu, Xin Tong, Alla Sheffer, and Wenping Wang. 2018.Robust flow-guided neural prediction for sketch-based freeform surface modeling.
ACM Transaction on Graphics (Proc. SIGGRAPH Asia) (2018), 238.Yuwei Li, Xi Luo, Youyi Zheng, Pengfei Xu, and Hongbo Fu. 2017. SweepCanvas:Sketch-Based 3D Prototyping on an RGB-D Image. In
Proc. ACM Symposium on UserInterface Software and Technology (UIST) (UIST âĂŹ17) .H Lipson and M Shpitalni. 1996. Optimization-based reconstruction of a 3D object froma single freehand line drawing.
Computer-Aided Design
28, 8 (1996), 651–663.Zhaoliang Lun, Matheus Gadelha, Evangelos Kalogerakis, Subhransu Maji, and RuiWang. 2017. 3D shape reconstruction from sketches via multi-view convolutionalnetworks. In
IEEE International Conference on 3D Vision (3DV) . 67–77.Kaichun Mo, Shilin Zhu, Angel X. Chang, Li Yi, Subarna Tripathi, Leonidas J. Guibas, andHao Su. 2019. PartNet: A Large-Scale Benchmark for Fine-Grained and HierarchicalPart-Level 3D Object Understanding. In
The IEEE Conference on Computer Visionand Pattern Recognition (CVPR) .Fernando Naya, Joaquim Jorge, Julián Conesa, Manuel Contero, and José María Gomis.2002. Direct modeling: from sketches to 3D models. In
Proceedings of the 1st Ibero-American Symposium in Computer Graphics SIACG . 109–117.Andrew Nealen, Takeo Igarashi, Olga Sorkine, and Marc Alexa. 2007. FiberMesh:designing freeform surfaces with 3D curves.
ACM Transactions on Graphics (Proc.
ACM Trans. Graph., Vol. 39, No. 6, Article 164. Publication date: December 2020.
SIGGRAPH)
26, Article 41 (2007). Issue 3.Gen Nishida, Ignacio Garcia-Dorado, Daniel G. Aliaga, Bedrich Benes, and AdrienBousseau. 2016. Interactive Sketching of Urban Procedural Models.
ACM Trans.Graph. (SIGGRAPH)
35, 4, Article 130 (July 2016), 11 pages.Patrick Paczkowski, Min H. Kim, Yann Morvan, Julie Dorsey, Holly Rushmeier, andCarol O’Sullivan. 2011. Insitu: Sketching Architectural Designs in Context.
ACMTransactions on Graphics
30, 6 (2011).Mengqi Peng, Jun Xing, and Li-Yi Wei. 2018. Autocomplete 3D sculpting.
ACMTransactions on Graphics (TOG)
37, 4 (2018), 1–15.A. Pipes. 2007.
Drawing for designers . Laurence King.Alec Rivers, Frédo Durand, and Takeo Igarashi. 2010. 3D Modeling with Silhouettes.
ACM Transactions on Graphics (Proc. SIGGRAPH)
29, 4, Article 109 (2010), 8 pages.Robert McNeel & Associates. 2019.
Rhinoceros
ACM Transactions on Graphics (Proc. SIGGRAPH Asia) , Vol. 28.ACM, 149.Gopal Sharma, Rishabh Goyal, Difan Liu, Evangelos Kalogerakis, and Subhransu Maji.2018. CSGNet: Neural Shape Parser for Constructive Solid Geometry. In
IEEEConference on Computer Vision and Pattern Recognition (CVPR) .Alex Shtof, Alexander Agathos, Yotam Gingold, Ariel Shamir, and Daniel Cohen-Or.2013. Geosemantic Snapping for Sketch-Based Modeling.
Computer Graphics Forum
32, 2 (2013), 245–253. Wanchao Su, Dong Du, Xin Yang, Shizhe Zhou, and Hongbo Fu. 2018. InteractiveSketch-Based Normal Map Generation with Deep Neural Networks.
Proceedings ofthe ACM on Computer Graphics and Interactive Techniques
1, 1 (2018).The CGAL Project. 2020.
CGAL User and Reference Manual (5.0.2 ed.). CGAL EditorialBoard. https://doc.cgal.org/5.0.2/Manual/packages.htmlYonglong Tian, Andrew Luo, Xingyuan Sun, Kevin Ellis, William T. Freeman, Joshua B.Tenenbaum, and Jiajun Wu. 2019. Learning to Infer and Execute 3D Shape Programs.In
International Conference on Learning Representations .Trimble. 2019.
SketchUp
Proc. IEEE Conference on ComputerVision and Pattern Recognition (CVPR) .Baoxuan Xu, William Chang, Alla Sheffer, Adrien Bousseau, James McCrae, and KaranSingh. 2014. True2Form: 3D curve networks from 2D sketches via selective regular-ization.
ACM Transactions on Graphics (Proc. SIGGRAPH)
33, 4 (2014).Pengfei Xu, Hongbo Fu, Youyi Zheng, Karan Singh, Hui Huang, and Chiew-Lan Tai.2019. Model-Guided 3D Sketching.
IEEE Transactions on Visualization and ComputerGraphics
25, 10 (2019), 2927–2939.Robert C. Zeleznik, Kenneth P. Herndon, and John F. Hughes. 1996. SKETCH: AnInterface for Sketching 3D Scenes. In