ClipFlip: Multi-view Clipart Design
I-Chao Shen, Kuan-Hung Liu, Li-Wen Su, Yu-Ting Wu, Bing-Yu Chen
{jdily, lkh94, susan31213, kevincosner}@cmlab.csie.ntu.edu.tw, National Taiwan University, Taiwan
[email protected], National Taiwan University, Taiwan

Figure 1: Given the input clipart (a), our system automatically generates a visual scaffold (b) by rendering a style-consistent 3D shape generated by a user-assisted curve extrusion method. The user can design the clipart from the desired viewpoint by following the visual scaffold, drawing each part step by step (c); the parts being drawn at each step are highlighted in red. Finally, the user can design better clipart (e) from the desired viewpoint in a shorter period than designing it without the provided visual scaffold (d).
Abstract
We present an assistive system for clipart design that provides visual scaffolds from unseen viewpoints. Inspired by the artists' creation process, our system constructs the visual scaffold by first synthesizing a reference 3D shape of the input clipart and then rendering it from the desired viewpoint. The critical challenge of constructing this visual scaffold is to generate a reference 3D shape that matches the user's expectations in terms of object sizing and positioning while preserving the geometric style of the input clipart. To address this challenge, we propose a user-assisted curve extrusion method to obtain the reference 3D shape. We render the synthesized reference 3D shape into the visual scaffold with a style consistent with the input. By following the generated visual scaffold, users can efficiently design clipart from their desired viewpoints. A user study conducted with our intuitive user interface and the generated visual scaffolds suggests that our system is especially useful for estimating the ratio and scale between object parts and can save up to 57% of drawing time.
CCS Concepts
• Computing methodologies → Parametric curve and surface models; Image processing;
• Human-centered computing → Graphical user interfaces;
1. Introduction
Vector clipart is widely used in graphic design for compactly expressing concepts or illustrating objects in daily life: designers communicate their design ideas with others, and presenters illustrate concepts with clipart. Whenever we need an icon in our slides or videos, we usually search online clipart repositories using target keywords, such as a chair, a lamp, or an airplane. A common situation we encounter is that even if we are lucky enough to find clipart with satisfactory shapes and colors, it is sometimes designed from an undesired viewpoint. In the end, we often have to compromise on either a satisfactory appearance or a viewpoint.

When facing this situation, one might consider manipulating the existing clipart to the desired viewpoint via editing tools. However, it is challenging to draw other views of an object from only one available view due to individual differences in mental rotation ability [VK78]. We conducted interviews with two professional artists; they tackle this task by first creating a rough 3D model based on the available view, which then serves as the mental prior for designing the clipart from other viewpoints. Unfortunately, this procedure requires professional training in 3D modeling and a good mental prior of the 3D object, which is almost impossible for general users.

In this paper, we propose a multi-view clipart assistive system that shows unseen viewpoints of the clipart to support mental rotation. Our system follows the drawing-by-observation technique. Given the input clipart, our system's goal is to automatically synthesize other, unseen viewpoints of the clipart object without changing the character of the clipart, so that users can efficiently design high-quality clipart from their desired viewpoints by simply following the visual scaffold. Our system design simulates the creation process of professional artists: instead of directly synthesizing unseen views from the 2D input, we first infer a reference 3D model based on the input clipart's viewpoint. We generate the reference 3D model with user-provided structural annotations that preserve the style of the input clipart, and then generate the visual scaffold by rendering the reference 3D model from the desired viewpoint. Finally, we also provide an intuitive user interface to assist users in their clipart creation.

The major contribution of our work is an algorithm for generating a style-consistent 3D model from single-view clipart. Even though there are existing works on novel-view synthesis [ZTS∗16, SHL∗18, PYY∗17, DSTB16, TDB16], two reasons prevent us from directly applying these methods. First, these methods require substantial training datasets to learn categorical shape priors, but no large clipart dataset exists to facilitate such a learning process. Second, directly inferring the novel view or 3D representation of the clipart object using the pre-trained models of the methods mentioned above usually leads to failure cases (Figure 2). The reason is that object boundaries in clipart are mainly composed of low-degree curve types such as lines and arcs, unlike the shapes in existing datasets. Due to the very different geometric styles, the 3D geometry predicted by previous methods is not style-consistent with the input clipart. Our proposed user-assisted curve extrusion method addresses this problem by combining a state-of-the-art 3D reconstruction method with user-provided structural annotations that guide the curve extrusion, so that the generated 3D shape is regularized and consistent with the style of the input clipart. Note that we do not aim to propose a novel deep learning architecture that handles vector clipart directly; instead, we leverage an existing 3D reconstruction method that operates on raster images and combine it with structural annotations to generate a regular 3D shape.

Figure 2: Given the input clipart, both (b) Olszewski et al. [OTW∗19] (TBN) and (c) Chen et al. [CSH19] fail to synthesize a novel view of the input clipart. We use the pre-trained models provided on their webpages.

We conducted a user study with general users who did not have much drawing training or experience. The purpose is to evaluate how helpful our generated visual scaffold is for designing clipart from an unseen viewpoint. To conduct this user study, we designed a complete system for assisting general users in multi-view clipart creation. Our intuitive user interface shows two display areas side-by-side: the input reference area and the drawing area. In the input reference area, we show the input clipart, as most users keep referring to the input viewpoint while drawing the novel viewpoint. In the drawing area, the user can use both curve tools and shape tools to design the novel-view clipart; the visual scaffold we synthesized lies under the drawing area. With our system, most users could design better clipart from a novel viewpoint with a regularized visual scaffold in a small fraction of the original time. Furthermore, most participants agreed that our visual scaffold significantly reduces their cognitive workload when designing clipart from a new viewpoint.
2. Related Work

2.1. Novel-view synthesis
To synthesize photo-realistic novel-view images, most traditional methods [PZ17, SCD∗06, KLS∗13, DTM96] take multi-view images as input and infer 3D representations explicitly to facilitate novel-view rendering. On the other hand, recent learning-based methods [ZTS∗16, SHL∗18, PYY∗17, DSTB16, TDB16] provide the ability to render novel-view images from only a single image. Some methods [DSTB16, TDB16] directly synthesize the pixels of the desired viewpoint, while others [ZTS∗16, PYY∗17, SHL∗18] instead estimate a flow map from the input view to the desired viewpoint. Some methods [KLA19, NPLT∗19] disentangle image factors such as viewpoint, shape, and appearance from the source image and use these factors to facilitate view synthesis. There are also learning-based methods focusing on better data representations for synthesizing novel views from a single image, including voxel- and volume-based representations [TZEM17, CXG∗16, NMOG19], point clouds [FSG17, MWYG20], and meshes [GFK∗18, LPL∗18, LLCL19, DN19, SBS19]. However, there is no existing large clipart dataset (either raster- or vector-based) that can facilitate such a learning-based approach. Meanwhile, Lopes et al. [LHES19a] proposed a generative model for font synthesis with a deep learning framework that handles vector formats. However, we cannot apply this method directly to our application because their model does not handle (i) color, (ii) shape category, or (iii) viewpoint information. Hence, we combine a learning-based approach on raster images with user-provided structural annotations and provide an intuitive interface to aid the creative process.

2.2. Clipart synthesis

Vector clipart can be synthesized by vectorizing existing raster images or by designing from scratch. Commercial products [Ado20, Vec20] provide robust image vectorization that simultaneously tackles both the image segmentation and the curve (segment boundary) fitting problem. However, the clipart style usually differs from natural images (e.g., flat shading and rounded shapes), which makes vectorized natural images unsuitable for synthesizing clipart. Some previous works have focused on vectorizing pixel art [Ste03, KL11, HDS∗18]. Hoshyari et al. [HDS∗18] used human perceptual cues to generate boundary vectorizations that better match viewers' expectations. Liu et al. [LALR16] designed an interactive system to synthesize novel clipart by remixing existing clipart in a large repository.
2.3. Assisted content authoring

Content authoring is a fundamental problem in computer graphics, and many previous works focus on assisting the authoring workflow. Among them, several utilize personal editing histories to assist 2D sketching [XCW14], 3D shape sculpting [PXW18], and viewpoint selection [CGW∗14]. Xing et al. [XKG∗16] utilize energy strokes from professional artists to facilitate the authoring of 2D animation. Lee et al. [LZC11] proposed a guidance system that dynamically updates a shadow image underlying the user's strokes while the user is drawing. Hennessey et al. [HLW∗17] automatically generate step-by-step tutorials for drawing 3D models. Schmidt et al. [SIJ∗07] presented a visual history of sketching, scaffolding, and inking for interactive 3D modeling, and later explored analytic drawing of 3D scaffolds [SKSK09].

2.4. Geometric style of 3D shapes

Creating 3D shapes with different geometric styles has drawn a lot of attention, including Japanese manga style [SLHC12], Lego brick style [LYH∗15], shape abstraction [MZL∗09, YK12], and cubic style [lJ19]. Lun et al. [LKS15, LKWS16] analyze the style similarities between shapes and transfer different styles between shapes. Unlike previous works that manipulate input 3D shapes using mesh deformation techniques, our goal is to synthesize a 3D shape from the input clipart. We combine a single-view 3D mesh reconstruction method with the user's structural annotations to guide the curve extrusion process and synthesize a reference 3D shape that matches the input clipart in both geometric and appearance styles.
3. Visual Scaffold Synthesis
Our system includes two major components: (i) visual scaffold synthesis, which synthesizes the input clipart under a desired viewpoint, and (ii) the drawing user interface, which displays the synthesized visual scaffold beneath the user's drawing canvas to aid the drawing process. Given an input clipart consisting of multiple closed paths $S = \{C_0, C_1, \ldots, C_{n-1}\}$ under input viewpoint $\theta_i$, the goal of this step is to synthesize a visual scaffold that helps the user efficiently design the clipart from an unseen viewpoint $\theta_u$. To achieve this goal, we design our method to address the following three aspects:

1. the shape in the visual scaffold image has to match the user's imagination of the input clipart from viewpoint $\theta_u$;
2. the appearance style of the visual scaffold has to match the input clipart;
3. the geometric style of the shape in the visual scaffold has to match the input clipart.

If the shape in the visual scaffold conflicts with the user's imagination (i.e., violating (1)), it will hinder the creative process instead of providing useful aid. Meanwhile, if the geometric and appearance styles do not match the input clipart (i.e., violating (2) and (3)), the clipart created by the user will not be depicted as clipart. To address these observations, we propose a user-assisted curve extrusion method to generate a reference 3D shape from the input clipart. The user is allowed to provide structural annotations on the input clipart to indicate 3D structure information. As illustrated in Figure 3, our method generates a reference 3D model $M$ by following the guidance of both the user's structural annotations and a guiding mesh $M_G$, which is reconstructed using a single-view mesh reconstruction method on the input clipart $S$. We search for the optimal thickness and transformation of each extrusion to simultaneously match the user's structural annotations while best interpreting the guiding shape $M_G$. Note that we use the rasterized curves of the input clipart $S$ for inferring the guiding mesh. The main reason for choosing a raster-based method is that, unlike the mature architectures for raster images such as CNNs (convolutional neural networks), there is no mature learning-based method that handles vector clipart well. Moreover, this paper does not aim to propose a novel deep learning architecture that directly handles vector clipart. Even though there exists relevant work [LHES19b] that handles vector-format data using a deep generative model, their model does not consider (i) color, (ii) shape category, or (iii) viewpoint information, which are vital to our application. We therefore leverage more mature methods for raster images and focus on our application of aiding users to design clipart under unseen viewpoints.
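To make the two-stage pipeline concrete before the individual steps are described, the sketch below outlines one possible data model and control flow. This is a minimal illustration under our own naming assumptions (the `ClosedPath` type, `synthesize_scaffold`, and the stage stubs are all hypothetical), not the authors' implementation; the following subsections fill in the stages.

```python
from dataclasses import dataclass
from typing import List
import numpy as np

@dataclass
class ClosedPath:
    """One closed path C_i of the input clipart S, sampled as a 2D polyline."""
    points: np.ndarray            # (N, 2) xy-coordinates on the clipart plane
    fill_color: tuple = (0, 0, 0)
    is_shading: bool = False      # manually flagged shading path (Sec. 3.1)

def reconstruct_guiding_shape(geometry: List[ClosedPath]) -> np.ndarray:
    # Stand-in for the single-view reconstruction step (Sec. 3.1, [LGK*17]).
    return np.zeros((0, 3))

def user_assisted_extrusion(geometry, guiding_mesh):
    # Stand-in for the annotation-constrained extrusion optimization (Sec. 3.2).
    return []

def render_scaffold(volumes, theta_u):
    # Stand-in for the style-consistent rendering of the extruded volumes.
    return None

def synthesize_scaffold(clipart: List[ClosedPath], theta_u):
    """Top-level flow: remove shading, reconstruct, extrude, then render."""
    geometry = [c for c in clipart if not c.is_shading]
    guiding_mesh = reconstruct_guiding_shape(geometry)
    volumes = user_assisted_extrusion(geometry, guiding_mesh)
    return render_scaffold(volumes, theta_u)
```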
Figure 3: Given the input clipart, our method (a) uses a single-view 3D reconstruction method to synthesize a guiding shape (depth, normal, and foreground maps from the top, side, and front views). (b) The user can annotate the input clipart to provide structural information. (c) Our method extrudes the curves in the input clipart adaptively by leveraging the predicted guiding shape and the user-provided structural annotations.
Figure 4: (a) Some closed paths (pointed to by arrows) in the clipart represent shading instead of geometry. (b) After removing all the shading paths, the remaining closed paths represent geometry only.
3.1. Guiding shape synthesis

Performing 3D reconstruction from single-view input is a very challenging task. In recent years, with the fast advances of deep learning methods, state-of-the-art methods have shown the possibility of learning a descriptive latent representation for category-specific objects and synthesizing the corresponding shape represented by voxels, point clouds, or meshes. In this work, we reconstruct the guiding 3D shape $M_G$ from the input clipart $S$ by following Lun et al. [LGK∗17], which predicts multi-view depth, normal, and foreground maps from the outline drawing, fuses them into a point cloud, and reconstructs a mesh via Poisson surface reconstruction [KBH06]. We prepare the input clipart as follows.

Shading path removal
One characteristic of clipart is that it usually describes both appearance and geometry in the same file. Unlike a 3D model file, which usually describes geometry only (with appearance defined through separate material or texture files), the path elements in clipart sometimes represent shading, e.g., reflections (see Figure 4), which do not directly represent the shape. In this work, we focus on aiding users to design the geometry part of the clipart, so we categorize the path elements in the clipart into two categories, geometry paths and shading paths, and remove the shading paths before performing the 3D reconstruction. Currently, this categorization is performed manually: because we do not have lighting information, we cannot distinguish the path type based only on its curve geometry and color.
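Because shading paths are flagged by hand, the removal step itself reduces to filtering the clipart document. Below is a minimal sketch, assuming the clipart is an SVG file and that annotators record the ids of the shading paths in a set; both assumptions are ours, not part of the paper.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"

def remove_shading_paths(svg_in: str, shading_ids: set, svg_out: str) -> None:
    """Drop manually flagged shading paths; keep geometry paths only."""
    ET.register_namespace("", SVG_NS)          # avoid ns0: prefixes on output
    tree = ET.parse(svg_in)
    for parent in tree.getroot().iter():
        for child in list(parent):             # copy: we mutate while iterating
            if child.tag == f"{{{SVG_NS}}}path" and child.get("id") in shading_ids:
                parent.remove(child)           # e.g., highlight/reflection paths
    tree.write(svg_out)

# Usage: remove_shading_paths("chair.svg", {"reflect1", "gloss2"}, "chair_geo.svg")
```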
Color removal
We remove the filled color of each closed path in the input clipart. The remaining curve outlines serve as the sketch input to the method proposed in [LGK∗17], from which we obtain the predicted geometric data. Due to the large difference in geometric style between clipart and the shapes in their training dataset, the point cloud constructed from the predicted geometric data is often broken, as shown in Figure 5(b).

One possible solution is to re-train their model on a huge clipart dataset; however, collecting large numbers of clipart with multiple viewpoints is expensive and time-consuming. As a result, we use the pre-trained model provided by [LGK∗17] as our inference model. The quality of the predicted shapes is therefore usually worse and noisier than results predicted from sketches that capture real-world shape styles. To filter out noisy predicted shape data, we use the input clipart to create an auxiliary foreground mask (Figure 5(c)) from the viewpoint of the input clipart. As we observed, this step is quite helpful for compensating for the quality loss caused by the geometric style difference.
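Deriving the auxiliary foreground mask only requires rasterizing the filled silhouette of the input clipart from its own viewpoint. A minimal sketch using Pillow follows; the resolution and the assumption that paths are already scaled to pixel coordinates are ours.

```python
import numpy as np
from PIL import Image, ImageDraw

def foreground_mask(paths, size=256):
    """Rasterize the union of closed paths into a binary foreground mask.

    `paths` is a list of (N, 2) arrays of xy-coordinates in pixel space.
    The resulting mask replaces the noisy predicted foreground probability.
    """
    img = Image.new("L", (size, size), 0)
    draw = ImageDraw.Draw(img)
    for pts in paths:
        draw.polygon([tuple(map(float, p)) for p in pts], fill=255)
    return np.asarray(img) > 0
```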
Please see Figure 5 for reconstructed shapes with and without this auxiliary foreground mask.

Figure 5: By replacing the predicted noisy foreground probability mask (b) with the known mask derived from the input clipart (c), we can greatly improve the quality of the reconstructed meshes.

Figure 6: (a) For each curve $C_i$ (the red segment on the left), we project vertices of the guiding shape $M_G$ and obtain a pointset $P_i$ that is enclosed by $C_i$ (inside the red rectangle). (b) Our extrusion parameterization.

3.2. User-assisted curve extrusion

After obtaining the guiding shape $M_G$, we leverage it to guide the curve extrusion process, which turns a 2D curve into a 3D volume. Curve extrusion is widely used in CAD modeling, where designers usually lay out a shape in 2D and extrude it by a certain thickness along a direction into a solid volume. Our goal is to extrude all the closed paths in the input clipart $S$ into volumes and transform them to cover as many vertices of the guiding shape $M_G$ as possible. We assume the extrusion axis is along the z-axis (i.e., the input clipart lies on the xy-plane), and we parameterize the extruded volume $V_i$ of a closed path $C_i$ with the following parameters: (i) the extrusion thickness $d_i$, and (ii) the z-coordinate $z_i$ of the centroid of the extruded volume (see Figure 6(b) for an illustration).
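This parameterization maps directly onto a small value type: the 2D outline $C_i$ plus the pair $(d_i, z_i)$. A minimal sketch follows, using matplotlib's `Path` purely for the point-in-polygon test; the class name and layout are our own.

```python
from dataclasses import dataclass
import numpy as np
from matplotlib.path import Path

@dataclass
class ExtrudedVolume:
    """The extruded volume V_i of a closed path C_i (Figure 6(b))."""
    outline: np.ndarray   # (N, 2) closed path C_i on the xy-plane
    d: float              # extrusion thickness d_i along the z-axis
    z: float              # z-coordinate z_i of the volume centroid

    def contains(self, p) -> bool:
        """True if the 3D point p = (x, y, z) lies inside the extruded prism."""
        in_slab = abs(p[2] - self.z) <= self.d / 2.0
        return bool(in_slab and Path(self.outline).contains_point(p[:2]))
```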
User-assisted structural annotation

We allow the users to annotate each closed path in the input clipart with structural properties. These annotations enable the users to assign the 3D information they depict. We provide the following four types of annotations (see Figure 7 for an illustration of each):

• multiple objects: a single closed path $C_i$ in 2D may correspond to multiple occluded objects in 3D. This annotation lets the user assign the number of objects $N$ they believe exist; we duplicate the annotated closed path $N$ times before extrusion.
• same thickness: for a pair of closed paths $C_i$ and $C_j$, the user can enforce the same extrusion thickness (i.e., $d_i = d_j$).
• same depth: for a pair of closed paths $C_i$ and $C_j$, the user can enforce the z-coordinates $z_i$ and $z_j$ of the extruded-volume centroids to be the same.
• depth order: by default, we use the input clipart's layering as the depth order between closed paths; the user can assign a desired depth order to overwrite the default.

Figure 7: The user can annotate four different types of structural information. (a) The paths of the chair legs (in red) are annotated as multiple objects. (b) Four paths (in green) are annotated as same thickness. (c) Four paths (in purple) are annotated as same depth. (d) Path A is annotated as being in front of path B (blue).

For each closed path $C_i$, we project the vertices of the guiding shape $M_G$ onto the xy-plane and obtain a pointset $P_i$ that is enclosed by $C_i$ (see Figure 6(a)). For a closed path annotated as multiple objects, we perform a clustering operation on the enclosed pointset $P_i$ based on the 3D point positions; e.g., if a closed path is duplicated twice, we generate two pointsets $P_i^1$ and $P_i^2$ and assign one pointset to each duplicated closed path. We then obtain the best extruded volumes $V = \{V_0, \ldots, V_{n-1}\}$ by optimizing the following geometry approximation cost:

$$\begin{aligned} \underset{\{d_i, z_i\}}{\text{minimize}} \quad & E_{\mathrm{cover}} + \omega\, E_{\mathrm{thickness}} && (1)\\ \text{subject to} \quad & z_i = z_j, && \forall (i, j) \in \textit{same depth} && (2)\\ & z_i - z_j > 0, && \forall (i, j) \in \textit{depth order} && (3)\\ & d_i = d_j, && \forall (i, j) \in \textit{same thickness} && (4) \end{aligned}$$

We define the volume coverage cost $E_{\mathrm{cover}}$ as:

$$\mathrm{dist}(x, V_i) = \begin{cases} 0, & \text{if } x \text{ is inside } V_i\\ \min_{q \in \Omega(V_i)} \lVert x - q \rVert, & \text{otherwise,} \end{cases} \qquad (5)$$

$$E^{i}_{\mathrm{cover}} = \sum_{x \in P_i} \mathrm{dist}(x, V_i), \qquad (6)$$

$$E_{\mathrm{cover}} = \sum_{i=0}^{n-1} E^{i}_{\mathrm{cover}}, \qquad (7)$$
I-Chao Shen et al. / ClipFlip : Multi-view Clipart Design where x is a point belongs to P i , n is the number of closed pathsin input clipart S , and Ω ( V i ) represents the surface of the extrudedvolume. And we define the thickness cost ( E thickness ) as : E thickness = n ∑ i = (cid:107) d i (cid:107) . (8)The intuition of optimizing Eq. 1 is to encourage the volume tocover as many enclosed points as possible and keep the extrudethickness small. While minimizing the cost function (Eq. 1), thestructural annotation is formulated into constraints (Eq. 2 to 4) thatenforce the structural relationship between closed paths in the inputclipart S . The ω we used in Eq. 4 is set as 100 by doing a gridsearch over a set of possible values on two verification clipart. Wefound out the value better balance the scale between both terms anddiscourage the over-thickness extrusion results. Optimization method
Optimization method

To simplify the optimization problem, we use a greedy method to obtain the optimized result. To optimize Eq. 1, for each closed path $C_i$ we first fit an oriented bounding box (OBB) $O_i$ to the enclosed pointset $P_i$. We limit the possible values of $d_i$ to the three side lengths of $O_i$ and initialize $d_i$ with the side length that yields the minimum $E_{\mathrm{cover}}$; we initialize $z_i$ at the centroid of $P_i$.

Next, we resolve the structural constraints (Eqs. 2 to 4) one by one. For a same depth constraint on closed paths $C_i$ and $C_j$, we move whichever extruded volume ($V_i$ or $V_j$) leads to the smaller $E_{\mathrm{cover}}$. For a depth order constraint, we likewise move the z-coordinate of whichever volume (i.e., $z_i$ or $z_j$) leads to the smaller $E_{\mathrm{cover}}$. Finally, for a same thickness constraint, we set the thickness of whichever volume (i.e., $d_i$ or $d_j$) leads to the smaller $E_{\mathrm{cover}}$ to the other's value. Figure 8 shows an illustrative example of the optimization process.

Figure 8: The result after (a) initialization of the thickness ($d$) and the depth value ($z$), (b) resolving the depth values, and (c) resolving the thicknesses.

The main reason for using the greedy method instead of a traditional gradient-based optimization is that the two sets of variables (the thickness $d_i$ and the extruded-volume centroid $z_i$) are coupled: a different centroid leads to a different optimal thickness, and vice versa. Although this kind of problem has been studied and solvers have been proposed for different applications [LZX∗08, BDS∗12], we adopt the greedy method for its simplicity and efficiency.
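A condensed sketch of the greedy pass follows, building on `dist()` above. The OBB is approximated here by a PCA-aligned bounding box, and only the same depth rule is shown; depth order and same thickness follow the same pick-the-cheaper-move pattern. All helper names are ours.

```python
import numpy as np

def obb_side_lengths(P):
    """Side lengths of a PCA-aligned bounding box of pointset P_i (OBB stand-in)."""
    c = P - P.mean(axis=0)                       # P is an (M, 3) point array
    _, _, vt = np.linalg.svd(c, full_matrices=False)
    ext = c @ vt.T                               # coordinates in the PCA frame
    return ext.max(axis=0) - ext.min(axis=0)

def e_cover_i(vol, P):
    return sum(dist(x, vol) for x in P)          # per-path coverage term, Eq. (6)

def _cover_with(vol, P, d=None, z=None):
    """Evaluate E_cover_i under a tentative (d, z) without committing it."""
    old_d, old_z = vol.d, vol.z
    vol.d = old_d if d is None else d
    vol.z = old_z if z is None else z
    e = e_cover_i(vol, P)
    vol.d, vol.z = old_d, old_z
    return e

def init_volume(vol, P):
    """Initialize z_i at the centroid; pick d_i among the three OBB side lengths."""
    vol.z = float(P[:, 2].mean())
    vol.d = float(min(obb_side_lengths(P),
                      key=lambda d: _cover_with(vol, P, d=float(d))))

def resolve_same_depth(vi, Pi, vj, Pj):
    """Move whichever volume costs less to move so that z_i == z_j."""
    cost_move_i = _cover_with(vi, Pi, z=vj.z) + e_cover_i(vj, Pj)
    cost_move_j = e_cover_i(vi, Pi) + _cover_with(vj, Pj, z=vi.z)
    if cost_move_i <= cost_move_j:
        vi.z = vj.z
    else:
        vj.z = vi.z
```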
4. User Interface
Figure 9(a) illustrates our user interface, which consists of two separate areas. The reference area shows the input clipart, which acts as the reference viewpoint for designing clipart from the desired viewpoint. The canvas area is where the user designs the clipart using curve and shape tools. In the canvas area, we overlay the visual scaffold under the canvas, and the user can decide whether or not to follow the scaffold.

In the canvas area, we provide two sets of drawing tools: curve tools and shape tools. The curve tools include (i) line, (ii) arc, and (iii) freeform tools. We chose line and arc because we observed that much clipart can be described solely using lines and arcs. The shape tools include (i) rectangle, (ii) ellipse/circle, and (iii) rounded rectangle. These shape primitives are chosen because they are often used to design clipart from canonical viewpoints. Please see Figure 9(b) for an illustration of the different tools.

In the canvas area, the user can first sketch the outline of the shape he/she wants to draw; the sketch can be hidden during the design process. The user can also draw separate curves and primitives in different layers and move layers forward and backward as desired.

Figure 9: (a) Screenshot of our user interface. In the reference area (left), we show the input clipart. In the canvas area (right), the visual scaffold is overlaid under the canvas. The user can use curve tools and shape tools to design clipart from the desired viewpoint. (b) In the current design session, the user used different curves and shapes (represented in different colors), including line, arc, and Bézier freeform tools, as well as rectangle, ellipse/circle, and rounded rectangle. Please see the attached video for details of our user interface.
5. Results and Evaluation
We used our user-assisted curve extrusion method to generate reference 3D shapes for several man-made objects in different cliparts. For each input clipart, we compared the 3D shape generated by our method with the following methods:
Single-view sketch shape reconstruction
The guiding shape used in our method was generated by [LGK∗17]; we compare our result against this guiding shape directly.

Sketch-based shape retrieval

We perform sketch-based shape retrieval using the input clipart. The goal is to retrieve the most similar shape in the ShapeNet [CFG∗15] dataset. For each shape in the ShapeNet dataset, we extract its outlines using Suggestive Contours [DFRS03] as candidate images. We extract the strokes of each input clipart as the query and compute the similarity between the clipart strokes and the contour of each shape using the Shape Context descriptor [BMP01].
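A sketch of the retrieval scoring follows, assuming an OpenCV build that includes the optional shape module (opencv-contrib) for the Shape Context distance; the contour extraction and sampling choices are ours.

```python
import cv2
import numpy as np

def contour_points(binary_img, n=150):
    """Largest outer contour of a rasterized outline image, subsampled to n points."""
    img = (binary_img > 0).astype(np.uint8) * 255
    cnts, _ = cv2.findContours(img, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE)
    c = max(cnts, key=cv2.contourArea)
    idx = np.linspace(0, len(c) - 1, n).astype(int)
    return c[idx]

def retrieve(query_img, candidate_imgs):
    """Rank candidate contour images (e.g., Suggestive Contours renderings)
    by Shape Context distance to the clipart strokes; return the best index."""
    sc = cv2.createShapeContextDistanceExtractor()
    q = contour_points(query_img)
    dists = [sc.computeDistance(q, contour_points(img)) for img in candidate_imgs]
    return int(np.argmin(dists))
```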
Artist creation

We asked an artist to design the clipart from unseen viewpoints given the input clipart; he created a reference 3D shape as his reference for each clipart.

We assigned colors to the results of both single-view sketch shape reconstruction and shape retrieval from the input clipart. We perform shape registration between the rendered contour image and the input clipart to establish correspondences, and we project the colors obtained through the correspondences onto the retrieved 3D shape. To propagate the colors onto all triangles $T$ of the retrieved 3D mesh, we formulate a Markov random field (MRF) problem as follows:

$$E(f) = w_{\mathrm{data}} \sum_{t \in T} D(t, f_t) + w_{\mathrm{smoothness}} \sum_{(t,s) \in \mathcal{N}} S(t, s, f_t, f_s), \qquad (9)$$

where $f$ is the function that assigns a color label to each triangle $t \in T$, $f_t$ and $f_s$ are the color labels assigned to triangles $t$ and $s$, and $\mathcal{N}$ is the set of all pairs of triangles sharing an edge. We define the data term $D$ as:

$$D(t, f_t) = \begin{cases} \lVert c(f_t) - \bar{c}(t) \rVert, & \text{if a color } \bar{c}(t) \text{ is assigned to triangle } t\\ 0, & \text{otherwise,} \end{cases} \qquad (10)$$

where $c(f_t)$ is the color of label $f_t$ and $\bar{c}(t)$ is the projected color of triangle $t$. The smoothness term $S$ measures the spatial consistency of neighboring triangles and is defined as:

$$S(t, s, f_t, f_s) = \begin{cases} 0, & \text{if } f_t = f_s\\ -\log(\theta_{t,s} / \pi)\, \psi_{t,s}, & \text{otherwise,} \end{cases} \qquad (11)$$

where $\theta_{t,s}$ and $\psi_{t,s}$ are the dihedral angle and the centroid distance between $t$ and $s$, respectively. In all the results shown in the paper, we set $w_{\mathrm{smoothness}} = 10$. We solve Eq. 9 using the graph cut algorithm [BK04]. Note that the assigned colors are merely for convenient comparison.
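The MRF terms are straightforward to express; below is a minimal sketch of the per-term costs under our own naming, with colors as RGB arrays. Solving Eq. (9) additionally needs a graph-cut solver such as the min-cut/max-flow method of [BK04], which is omitted here.

```python
import numpy as np

def data_term(label_color, projected_color):
    """Eq. (10): penalize labels that deviate from the color projected onto t.

    `projected_color` is None for triangles with no projected color.
    """
    if projected_color is None:
        return 0.0
    return float(np.linalg.norm(np.asarray(label_color) - np.asarray(projected_color)))

def smoothness_term(label_t, label_s, dihedral_angle, centroid_dist):
    """Eq. (11): zero when the labels agree; otherwise weighted by the
    dihedral angle (relative to pi) and the centroid distance of the pair."""
    if label_t == label_s:
        return 0.0
    return float(-np.log(dihedral_angle / np.pi) * centroid_dist)
```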
We show all the results of the compared methods in Figure 11 and Figure 12. Note that we only show 3D shapes from the upper 45° view in the main paper; renderings from canonical views are shown in the supplemental material. For all shapes, the geometric and appearance styles of our reconstructed results are more consistent with the input clipart than the results generated by [LGK∗17] and the retrieved results. Compared to the artist's results, it took him on average 1 hour to create the 3D shapes for the chair cases and 1.5 hours for the airplane cases; our method produces qualitatively similar results in a fraction of this time (on average 4 minutes for the chair cases and 6.5 minutes for the airplane cases, including user annotation and extrusion optimization).

Figure 11: We compare the reconstructed 3D chair shapes: (a) input clipart, (b) [Lun et al. 2017], (c) retrieved result, (d) our result, (e) artist's result.

Figure 12: We compare the reconstructed 3D airplane shapes: (a) input clipart, (b) [Lun et al. 2017], (c) retrieved result, (d) our result, (e) artist's result.
We conducted a preliminary user study to evaluate how much the synthesized visual scaffold helps users design clipart under unseen viewpoints. This user study does not evaluate the method of synthesizing the visual scaffold itself: the participants only draw the clipart under an unseen viewpoint and do not perform the curve extrusion process.
Setup
All tasks were conducted on a 12-inch Surface Pro laptop with a Surface stylus. The study contains three sessions: tutorial, drawing, and interview. The entire study took about two hours per participant.
Participants
We recruited 12 participants (seven males and five females). All participants are non-professional artists, and only three of them have experience in physical or digital painting. We asked each participant to draw one of six cliparts (three chairs and three airplanes).
Tutorial session
We designed the tutorial session to help the participants familiarize themselves with the drawing interface and its different functions. The tutorial session has five stages that cover (i) drawing single curves, (ii) connecting curves, (iii) filling colors, (iv) drawing primitives, and (v) the layering concept. For each function, we prepared target results (e.g., rectangles with different sizes or a specific layering) and asked the participants to use the tools to match the targets. In total, we allowed the participants to practice in the tutorial session for at most 15 minutes.
Drawing session
In the drawing session, we show the input clipart to the participants and ask them to design clipart of the same object from three other viewpoints. For the chair clipart, the input viewpoint is the front view, and we ask the user to draw the top, side, and upper 45° viewpoints. For the airplane clipart, the input viewpoint is the top view, so we ask the user to draw the front, side, and upper 45° viewpoints. Each participant designs the clipart with and without our visual scaffold in a random order to avoid bias.

Interview
At the end, we collected feedback from each participant on different aspects of the assistive drawing interface. In the questionnaire, we asked participants about (i) their drawing experience, (ii) their thoughts on the difficulty of drawing different viewpoints, and (iii) how they think the visual scaffold aided them during the drawing session. In the last part, we asked the following questions:

• Do they think the provided visual scaffold is helpful?
• How does the provided visual scaffold affect them while drawing?
• Other suggestions or comments on how the provided visual scaffold could be improved.

Please find the full version of our questionnaire in Section 3 of the supplemental material.

Drawing result
We show several cliparts designed by the participants in Figure 13. As we can observe in Figure 13, with the scaffold the participants capture the ratios between parts and place the different parts more accurately. Please see the complete results from all participants in the supplemental material.

Figure 13: We compare clipart from unseen viewpoints designed by participants (a) without and (b) with our visual scaffold. The visual scaffold is shown in (c), and the artist's result in (d). With the aid of the visual scaffold, the participants can (i) capture the ratios between different parts of the target object from unseen viewpoints, and (ii) place different parts in better positions.
Viewpoint difficulty survey

We asked the participants about their thoughts on the difficulty of drawing different viewpoints. All participants agree that the difficulty varies across viewpoints. Eleven out of twelve consider the upper 45° viewpoint more challenging to draw than the other viewpoints. More specifically, for the airplane clipart, 83.3% of the participants thought the upper 45° viewpoint is the hardest to draw, and 16.7% of them thought the side viewpoint is the hardest one. For the chair clipart, all participants agree that the upper 45° viewpoint is the hardest to draw.

Figure 10: Drawing time statistics of the user study (minutes per viewpoint, with and without the visual scaffold, for the airplane and the chair). On average, the participants save 15% of drawing time for the chair cases and 40% for the airplane cases. The provided visual scaffold saves the most time when drawing the airplane from the front viewpoint (57% of drawing time), and it is especially helpful for drawing both the chair and the airplane from the upper 45° viewpoint.
Eleven of twelve participants (91 . • two participants thought the visual scaffold helps them see theoverview at the beginning of the design process. It gives thema better idea of how to draw the different layers composing thetarget object. • five participants thought the visual scaffold helps them estimatethe ratio and scale between different parts of the target object. • two participants thought the visual scaffold reminds them somedetails they ignore • two participants thought the visual scaffold helps them to ar-range the layering while designing the target object.According to the above user feedback, we can observe that our vi-sual scaffold helps design clipart under an unseen viewpoint, espe-cially useful for estimating the ratio and scale between parts. Mean-while, two participants thought the visual scaffold might interferewith their imagination, thus hindering the design process. The po-tential reason is that the visual scaffold is sometimes inaccurate,which might conflict with the participants’ imagination of the tar-get object.Overall, most of the participants can design the clipart from thetarget viewpoint in shorter periods. As shown in Figure 10, the par-ticipants saved time on drawing upper 45 ◦ for both chair and air-plane clipart. With the provided visual scaffold, the participants cansave 57% of time when drawing complicated parts such as the air-plane from the front viewpoint. Meanwhile, we observed that theparticipants spent more time drawing the chair from the side view-point. From the user feedback, we found out that this is becausethe visual scaffold provides additional details from the side view-point compared to their imagination. Even though they spent moretime drawing with visual scaffold from the side viewpoint, they candraw more details, leading to better results.
6. Conclusion
In this paper, we propose an assistive system that aids users in designing clipart from unseen viewpoints. The core of our system is a user-assisted curve extrusion method that combines user-provided structural annotations and a guiding 3D mesh synthesized by a single-view shape reconstruction method. We render the generated 3D shape in the same style as the input clipart to form the visual scaffold. We conducted a user study with 12 users and found that, with our visual scaffold, users can design better-quality clipart from unseen viewpoints in less time than without it.

Our proposed system has limitations in two parts: the 3D reconstruction part and the user-assisted design part. The limitations of our 3D reconstruction method are twofold. First, because we use a pre-trained model of an existing learning-based 3D reconstruction method [LGK∗17] to reconstruct the guiding mesh, the quality of the predicted shapes is often not satisfying. Besides the reason mentioned in Section 3.1, another reason is that many clipart shapes are quite distinct from the training data. For example, we cannot find similar shapes in the training data for the two airplanes shown in Figure 14; thus, the reconstructed guiding meshes are quite noisy and broken, and our current method is not able to recover the missing parts in the guiding mesh. Second, our extrusion method cannot model curvature along the extrusion direction, since the current method extrudes the geometry with a single thickness value. We plan to use different geometry representations to address this issue, such as voxels, bevels, or the geometry profiles used in [KFWM17]. In terms of the user-assisted design part, we observed that designing clipart from a non-canonical viewpoint (e.g., the upper 45° viewpoint) is much harder than from a canonical viewpoint, yet our current system does not provide step-by-step assistance. In the future, we would like to explore more intuitive tools, e.g., adding block-in guides or other novel assistance, to better aid users in designing clipart from non-canonical viewpoints.

Figure 14: Two examples of problematic guiding shapes reconstructed using a learning-based 3D reconstruction method. Given the input clipart on the left in both (a) and (b), we can observe broken reconstruction results highlighted in red rectangles. The reconstruction method fails to reconstruct some thin structures (e.g., the propeller in (a)) or adds unnecessary structures (e.g., the vertical stabilizer in (b)).
7. Acknowledgements
This work was supported in part by the Ministry of Science and Technology, Taiwan, under Grant MOST 109-2634-F-002-032, and we are grateful to the National Center for High-performance Computing. We want to thank Sheng-Jie Luo, Chi-Lan Yang, and the anonymous reviewers for insightful suggestions, and Seraphina Yong for proofreading parts of the paper. I-Chao Shen was supported by the MediaTek Fellowship.
References

[Ado20] Adobe: Adobe Illustrator 2020: Image Trace, 2020.
[BDS∗12] Bouaziz S., Deuss M., Schwartzburg Y., Weise T., Pauly M.: Shape-Up: Shaping discrete geometry with projections. Computer Graphics Forum 31 (2012), 1657–1667.
[BK04] Boykov Y., Kolmogorov V.: An experimental comparison of min-cut/max-flow algorithms for energy minimization in vision. IEEE Transactions on Pattern Analysis and Machine Intelligence 26, 9 (2004), 1124–1137.
[BMP01] Belongie S., Malik J., Puzicha J.: Shape context: A new descriptor for shape matching and object recognition. In Advances in Neural Information Processing Systems (2001), pp. 831–837.
[CFG∗15] Chang A. X., Funkhouser T., Guibas L., Hanrahan P., Huang Q., Li Z., Savarese S., Savva M., Song S., Su H., et al.: ShapeNet: An information-rich 3D model repository. arXiv preprint arXiv:1512.03012 (2015).
[CGW∗14] Chen H.-T., Grossman T., Wei L.-Y., Schmidt R. M., Hartmann B., Fitzmaurice G., Agrawala M.: History assisted view authoring for 3D models. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI '14) (2014), pp. 2027–2036.
[CSH19] Chen X., Song J., Hilliges O.: Monocular neural image based rendering with continuous view control. In Proceedings of the IEEE International Conference on Computer Vision (2019), pp. 4090–4100.
[CXG∗16] Choy C. B., Xu D., Gwak J., Chen K., Savarese S.: 3D-R2N2: A unified approach for single and multi-view 3D object reconstruction. In European Conference on Computer Vision (2016), pp. 628–644.
[DFRS03] DeCarlo D., Finkelstein A., Rusinkiewicz S., Santella A.: Suggestive contours for conveying shape. ACM Transactions on Graphics (Proc. SIGGRAPH) 22, 3 (2003), 848–855.
[DN19] Dai A., Niessner M.: Scan2Mesh: From unstructured range scans to 3D meshes. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2019).
[DSTB16] Dosovitskiy A., Springenberg J. T., Tatarchenko M., Brox T.: Learning to generate chairs, tables and cars with convolutional networks. IEEE Transactions on Pattern Analysis and Machine Intelligence 39, 4 (2016), 692–705.
[DTM96] Debevec P. E., Taylor C. J., Malik J.: Modeling and rendering architecture from photographs: A hybrid geometry- and image-based approach. In Proceedings of the 23rd Annual Conference on Computer Graphics and Interactive Techniques (1996), pp. 11–20.
[FSG17] Fan H., Su H., Guibas L. J.: A point set generation network for 3D object reconstruction from a single image. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2017), pp. 605–613.
[GFK∗18] Groueix T., Fisher M., Kim V. G., Russell B., Aubry M.: AtlasNet: A papier-mâché approach to learning 3D surface generation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2018).
[HDS∗18] Hoshyari S., Dominici E. A., Sheffer A., Carr N., Wang Z., Ceylan D., Shen I., et al.: Perception-driven semi-structured boundary vectorization. ACM Transactions on Graphics 37, 4 (2018), 118.
[HLW∗17] Hennessey J. W., Liu H., Winnemöller H., Dontcheva M., Mitra N. J.: How2Sketch: Generating easy-to-follow tutorials for sketching 3D objects. In Symposium on Interactive 3D Graphics and Games (2017).
[KBH06] Kazhdan M., Bolitho M., Hoppe H.: Poisson surface reconstruction. In Proceedings of the Fourth Eurographics Symposium on Geometry Processing (SGP '06) (2006), pp. 61–70.
[KFWM17] Kelly T., Femiani J., Wonka P., Mitra N. J.: BigSUR: Large-scale structured urban reconstruction. ACM Transactions on Graphics 36, 6 (2017).
[KL11] Kopf J., Lischinski D.: Depixelizing pixel art. ACM Transactions on Graphics 30, 4 (2011), 99.
[KLA19] Karras T., Laine S., Aila T.: A style-based generator architecture for generative adversarial networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2019), pp. 4401–4410.
[KLS∗13] Kopf J., Langguth F., Scharstein D., Szeliski R., Goesele M.: Image-based rendering in the gradient domain. ACM Transactions on Graphics (Proc. SIGGRAPH Asia) 32, 6 (2013).
[LALR16] Liu Y., Agarwala A., Lu J., Rusinkiewicz S.: Data-driven iconification. In International Symposium on Non-Photorealistic Animation and Rendering (NPAR) (2016).
[LGK∗17] Lun Z., Gadelha M., Kalogerakis E., Maji S., Wang R.: 3D shape reconstruction from sketches via multi-view convolutional networks. In International Conference on 3D Vision (3DV) (2017), pp. 67–77.
[LHES19a] Lopes R. G., Ha D., Eck D., Shlens J.: A learned representation for scalable vector graphics. In Proceedings of the IEEE International Conference on Computer Vision (2019).
[LHES19b] Lopes R. G., Ha D., Eck D., Shlens J.: A learned representation for scalable vector graphics. In Proceedings of the IEEE International Conference on Computer Vision (2019), pp. 7930–7939.
[lJ19] Liu H.-T. D., Jacobson A.: Cubic stylization. ACM Transactions on Graphics 38, 6 (2019), 1–10.
[LKS15] Lun Z., Kalogerakis E., Sheffer A.: Elements of style: Learning perceptual shape style similarity. ACM Transactions on Graphics 34, 4 (2015).
[LKWS16] Lun Z., Kalogerakis E., Wang R., Sheffer A.: Functionality preserving shape style transfer. ACM Transactions on Graphics 35, 6 (2016).
[LLCL19] Liu S., Li T., Chen W., Li H.: Soft rasterizer: A differentiable renderer for image-based 3D reasoning. In Proceedings of the IEEE International Conference on Computer Vision (2019), pp. 7708–7717.
[LPL∗18] Li C., Pan H., Liu Y., Tong X., Sheffer A., Wang W.: Robust flow-guided neural prediction for sketch-based freeform surface modeling. ACM Transactions on Graphics 37, 6 (2018), 1–12.
[LYH∗15] Luo S.-J., Yue Y., Huang C.-K., Chung Y.-H., Imai S., Nishita T., Chen B.-Y.: Legolization: Optimizing LEGO designs. ACM Transactions on Graphics (Proc. SIGGRAPH Asia) 34, 6 (2015), 222:1–222:12.
[LZC11] Lee Y. J., Zitnick C. L., Cohen M. F.: ShadowDraw: Real-time user guidance for freehand drawing. ACM Transactions on Graphics 30, 4 (2011), 27:1–27:10.
[LZX∗08] Liu L., Zhang L., Xu Y., Gotsman C., Gortler S. J.: A local/global approach to mesh parameterization. Computer Graphics Forum 27 (2008), 1495–1504.
[MWYG20] Mo K., Wang H., Yan X., Guibas L.: PT2PC: Learning to generate 3D point cloud shapes from part tree conditions. arXiv preprint arXiv:2003.08624 (2020).
[MZL∗09] Mehra R., Zhou Q., Long J., Sheffer A., Gooch A., Mitra N. J.: Abstraction of man-made shapes. ACM Transactions on Graphics 28, 5 (2009).
[NMOG19] Niemeyer M., Mescheder L., Oechsle M., Geiger A.: Differentiable volumetric rendering: Learning implicit 3D representations without 3D supervision. arXiv preprint arXiv:1912.07372 (2019).
[NPLT∗19] Nguyen-Phuoc T., Li C., Theis L., Richardt C., Yang Y.-L.: HoloGAN: Unsupervised learning of 3D representations from natural images. In Proceedings of the IEEE International Conference on Computer Vision (2019), pp. 7588–7597.
[OTW∗19] Olszewski K., Tulyakov S., Woodford O., Li H., Luo L.: Transformable bottleneck networks. In Proceedings of the IEEE International Conference on Computer Vision (2019), pp. 7648–7657.
[PXW18] Peng M., Xing J., Wei L.-Y.: Autocomplete 3D sculpting. ACM Transactions on Graphics 37, 4 (2018), 132:1–132:15.
[PYY∗17] Park E., Yang J., Yumer E., Ceylan D., Berg A. C.: Transformation-grounded image generation network for novel 3D view synthesis. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2017), pp. 3500–3509.
[PZ17] Penner E., Zhang L.: Soft 3D reconstruction for view synthesis. ACM Transactions on Graphics 36, 6 (2017).
[SBS19] Smirnov D., Bessmeltsev M., Solomon J.: Deep sketch-based modeling of man-made shapes. CoRR abs/1906.12337 (2019). URL: http://arxiv.org/abs/1906.12337.
[SCD∗06] Seitz S. M., Curless B., Diebel J., Scharstein D., Szeliski R.: A comparison and evaluation of multi-view stereo reconstruction algorithms. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2006), vol. 1, pp. 519–528.
[SHL∗18] Sun S.-H., Huh M., Liao Y.-H., Zhang N., Lim J. J.: Multi-view to novel view: Synthesizing novel views with self-learned confidence. In European Conference on Computer Vision (2018).
[SIJ∗07] Schmidt R., Isenberg T., Jepp P., Singh K., Wyvill B.: Sketching, scaffolding, and inking: A visual history for interactive 3D modeling. In NPAR '07: Proceedings of the 5th International Symposium on Non-Photorealistic Animation and Rendering (2007), pp. 23–32.
[SKSK09] Schmidt R., Khan A., Singh K., Kurtenbach G.: Analytic drawing of 3D scaffolds. ACM Transactions on Graphics (Proc. SIGGRAPH Asia) 28, 5 (2009).
[SLHC12] Shen L.-T., Luo S.-J., Huang C.-K., Chen B.-Y.: SD Models: Super-deformed character models. Computer Graphics Forum (Proc. Pacific Graphics) 31, 7 (2012), 2067–2075.
[Ste03] Stepin M.: hqx, 2003.
[TDB16] Tatarchenko M., Dosovitskiy A., Brox T.: Multi-view 3D models from single images with a convolutional network. In European Conference on Computer Vision (ECCV) (2016).
[TZEM17] Tulsiani S., Zhou T., Efros A. A., Malik J.: Multi-view supervision for single-view reconstruction via differentiable ray consistency. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2017).
[Vec20] Vector Magic: Cedar Lake Ventures, 2020. URL: https://vectormagic.com/.
[VK78] Vandenberg S. G., Kuse A. R.: Mental rotations, a group test of three-dimensional spatial visualization. Perceptual and Motor Skills 47, 2 (1978), 599–604.
[XCW14] Xing J., Chen H.-T., Wei L.-Y.: Autocomplete painting repetitions. ACM Transactions on Graphics 33, 6 (2014), 172:1–172:11.
[XKG∗16] Xing J., Kazi R. H., Grossman T., Wei L.-Y., Stam J., Fitzmaurice G.: Energy-brushes: Interactive tools for illustrating stylized elemental dynamics. In Proceedings of the 29th Annual Symposium on User Interface Software and Technology (UIST '16) (2016), pp. 755–766.
[YK12] Yumer M. E., Kara L. B.: Co-abstraction of shape collections. ACM Transactions on Graphics 31, 6 (2012).
[ZTS∗16] Zhou T., Tulsiani S., Sun W., Malik J., Efros A. A.: View synthesis by appearance flow. In European Conference on Computer Vision (2016).