
Learning Manifold Patch-Based Representations of Man-Made Shapes

DMITRIY SMIRNOV, Massachusetts Institute of Technology
MIKHAIL BESSMELTSEV, Université de Montréal
JUSTIN SOLOMON, Massachusetts Institute of Technology

Choosing the right shape representation for geometry is crucial for making 3D models compatible with existing applications. Focusing on piecewise-smooth man-made shapes, we propose a new representation that is usable in conventional CAD modeling pipelines and can also be learned by deep neural networks. We demonstrate the benefits of our representation by applying it to the task of sketch-based modeling. Given a raster image, our system infers a set of parametric surfaces that realize the input in 3D. To capture the piecewise smooth geometry of man-made shapes, we learn a special shape representation: a deformable parametric template composed of Coons patches. Naïvely training such a system, however, would suffer from non-manifold artifacts of the parametric shapes as well as from a lack of data. To address this, we introduce loss functions that bias the network to output non-self-intersecting shapes and implement them as part of a fully self-supervised system, automatically generating both shape templates and synthetic training data. To test the efficacy of our system, we develop a testbed for sketch-based modeling and show results on a gallery of synthetic and real artist sketches. As additional applications, we also demonstrate shape interpolation and provide comparison to related work.

CCS Concepts: • Computing methodologies → Parametric curve and surface models; Neural networks.

Additional Key Words and Phrases: Sketch-based Modeling, Deep Learning

1 INTRODUCTION

Recent advances in deep learning have resulted in systems capable of producing 3D geometry in a variety of formats. While state-of-the-art methods that output point clouds, triangle meshes, voxel grids, and implicitly defined surfaces can yield detailed results, these representations are dense, high-dimensional, and not easily compatible with existing CAD modeling pipelines. In this work, we focus on developing a 3D representation that is parsimonious, geometrically interpretable, and easily editable with standard tools while at the same time being compatible with deep learning. Our choice of representations enables a shape modeling system that leverages the ability of deep neural networks to process incomplete, ambiguous input data and produces useful, consistent 3D output.

To demonstrate our representation on a model problem, we present a deep learning-based system to infer a complete man-made 3D shape from one or more bitmap inputs. Our system infers a network of parametric surfaces that realize the drawing in 3D. The component surfaces, parameterized by their control points, are linked in a manifold fashion and allow for easy modification in conventional shape editing software as well as conversion to a manifold mesh. Our primary technical contributions involve the development of machinery for learning parametric 3D surfaces in a fashion that is efficiently compatible with modern deep learning pipelines and effective for a challenging 3D modeling task. Our algorithm automatically infers shape templates for different categories and incorporates a number of loss functions that operate directly on the geometry rather than in the parametric domain or on a grid sampling of surrounding space.

Fig. 1. Given a bitmap sketch of a man-made shape, our method automatically infers a complete manifold parametric 3D model, ready to be edited, rendered, or converted to a mesh. Compared to conventional methods, our resolution-independent, parsimonious shape representation allows us to faithfully reconstruct sharp features (wing and tail edges) as well as smooth regions.

Extending learning methodologies from images and data points to more exotic modalities like networks of surface patches is a central theme of modern graphics, vision, and learning research, and we anticipate broad application of these technical developments as fundamental tools in CAD workflows.

In order to test our novel system, we choose sketch-based modeling as a model problem and target application. Converting rough, incomplete 2D input into a clean, complete 3D shape is extremely ill-posed, requiring hallucination of missing parts and interpretation of noisy signal. To cope with these ambiguities, most systems rely on hand-designed shape priors. This approach severely limits the applications of those methods. Each shape category requires its own expert-designed prior, and many shape categories do not admit obvious means of regularizing the reconstruction process. As an alternative, a few recent papers explore the possibility of learning the shapes from data, implicitly inferring the relevant shape priors [Delanoy et al. 2018; Lun et al. 2017; Wang et al.], but their output models often lack the resolution and sharp features necessary for high-quality 3D modeling.

In more detail, most sketch-based modeling algorithms target natural shapes like humans and animals [Bessmeltsev et al. 2015; Entem et al. 2015; Igarashi et al. 1999], which are typically smooth. To aid shape reconstruction, these systems regularize their objective functions to promote smoothness of the reconstructed shape; representations like generalized cylinders are chosen to optimize in the space of smooth surfaces [Bessmeltsev et al. 2015; Entem et al. 2015]. This, however, does not apply to the focus of our work: man-made shapes. These objects, like planes or espresso machines, are only piecewise smooth and hence do not satisfy the assumptions of many sketch-based modeling systems.

In industrial design, man-made shapes are typically modeled using collections of smooth parametric patches, such as NURBS surfaces, with patch boundaries forming the sharp features. To learn such shapes effectively, we leverage this structure by using a special shape representation, a deformable parametric template [Jain et al. 1998].



This template is a manifold surface composed of patches, where each patch is parameterized by its control points; example patches include Bézier patches [Farin 2002] and Coons patches [Coons 1967] (Fig. 5(a)). This representation enables us to control the smoothness of each patch while allowing the model to introduce sharp edges between patches where necessary.

Compared to traditional representations, deformable parametric templates have numerous benefits for our task. They are intuitive to edit with conventional software, are resolution-independent, and can be meshed to arbitrary accuracy. Furthermore, since typically only boundary control points are needed, our surface representation has relatively few parameters to learn and store. Finally, this structure admits closed-form expressions for normals and other geometric features, which can be used to construct loss functions that improve reconstruction quality (§3.2).

More importantly, beyond defining the connectivity of the final shape, our deformable template acts as a strong initial guess that drives the learning towards a better local minimum. Compared to a generic template, this prealigned, category-specific template improves reconstruction of small details and sharp features of the model.

The core of our system is a CNN-based architecture to infer the coordinates of the control points of a deformable template, algorithmically generated for a given shape category by a novel method. A naïve attempt to develop and train such networks faces three major challenges: the difficulty of detecting non-manifold surfaces, structural variations within a shape category, and the lack of data. We address these challenges as follows:

• We introduce several loss functions that encourage our patch-based output to form a manifold mesh without topological artifacts or self-intersections.

• Deformable templates are a natural choice for objects with fixed structure, such as cups or guitars. However, some categories of man-made shapes exhibit structural variation. To address this, for each category we algorithmically generate a varying deformable template, which allows us to represent structural variation using a variable number of parts (Sec. 3.1.2); we demonstrate this on modular airplane turbines.

• Supervised methods mapping from sketches to 3D models require a database of sketch-model pairs, and, to date, there are no such large-scale repositories. We introduce a synthetic sketch augmentation pipeline that uses insights from the artistic literature to simulate possible variations observed in natural drawings (§4.1). Although our model is trained on synthetic sketches, it generalizes to natural sketches (Fig. 18).

Contributions. Our key technical contributions include learning a new geometric representation, a novel method to automatically generate a template for a given collection of shapes, and new loss terms preventing non-manifold surfaces. We present a system for predicting parametric manifold surfaces of models of man-made 3D shapes using deep learning. Our method is fully self-supervised; while we predict patch parameters, none of our data is labeled with ground-truth patch decompositions, and our templates can be generated in a completely automatic manner.

We validate by showing applications to sketch-based modeling, with a gallery of results on both synthetic and natural sketches from various artists, as well as interpolation to generate novel 3D models.

2 RELATED WORK

Our work introduces a new 3D representation as a significant step towards bridging the gap between modern progress in deep learning and long-standing problems in CAD modeling. To give a rough idea of the landscape of available methods, we briefly summarize related work in deep learning and sketch-based modeling.

2.1 Deep learning for shape reconstruction

Learning to reconstruct 3D geometry from various input modalities has recently enjoyed significant research interest. Typical forms of input are images [Choy et al. 2016; Delanoy et al. 2018; Gao et al. 2019; Häne et al. 2019; Wang et al.; Wu et al. 2017; Yan et al. 2016] and point clouds [Groueix et al. 2018; Park et al. 2019; Williams et al. 2019]. When designing a network for this task, two considerations affect the architecture: the loss function and the geometric representation.

Loss Functions. One promising and popular direction employs a differentiable renderer and measures a 2D image loss between a rendering of the inferred 3D model and the input image, often called 2D-3D consistency or silhouette loss [Kato et al. 2018; Rezende et al. 2016; Tulsiani et al. 2018, 2017c; Wu et al. 2017, 2016a; Yan et al. 2016]. A notable example is the work by Wu et al. [2017], which learns a mapping from a photograph to a normal map, a depth map, a silhouette, and the mapping from these outputs to a voxelization. They use a differentiable renderer and measure inconsistencies in 2D. 2D losses are powerful in computer vision. Hand-drawn sketches, however, cannot be interpreted as perfect projections of 3D objects: they are imprecise and often inconsistent [Bessmeltsev et al. 2016]. Another approach uses 3D loss functions, measuring discrepancies between the predicted and target 3D shapes directly, often via Chamfer or a regularized Wasserstein distance [Gao et al. 2019; Groueix et al. 2018; Liu et al. 2010; Mandikal et al. 2018; Park et al. 2019; Williams et al. 2019], or—in the case of highly structured representations such as voxel grids—cross-entropy [Häne et al. 2019]. We build on this work, adapting the Chamfer distance to patch-based geometric representations and extending the loss function with new regularizers (§3.2).

Shape representation. As noted by Park et al. [2019], geometric representations in deep learning can broadly be divided into three classes: voxel-based representations, point-based representations, and mesh-based representations.

The most popular approach is to use voxels, directly reusing successful methods for 2D images [Choy et al. 2016; Delanoy et al. 2018; Tulsiani et al. 2018; Wang et al.; Wang et al. 2018a; Wu et al. 2017, 2018; Yan et al. 2016; Zhang et al. 2018; Zhirong Wu et al. 2015]. The main limitation of voxel-based methods is low resolution due to memory limitations. Octree-based approaches mitigate this problem


Fig. 2. Editing a 3D model produced by our method. Because we output 3D geometry as a collection of consistent, well-placed NURBS patches, user edits can be made in conventional CAD software by simply moving control points. Here, we are able to refine the trunk of a car model with just a few clicks.

[Häne et al. 2019; Wang et al. 2017], learning shapes at up to 512³ resolution, but even this density is insufficient to produce visually convincing surfaces. Furthermore, voxelized approaches cannot directly represent sharp features, which are key for man-made shapes.

Point-based approaches represent 3D geometry as a point cloud [Fan et al. 2017; Lun et al. 2017; Mandikal et al. 2018; Tatarchenko et al. 2016; Yang et al. 2018], sidestepping the memory issues. Those representations, however, do not capture connectivity. Hence, they cannot guarantee production of manifold surfaces.

Some recent methods use mesh-based representations [Bagautdinov et al. 2018; Baque et al. 2018; Kanazawa et al. 2018; Litany et al. 2018; Wang et al. 2019], representing shapes using deformable meshes. We take inspiration from this approach to reconstruct a surface by deforming a template, but our parametric template representation allows us to more easily enforce piecewise smoothness and test for self-intersections (§3.2). These properties are difficult to measure on meshes in a differentiable manner. Compared to a generic template shape, such as a sphere, our category-specific templates improve the reconstruction quality and enable complex reconstruction constraints, e.g., symmetry. We further compare to deformable mesh representations in Sec. 4.5. Other mesh-based methods either use a precomputed parameterization to a domain on which it is straightforward to apply CNN-based architectures [Haim et al. 2019; Maron et al. 2017; Sinha et al. 2016] or learn a parameterization directly [Ben-Hamu et al. 2018; Groueix et al. 2018]. Even though these methods are not specifically designed for sketch-based modeling, for completeness, we compare our results to one of the more popular methods, AtlasNet [Groueix et al. 2018] (Fig. 23).

Most importantly, our man-made shape representation is native to modern CAD software, such as Autodesk Fusion 360, Rhino, and Solidworks, and it can be directly exported and edited in this software, as demonstrated in Figure 2. The key to this flexibility is the type of parametric patches we use, bilinearly blended Coons patches, which belong to the family of NURBS surfaces and can be trivially converted to a NURBS representation [Piegl and Tiller 1996], the standard surface type in CAD. The other common shape representations, such as meshes or point clouds, cannot be easily converted into NURBS format: algorithmically fitting NURBS surfaces is nontrivial and is an active area of research [Krishnamurthy and Levoy 1996; Yumer and Kara 2012].

Finally, a few works explore less common representations, such as signed distance functions [Mescheder et al. 2019], implicit fields [Chen and Zhang 2019], implicit surfaces [Genova et al. 2019], shape programs [Tian et al. 2019], splines [Gao et al. 2019], volumetric primitives [Tulsiani et al. 2017a; Zou et al. 2017], and elements of a learned latent space [Achlioptas et al. 2017; Wu et al. 2016b].

These papers demonstrate impressive reconstruction results, but either do not aim to produce an expressive complete 3D model [Gao et al. 2019; Tian et al. 2019; Tulsiani et al. 2017a; Zou et al. 2017] or are not tuned to CAD applications [Achlioptas et al. 2017; Chen and Zhang 2019; Genova et al. 2019; Mescheder et al. 2019; Wu et al. 2016b]. It is unclear how these representations can be successfully used for generating editable CAD shape representations.

A few deep learning algorithms address sketch-based modeling [Delanoy et al. 2018; Huang et al. 2017; Li et al. 2018; Lun et al. 2017; Nishida et al. 2016; Wang et al.]. Nishida et al. [2016] and Huang et al. [2017] train networks to predict procedural model parameters that yield detailed shapes from a sketch. These methods produce complex high-resolution models, but only for the classes of shapes that can be procedurally generated, such as trees or buildings. Lun et al. [2017] use a CNN-based encoder-decoder architecture to predict multi-view depth and normal maps, later converted to point clouds. Li et al. [2018] improve on their results by first predicting a flow field from an annotated sketch of an organic smooth shape, later converted to a depth map. In contrast, we output a deformable parametric template, which can be converted to a manifold mesh directly, without post-processing. Wang et al. learn from unlabeled databases of sketches and 3D models with no correspondence between them. They train two networks: the first is a GAN with an autoencoder-based discriminator aimed to embed both natural sketches and renders into a latent space with matching distributions; the second is a CNN mapping the latent vector into a voxelization, trained on renders only. Another inspiration for our research is the work of Delanoy et al. [2018], which reconstructs a 3D object, represented as a voxelization, given sketches drawn from multiple views. We compare our results to [Delanoy et al. 2018; Lun et al. 2017] in Fig. 21.

2.2 Sketch-based 3D shape modeling

Reconstructing 3D geometry from sketches has a long history in computer graphics. A complete survey of sketch-based modeling is beyond the scope of this paper; an interested reader may refer to the recent paper by Delanoy et al. [2018] or the surveys by Ding and Liu [2016] and Olsen et al. [2009]. Here, we mention the work most relevant to our approach.

Many sketch-based 3D shape modeling systems are incremental, i.e., they allow users to model shapes by progressively adding new strokes, updating the 3D shape after each action. Such systems may be designed as single-view interfaces, where the user is often required to manually annotate each stroke [Chen et al. 2013; Cherlin et al. 2005; Gingold et al. 2009; Shtof et al. 2013], or they may allow


strokes to be added to multiple views [Igarashi et al. 1999; Nealen et al. 2007; Tai et al. 2004]. These systems can cope with considerable geometric complexity, but their dependence on the ordering of the strokes forces artists to deviate from standard approaches to sketching. In contrast, our machine learning method allows the system to interpret complete sketches, eliminating training for artists to use our system and enabling 3D reconstruction of legacy sketches. Similarly, while Xu et al. [2014] present a single-view 3D curve network reconstruction system for man-made shapes that can produce impressive sharp results, they process specialized design sketches consisting of cross-sections, output only a curve network, and rely on user annotations. Our system produces complete 3D shapes from natural sketches with no extra annotation.

A variety of systems interpret complete 2D sketches with no extra information. This species of input is extremely ambiguous thanks to hidden surfaces and noisy sketch curves, and hence reconstruction algorithms rely on strong 3D shape priors. These priors are typically manually created. For example, priors for humanoid characters, animals, and natural shapes promote smooth, round, and symmetrical shapes [Bessmeltsev et al. 2015; Entem et al. 2015; Igarashi et al. 1999], while garments are typically regularized to be (piecewise-)developable [Jung et al. 2015; Li et al. 2017, 2018; Robson et al. 2011; Turquin et al. 2004; Zhu et al. 2013]; man-made shapes are often approximated as combinations of geometric primitives [Shao et al. 2016] or as unions of nearly-flat faces [Yang et al. 2013]. Our work focuses on man-made shapes, which have characteristic sharp edges and are only piecewise smooth rather than developable. We use a learned deformable patch template to promote shapes with this structure (§3.1). Moreover, introducing specific expert-designed priors can be challenging: man-made shapes are varied, diverse, and complex (Fig. 1, 9-18). Instead, we automatically learn a category-specific shape prior from data.

Most sketch-based modeling interfaces process vector input, which consists of a set of clean curves [Bessmeltsev et al. 2015, 2016; Entem et al. 2015; Jung et al. 2015; Li et al. 2017, 2018; Xu et al. 2014]. This approach is acceptable for tablet-based interfaces, but it forces users to deviate from their preferred drawing media. Paper-and-pencil sketches still remain a preferred means of capturing shape. While they can be vectorized and cleaned using modern methods [Bessmeltsev and Solomon 2019; Simo-Serra et al. 2018], preprocessing can introduce unnecessary distortions and errors, leading to suboptimal reconstruction. In contrast, our system directly processes bitmap sketches.

3 ALGORITHM

We engineer a deep learning pipeline that outputs a parametrically-defined 3D surface. We describe the geometric representation of the output surfaces (§3.1), define the loss terms that we optimize (§3.2), and specify the deep CNN architecture and training procedure (§3.3).

3.1 Representation

3.1.1 Patch Primitives. We would like to encode 3D surfaces with a compact and expressive representation.

To capture the details of man-made shapes, our representation must be capable of containing smooth regions as well as sharp creases and corners. Given these requirements, we represent our surfaces as collections of parametric primitives, where each primitive is a Coons patch [Coons 1967].

A Coons patch is a parametric surface patch in three dimensions specified by four boundary curves sharing endpoints. We chose each boundary curve to be a cubic Bézier curve, c(γ), specified by four control points p_1, p_2, p_3, p_4 ∈ R^3, two of which, p_1 and p_4, are connected to adjacent curves. Thus, our patches are parameterized by 12 control points in total.

A single Bézier curve c : [0, 1] → R^3 is defined as

c(\gamma) = p_1 (1 - \gamma)^3 + 3 p_2 \gamma (1 - \gamma)^2 + 3 p_3 \gamma^2 (1 - \gamma) + p_4 \gamma^3,   (1)

and a Coons patch P : [0, 1] × [0, 1] → R^3 is defined as

P(s, t) = (1 - t)\, c_1(s) + t\, c_3(1 - s) + s\, c_2(t) + (1 - s)\, c_4(1 - t)
  - \big[ c_1(0)(1 - s)(1 - t) + c_1(1)\, s (1 - t) + c_3(1)(1 - s)\, t + c_3(0)\, s t \big].   (2)
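To make the construction above concrete, the following Python sketch evaluates Eqs. (1) and (2) for one patch. The function names (bezier, coons) and the convention that each boundary curve is passed as a (4, 3) array of control points are our illustrative assumptions; the corner-sharing convention in the comments follows Eq. (2).

import numpy as np

def bezier(p, g):
    # Cubic Bezier curve of Eq. (1); p is a (4, 3) array of control points,
    # g is a scalar parameter in [0, 1].
    return ((1 - g) ** 3) * p[0] + 3 * g * (1 - g) ** 2 * p[1] \
        + 3 * g ** 2 * (1 - g) * p[2] + (g ** 3) * p[3]

def coons(c1, c2, c3, c4, s, t):
    # Bilinearly blended Coons patch of Eq. (2). Each of c1..c4 is a (4, 3)
    # array of Bezier control points for one boundary curve; the curves share
    # corners so that c1(0) = P(0,0), c1(1) = P(1,0), c3(1) = P(0,1), and
    # c3(0) = P(1,1).
    ruled = (1 - t) * bezier(c1, s) + t * bezier(c3, 1 - s) \
        + s * bezier(c2, t) + (1 - s) * bezier(c4, 1 - t)
    corners = bezier(c1, 0) * (1 - s) * (1 - t) + bezier(c1, 1) * s * (1 - t) \
        + bezier(c3, 1) * (1 - s) * t + bezier(c3, 0) * s * t
    return ruled - corners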

3.1.2 Templates. We use templates to specify the connectivity of a collection of Coons patches. A template consists of the minimal number of control points necessary to define the Coons patches for the entire surface; control points for adjacent patches sharing boundary curves or corners are reused rather than duplicated. For instance, we can define a template with cube topology based on a quad mesh with six faces; the resulting template contains 12 shared curves and 32 control points.

We allow for the edge of one patch to be contained within the edge of another without subdividing either patch by using junction curves. A junction curve c_d is constrained to lie along a parent curve d and is thus parameterized by s, t ∈ [0, 1], such that c(0) = d(s) and c(1) = d(t). We must be careful when defining junction curves so that each endpoint of the junction curve is well-defined in terms of a single parent curve. We address this in detail below.

A template provides hard topological constraints for our surfaces as well as an initialization of their geometry and, optionally, a means for geometric regularization. Templates are crucial in ensuring that our predicted patches have consistent topology—an approach without templates would result in unstructured patch collections, with patches that do not align at boundaries or form a watertight, manifold surface.

While we demonstrate that our method works using a generic sphere template, we optionally can define distinct templates for different shape categories to incorporate category-specific geometric priors. These templates capture only coarse geometric features and approximate scale. We outline a strategy for obtaining templates below.

Algorithmic construction of templates. We design a simple system to construct a template automatically given as input a collection of cuboids. Such a collection of cuboids can be computed automatically for a shape category, e.g., given a segmentation or using self-supervised methods such as [Smirnov et al. 2020; Sun et al. 2019; Tulsiani et al. 2017b], or easily produced manually using standard CAD software.



Fig. 3. A summary of our agglomerative algorithm for automatic template generation. Given any collection of cuboids (a), we first split the quad faces and remove interior and overlapping faces to obtain a valid quad mesh (b), and iteratively merge adjacent faces to obtain the final template (c).

Our algorithm converts any collection of cuboids into a template compatible with our method. In our experiments, we show templates algorithmically computed from pre-segmented shapes—for a given shape, we obtain a collection of cuboids by taking the bounding box around each connected component of each segmentation class.

While a cuboid decomposition may be a good approximation of a 3D model, the cuboids may overlap and thus cannot be used as a template. We first snap our cuboids to an integer lattice and then refine the decomposition by splitting each cuboid face at every coordinate of the grid. We remove overlapping and interior faces to obtain a quad mesh. While the resulting quad mesh can be used directly as a template, it typically consists of a large number of faces, and thus we further process it.

We simplify our quad mesh by merging adjacent quads, ensuring that junction curves are well-defined and that there are no circular definitions. We do this with a greedy agglomerative algorithm, iterating over each quad in order of descending area and merging it with an adjacent quad as long as the merge does not result in any ill-defined curves. To keep track of how junction curves are defined, we use a quad dependency graph. The graph contains a node for each quad and a directed edge from node A to node B if a curve of quad B is a junction curve whose parent is a side of A. This structure allows us to determine whether a merge is impermissible: if the resulting dependency graph contains a cycle, or some node is the child of two parents that do not share a graph edge, we do not merge. We continue iterating over quads until no permissible merges remain. Then, the order in which we must define junctions is simply a topological ordering of the dependency graph. We show an example cuboid input, intermediate construction, and final output of this algorithm in Figure 3.
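A minimal sketch of the merge-permissibility test is shown below, using networkx for the graph bookkeeping. The function and variable names are hypothetical, and the sketch only models the dependency-graph side of the check; the geometric merge and the update of curve definitions are assumed to happen elsewhere.

import networkx as nx

def merge_is_permissible(dep_graph, quad_a, quad_b):
    # Tentatively contract quad_b into quad_a and test the two conditions from
    # the text: the dependency graph must remain acyclic, and no quad may
    # depend on two parents that do not share a graph edge.
    merged = nx.contracted_nodes(dep_graph, quad_a, quad_b, self_loops=False)
    if not nx.is_directed_acyclic_graph(merged):
        return False
    for node in merged.nodes:
        parents = list(merged.predecessors(node))
        for i in range(len(parents)):
            for j in range(i + 1, len(parents)):
                u, v = parents[i], parents[j]
                if not (merged.has_edge(u, v) or merged.has_edge(v, u)):
                    return False
    return True

# Once no permissible merges remain, the junction definition order is a
# topological ordering of the final graph: list(nx.topological_sort(dep_graph))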

Given cuboid decompositions of multiple shapes in a category, we find the median model in the category with respect to Chamfer distance. Since, in the datasets used for our experiments, models within a shape category are generally aligned and normalized, the median provides a rough approximation of the typical geometry.

Structural variation using templates. For category-specific templates, we use the fact that template patches are consistently placed on semantically meaningful components of the shape to account for structural variation during training. For instance, in the airplanes shape category, certain models contain turbines while others do not.

Fig. 4. Structural variation. When using a template, since patches are mapped consistently across inputs, we can choose to toggle modular components by simply showing or hiding certain patches. Here, we demonstrate the same airplane model with and without turbines. Both configurations produce manifold meshes.


Fig. 5. Our geometry representation is composed of Coons patches (a) that are organized into a deformable template (b).

Fig. 6. The templates used in our experiments. From top to bottom, left to right: bottle, knife, guitar, car, airplane, coffee mug, gun, bathtub, 24-patch sphere, 54-patch sphere.

When constructing the airplane template, we note which patches come from cuboids corresponding to turbines and, during training, only sample from the turbine patches for models that contain turbines. This allows us to train the entire airplane shape category, effectively using two distinct templates. Additionally, at test time, we can toggle turbines on or off for any given input, as shown in Figure 4.
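The mechanism is simple to implement; the sketch below is a hypothetical illustration (the names patches, turbine_ids, and has_turbines are ours): during training, points are sampled only from the visible subset, and at test time the same mask toggles the modular components.

def visible_patches(patches, turbine_ids, has_turbines):
    # Keep every patch for models with turbines; otherwise drop the patches
    # that originated from turbine cuboids in the template.
    if has_turbines:
        return list(patches)
    return [p for i, p in enumerate(patches) if i not in turbine_ids]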

3.2 Loss

In our training procedure, we fit a collection of Coons patches {P_i} to a target mesh M by optimizing a differentiable loss function. Below, we describe each term of our loss—a main reconstruction loss analogous to Chamfer distance (§3.2.1), a normal alignment loss (§3.2.2), a regularizer to inhibit self-intersections (§3.2.3), a patch flatness regularizer (§3.2.4), and two template-based priors (§3.2.5 and §3.2.6).

3.2.1 Area-weighted Chamfer distance. Given two measurable shapes A, B ⊂ R^3 and point sets X and Y sampled from A and B, respectively, the directed Chamfer distance between X and Y is

\mathrm{Ch}_{\mathrm{dir}}(X, Y) = \frac{1}{|X|} \sum_{x \in X} \min_{y \in Y} d(x, y),   (3)


where d(x, y) is the Euclidean distance between x and y. The symmetric Chamfer distance is

\mathrm{Ch}(X, Y) = \mathrm{Ch}_{\mathrm{dir}}(X, Y) + \mathrm{Ch}_{\mathrm{dir}}(Y, X).   (4)

Chamfer distance is differentiable and therefore a popular loss function in deep learning pipelines that optimize shapes (§2.1). It suffers from several disadvantages, however. In particular, the distribution under which X and Y are sampled from A and B has a significant impact on the Chamfer distance; sampling in the parametric domain does not capture the area measure of the surface. In our setting, sampling uniformly from Coons patches is difficult, while sampling uniformly from the parametric domain results in oversampling around regions with high curvature.

To address this sampling issue, following Smirnov et al. [2020], we first define the variational directed Chamfer distance, starting from (3):

\mathrm{Ch}_{\mathrm{dir}}(X, Y) = \frac{1}{|X|} \sum_{x \in X} \min_{y \in Y} d(x, y)   (5)
  \approx \mathbb{E}_{x \sim U_A} \Big[ \inf_{y \in Y} d(x, y) \Big]   (6)
  \approx \frac{1}{\mathrm{Area}(A)} \int_A \inf_{y \in B} d(x, y) \, dx \;\overset{\text{def}}{=}\; \mathrm{Ch}^{\mathrm{var}}_{\mathrm{dir}}(A, B),   (7)

where U_A is the uniform distribution on A. The variational symmetric Chamfer distance Ch^var(A, B) is defined analogously.

We leverage the fact that, while it is difficult to sample uniformly from our parametric patches, we are able to sample uniformly from their parametric domain (i.e., the unit square) in a straightforward fashion. Thus, we perform a change of variables:

\mathrm{Ch}^{\mathrm{var}}_{\mathrm{dir}}(P, M)   (8)
  = \frac{1}{\mathrm{Area}(P)} \int_P \inf_{y \in M} d(x, y) \, dx   (9)
  = \frac{1}{\mathrm{Area}(P)} \int_0^1 \!\! \int_0^1 \inf_{y \in M} d\big(P(s, t), y\big) \, |J(s, t)| \, ds \, dt   (10)
  = \frac{1}{\mathrm{Area}(P)} \cdot \frac{1}{\mathrm{Area}(\square)} \int_0^1 \!\! \int_0^1 \inf_{y \in M} d\big(P(s, t), y\big) \, |J(s, t)| \, ds \, dt   (11)
  = \frac{1}{\mathrm{Area}(P)} \, \mathbb{E}_{(s, t) \sim U_\square} \Big[ \inf_{y \in M} d\big(P(s, t), y\big) \, |J(s, t)| \Big]   (12)
  = \frac{\mathbb{E}_{(s, t) \sim U_\square} \big[ \inf_{y \in M} d\big(P(s, t), y\big) \, |J(s, t)| \big]}{\mathbb{E}_{(s, t) \sim U_\square} \big[ |J(s, t)| \big]},   (13)

where □ = [0, 1] × [0, 1] and J(s, t) is the Jacobian of P(s, t). In practice, we approximate this value via Monte Carlo integration:

\mathrm{Ch}^{\mathrm{var}}_{\mathrm{dir}}(P, M) \approx \frac{\frac{1}{|U_\square|} \sum_{(s, t) \in U_\square} \min_{y \in M} d\big(P(s, t), y\big) \, |J(s, t)|}{\frac{1}{|U_\square|} \sum_{(s, t) \in U_\square} |J(s, t)|}   (14)
  = \frac{\sum_{(s, t) \in U_\square} \min_{y \in M} d\big(P(s, t), y\big) \, |J(s, t)|}{\sum_{(s, t) \in U_\square} |J(s, t)|},   (15)

where U_□ is a set of points uniformly sampled from the unit square.

Since we can precompute uniformly sampled random points from the target mesh, we do not need to use area weights to compute Ch^var_dir(M, P). Thus, our area-weighted Chamfer distance is

L_{\mathrm{Ch}}(\cup P_i, M) = \frac{\sum_i \sum_{(s, t) \in U_\square} \min_{y \in M} d\big(P_i(s, t), y\big) \, |J_i(s, t)|}{\sum_i \sum_{(s, t) \in U_\square} |J_i(s, t)|} + \frac{1}{|M|} \sum_{x \in M} \min_{y \in \cup P_i} d(x, y).   (16)

We use symbolic evaluation software to compute the expression for J_i(u, v) for Coons patch i given its control points in closed form; this formula is computed once and compiled into our code.
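A minimal PyTorch sketch of Eq. (16) follows. We assume (hypothetical names) that patch_points holds points P_i(s, t) sampled at parameters drawn from U_□, jac holds the corresponding area elements |J_i(s, t)|, and mesh_points holds points pre-sampled uniformly from the target mesh; the brute-force torch.cdist nearest-neighbor search stands in for whatever accelerated search an implementation might use.

import torch

def area_weighted_chamfer(patch_points, jac, mesh_points):
    # patch_points: (N, 3), jac: (N,), mesh_points: (M, 3).
    d = torch.cdist(patch_points, mesh_points)               # pairwise distances
    # Patches -> mesh term, reweighted by the Jacobian as in Eqs. (15)-(16).
    to_mesh = (d.min(dim=1).values * jac).sum() / jac.sum()
    # Mesh -> patches term; the mesh samples are already area-uniform.
    to_patches = d.min(dim=0).values.mean()
    return to_mesh + to_patches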

3.2.2 Normal alignment. While the Chamfer distance loss term encourages our predicted patches to be close to the ground-truth mesh with respect to Euclidean distance, it contains no explicit notion of curvature or normal alignment. This results in surfaces whose curvature differs significantly from that of the ground-truth models (see §4.4, Figure 20 (a)). To address this, we add an additional normal alignment loss term.

This loss term is computed analogously to Ch_dir(∪P_i, M), except that instead of Euclidean distance, we compute the normal distance, defined as

d_N(x, y) = \| n_x - n_y \|_2^2,   (17)

where n_x is the normal vector at point x. For each point y sampled from our predicted surface, we compare n_y to n_x, where x ∈ M is closest to y under Euclidean distance, and, symmetrically, for each x′ ∈ M, we compare n_{x′} to n_{y′}, where y′ ∈ ∪P_i is closest to x′. We precompute the normal vectors for all points sampled from our target meshes, and we again use symbolic differentiation to compute the expression for the normal vector of a Coons patch at P(u, v).

In analogy to the variational Chamfer loss above, we have

L_{\mathrm{normal}}(\cup P_i, M) = \frac{\sum_i \sum_{(u, v) \in U_\square} d_N\big(\mathrm{NN}(P_i(u, v), M), P_i(u, v)\big) \, |J_i(u, v)|}{\sum_i \sum_{(u, v) \in U_\square} |J_i(u, v)|} + \frac{1}{|M|} \sum_{x \in M} d_N\big(x, \mathrm{NN}(x, \cup P_i)\big),   (18)

where NN(x, Y) is the nearest neighbor to x in Y under Euclidean distance.
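The same nearest-neighbor machinery gives a direct sketch of Eq. (18); as before, the tensor names (patch_normals, mesh_normals, etc.) are our assumptions, with normals precomputed for the mesh samples and evaluated in closed form for the patches.

import torch

def normal_alignment_loss(patch_points, patch_normals, jac,
                          mesh_points, mesh_normals):
    d = torch.cdist(patch_points, mesh_points)
    nn_of_patch = d.argmin(dim=1)     # nearest mesh sample per patch sample
    nn_of_mesh = d.argmin(dim=0)      # nearest patch sample per mesh sample
    d_n_patch = ((patch_normals - mesh_normals[nn_of_patch]) ** 2).sum(dim=1)
    d_n_mesh = ((mesh_normals - patch_normals[nn_of_mesh]) ** 2).sum(dim=1)
    return (d_n_patch * jac).sum() / jac.sum() + d_n_mesh.mean()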

3.2.3 Intersection regularization. We introduce a collision detection loss to detect pairwise patch intersections. We define this loss as

L_{\mathrm{coll}}(\{P_i\}) = \sum_{i, j} \exp\!\Big( -\big( \min\big(d(T^i, \bar{P}^j),\, d(T^j, \bar{P}^i)\big) / \varepsilon \big)^2 \Big),   (19)

where T^i is a triangulation of patch P_i, and \bar{P}^i is a set of points sampled from patch P_i. We triangulate a patch during training by taking the image of a fixed triangulation of a regular grid in patch parameter space. With a small ε (ε = 10^{-6} in our experiments), this expression smoothly interpolates between near-zero when the two patches do not intersect and one when the patches are intersecting, up to the resolution of the grid used to compute the triangulation. For a pair of adjacent patches or those that share a junction, we truncate


one patch by one grid row at the adjacency before evaluating the collision loss.
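The following sketch approximates Eq. (19) using point-to-point distances between patch samples and triangulation vertices rather than true point-to-triangle distances, and omits the truncation of adjacent patches; tri_points, samples, and adjacent are assumed names.

import torch

def collision_loss(tri_points, samples, adjacent=frozenset(), eps=1e-6):
    # tri_points[i]: vertices of the triangulation T^i of patch i.
    # samples[i]: points sampled from patch i. Pairs in `adjacent` (patches
    # sharing a seam or junction) are skipped.
    loss = torch.zeros(())
    n = len(samples)
    for i in range(n):
        for j in range(n):
            if i == j or (i, j) in adjacent or (j, i) in adjacent:
                continue
            d_ij = torch.cdist(samples[j], tri_points[i]).min()
            d_ji = torch.cdist(samples[i], tri_points[j]).min()
            loss = loss + torch.exp(-(torch.minimum(d_ij, d_ji) / eps) ** 2)
    return loss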

3.2.4 Patch flatness regularization. The loss functions defined above ensure that our output is a manifold surface that matches the target geometry. However, we also prefer that our Coons patches align to smooth regions of the geometry and that sharp creases fall on patch boundaries. To this end, we define a patch flatness regularizer that favors flat Coons patches, discouraging excessively high curvature.

The patch flatness regularizer encourages each Coons patch map P : [0, 1] × [0, 1] → R^3 to be close to a linear map. For each Coons patch, we sample random points U_□ in parameter space, compute their image P(U_□), and fit a linear function using linear least-squares. Thus, we have P̂(U_□) = A U_□ + b ≈ P(U_□) for some A, b. We define the patch flatness loss as

L_{\mathrm{flat}}(\{P_i\}) = \frac{\sum_i \sum_{(u, v) \in U_\square} \| \hat{P}_i(u, v) - P_i(u, v) \|_2^2 \, |J_i(u, v)|}{\sum_i \sum_{(u, v) \in U_\square} |J_i(u, v)|}.   (20)

3.2.5 Template normals regularization. For shape categories where a category-specific template is available, we not only utilize the template geometry to initialize the network output but also regularize the output geometry using normals defined by the template. Along with the patch flatness regularizer, this encourages favorable positioning of patch seams and prevents patches from unnecessarily sliding over high-curvature regions. We define the template normals loss as

L_{\mathrm{template}}(\{P_i\}, \{T_i\}) = \frac{\sum_i \sum_{(u, v) \in U_\square} \| n_{P_i}(u, v) - n_{T_i} \|_2^2 \, |J_i(u, v)|}{\sum_i \sum_{(u, v) \in U_\square} |J_i(u, v)|},   (21)

where n_{T_i} is the normal vector of the i-th template patch; since template patches are flat, the normal vector is constant for a patch.

3.2.6 Global symmetry. Man-made shapes frequently exhibit global bilateral symmetries. Enforcing symmetry during reconstruction may be problematic for previous shape representations, such as deformable meshes or implicit surfaces. In contrast, our representation allows for a straightforward implementation of symmetry. Having computed the symmetry planes of the initial template, we may enforce symmetric positions of the corresponding control points as an additional loss term:

L_{\mathrm{sym}}(\cup P_i) = \frac{1}{|S|} \sum_{(i, j) \in S} \big\| \big(P^i_x - a,\, P^i_y,\, P^i_z\big) - \big(a - P^j_x,\, P^j_y,\, P^j_z\big) \big\|_2^2,   (22)

where S contains pairs of indices of symmetric control points and P^i = (P^i_x, P^i_y, P^i_z) is the i-th control point. Here, we define the symmetry loss for the symmetry plane x = a, but the definition for other axes of symmetry is analogous.

We employ this principle to enforce symmetrical reconstructionof airplanes and cars.
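A sketch of Eq. (22) for a symmetry plane x = a follows; control_points and pairs are assumed inputs, with pairs playing the role of S.

import torch

def symmetry_loss(control_points, pairs, a=0.0):
    # control_points: (C, 3); pairs: list of (i, j) indices of mirrored points.
    loss = torch.zeros(())
    for i, j in pairs:
        p_i, p_j = control_points[i], control_points[j]
        # Mirror image of control point j about the plane x = a.
        mirrored = torch.stack([2 * a - p_j[0], p_j[1], p_j[2]])
        loss = loss + ((p_i - mirrored) ** 2).sum()
    return loss / len(pairs)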

Fig. 7. An overview of our deep learning pipeline. We encode an image and get back a series of parameters defining a collection of Coons patches. We then compute six loss values based on the predicted patches and the ground-truth 3D model as well as a template.

3.3 Deep learning pipeline

The final loss that we optimize is

L(\{P_i\}, M) = L_{\mathrm{Ch}}(\cup P_i, M) + \alpha_{\mathrm{normal}} L_{\mathrm{normal}}(\cup P_i, M)
  + \alpha_{\mathrm{flat}} L_{\mathrm{flat}}(\{P_i\}) + \alpha_{\mathrm{coll}} L_{\mathrm{coll}}(\{P_i\})
  + \alpha_{\mathrm{template}} L_{\mathrm{template}}(\{P_i\}, \{T_i\}) + \alpha_{\mathrm{sym}} L_{\mathrm{sym}}(\cup P_i).   (23)

For models scaled to fit in a unit sphere, we use α_normal = 0.008, α_flat = 2, and α_coll = 0.00001 for all experiments, and α_template = 0.0001 and α_sym = 1 for experiments that use those regularizers.

Our network takes as input one or more 128 × 128 raster images and outputs parameters defining the predicted Coons patches. We use an encoder-decoder architecture, consisting of a ResNet-18 [He et al. 2016] followed by three fully-connected hidden layers with 1024, 512, and 256 units, respectively, and an output layer with size equal to the appropriate output dimension. We initialize the weights of the final layer to zero with bias equal to the parameters of the template, therefore setting the starting geometry to that of the template. To accept multi-view input for the tests in §4.2, we encode each input image using the ResNet encoder and perform max pooling over the latent codes. We use ReLU nonlinearity and batch normalization after each layer except for the last. We train each network on a single Tesla V100 GPU, using Adam [Kingma and Ba 2014] and batch size 8, with learning rate 0.00001 when using generic sphere templates and 0.0001 for category-specific templates. We train all categories for 24 hours. At each iteration, we sample 7,000 points from the predicted and target shapes. Additionally, we perform train-time data augmentation by applying random crops, rotations, and horizontal flips to the input images. Our entire pipeline is illustrated in Figure 7.
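The PyTorch sketch below mirrors the architecture described above under stated assumptions: sketches are replicated to three channels so that an off-the-shelf torchvision ResNet-18 can be used unchanged, and template_params is a flat vector of the template's control-point coordinates. It is an illustration of the described design, not the authors' code.

import torch
import torch.nn as nn
from torchvision import models

class SketchToPatches(nn.Module):
    def __init__(self, template_params):
        super().__init__()
        backbone = models.resnet18()
        backbone.fc = nn.Identity()                 # keep the 512-d latent code
        self.encoder = backbone
        self.decoder = nn.Sequential(
            nn.Linear(512, 1024), nn.BatchNorm1d(1024), nn.ReLU(),
            nn.Linear(1024, 512), nn.BatchNorm1d(512), nn.ReLU(),
            nn.Linear(512, 256), nn.BatchNorm1d(256), nn.ReLU(),
            nn.Linear(256, template_params.numel()),
        )
        # Zero weights and template bias: training starts from the template.
        final = self.decoder[-1]
        nn.init.zeros_(final.weight)
        with torch.no_grad():
            final.bias.copy_(template_params.flatten())

    def forward(self, images):
        # images: (batch, views, 3, 128, 128); max-pool codes over the views.
        b, v = images.shape[:2]
        codes = self.encoder(images.flatten(0, 1)).view(b, v, -1)
        return self.decoder(codes.max(dim=1).values)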

4 EXPERIMENTAL RESULTS

We demonstrate the efficacy of our method by applying it to the task of sketch-based modeling. We introduce a synthetic data generation pipeline for automatically creating realistic sketch data from 3D models. We then use our pipeline to train a network that takes a natural sketch image and converts it to a patch-based 3D representation.



Fig. 8. Our data generation and augmentation pipeline. Starting with a 3D model (a), we use Arnold Renderer in Autodesk Maya to generate its contours (b), which we vectorize using the method of Bessmeltsev and Solomon [2019] and stochastically modify (c). We then use the pencil drawing generation model of Simo-Serra et al. [2018] to generate the final image (d).

We show 3D reconstruction results both on synthetic sketches from our dataset as well as natural human-drawn sketches and also perform an ablation study, demonstrating the necessity of each term in our objective function. Finally, we compare our results to existing methods for sketch-based as well as single-view 3D reconstruction.

4.1 Data Preparation

While there exist annotated datasets of 3D models and corresponding hand-drawn sketches [Gryaditskaya et al. 2019], such data are unavailable at the scale necessary for deep learning. Thus, we instead generate synthetic training data from 3D models. Our system creates sketch-like images that capture a model from several views and contain the typical ambiguities and inaccuracies present in human-drawn sketches.

Our first step is to generate 2D contours from the 3D model, which an artist would capture in a sketch. Guided by the study by Cole et al. [2012], we render occluding contours and sharp edges using the Arnold Toon Shader in Autodesk Maya. We render each model from a fixed number of distinct camera angles manually chosen per shape category to best capture representative views.

Although the contour images capture the main features of the 3D model, they lack some of the ambiguities present in rough hand-drawn sketches [Liu et al. 2018], so we augment our contour images with features such as broken lines. To this end, we first vectorize the contour images using the method of Bessmeltsev and Solomon [2019]. Then, for each vectorized image, we augment the set of contours. With a probability of 0.3, we split a random stroke into two at a uniformly random position. We do this no more than 10 times for a single image. Additionally, for each stroke, we truncate it at its endpoints with probability 0.2. Finally, we introduce a realistic sketch-like texture to our contours while also adding noise and ambiguity. For each augmented vectorized contour image, we rasterize it using several different stroke widths. We then pass the rasterized images through the pencil drawing generation model of Simo-Serra et al. [2018]. We illustrate our entire data generation and augmentation pipeline in Figure 8.
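The stroke-level augmentation can be sketched as follows, operating on vectorized strokes represented as point lists; the function name and the exact truncation amount (dropping one point at each end) are our assumptions, while the probabilities and the 10-split cap follow the text.

import random

def augment_strokes(strokes, p_split=0.3, p_truncate=0.2, max_splits=10):
    out = [list(s) for s in strokes]
    # Split a randomly chosen stroke at a random position, at most 10 times.
    for _ in range(max_splits):
        if out and random.random() < p_split:
            k = random.randrange(len(out))
            s = out.pop(k)
            if len(s) > 2:
                cut = random.randrange(1, len(s) - 1)
                out.extend([s[:cut], s[cut:]])
            else:
                out.append(s)
    # Truncate each stroke at its endpoints with probability 0.2.
    for i, s in enumerate(out):
        if len(s) > 4 and random.random() < p_truncate:
            out[i] = s[1:-1]
    return out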

In the end, for each 3D model, we obtain a series of realistic, synthetically-generated sketch images. In our experiments, we train models from the airplane, bathtub, guitar, bottle, car, mug, gun, and knife categories of the ShapeNet Core (v2) dataset [Chang et al. 2015].

Fig. 9. Results on synthetic sketches of airplanes. From left to right: input sketch, 3D model with sphere template (54 patches), 3D model with category-specific template (76 patches). For more results, please refer to Fig. 1.

We choose these categories because they largely contain models with consistent structure, making them well-suited for our representation. Prior to processing, we convert the ShapeNet models to watertight meshes using the method of Huang et al. [2018] and normalize them to fit into an origin-centered unit sphere. We also manually remove some mislabeled models from the dataset.

4.2 Results on Real and Synthetic Sketches

We pick a random 10%-90% test-train split for each shape category and evaluate our method on synthetic sketches from our test dataset in Figures 1, 10, 11, 12, 13, 14, 15, and 9. For each category, we show results from both a model using a generic 54-patch sphere template and a category-specific template. The templates for airplanes, guitars, guns, knives, and cars are generated fully automatically using the semantic segmentations of Yi et al. [2016]. For mugs, we start with an automatically-generated template and manually add a hole in the handle as well as a void in the mug interior. To demonstrate our method using a template consisting of multiple distinct parts, for cars, we use the segmentation during training, computing Chamfer and normal alignment losses for wheel and body patches separately. Finally, to demonstrate the use case of our system when segmentations are not available, we manually construct the bottle and bathtub templates simply by placing two and five cuboids, respectively, and then running our template processing algorithm.

For a generic sphere template, our method produces a compact piecewise-smooth representation of surfaces of comparable quality to the more conventional deformable meshes. Our algorithmic construction of category-specific templates, however, enables a higher-quality reconstruction of sharp features and details.

In Figure 17, we show how our system is able to utilize multiple views of the same object in order to refine its prediction.


Fig. 10. Results on synthetic sketches of bottles. From top to bottom: input sketch, 3D model with sphere template (54 patches), 3D model with category-specific template (14 patches).

Fig. 11. Results on synthetic sketches of bathtubs. From left to right: input sketch, 3D model with sphere template (54 patches), 3D model with category-specific template (14 patches).

We also test our method on real sketches drawn by four artists using pencil and paper as well as an iPad with an Apple Pencil (Figure 18). Each artist was shown a rendering of a sample 3D model rendered from each of our viewpoints and was told to sketch an object in the same category from one of the viewpoints.

Fig. 12. Results on synthetic sketches of guitars. From top to bottom: input sketch, 3D model with sphere template (54 patches), 3D model with category-specific template (22 patches).

Fig. 13. Results on synthetic sketches of guns. From left to right: input sketch, 3D model with sphere template (54 patches), 3D model with category-specific template (20 patches).

The artists were never shown the contours or synthetic sketches used in our training procedure. The 3D results that we recover are similar to those on the synthetic sketches. This demonstrates that our dataset is reflective of the choices that humans make when sketching 3D objects.


Fig. 14. Results on synthetic sketches of knives. From top to bottom: input sketch, 3D model with sphere template (54 patches), 3D model with category-specific template (14 patches).

Fig. 15. Results on synthetic sketches of cars. From left to right: input sketch, 3D model with sphere template (54 patches), 3D model with category-specific template (43 patches).

4.3 3D Model Interpolation

The representation learned by our method is naturally well-suited for interpolating between 3D models. Because each model is composed of a small number of patches, each of which is placed consistently across different models, we can linearly interpolate the parameters that define the patches (e.g., the vertex positions) to generate models "between" those output by our network.

We are also able to perform interpolation in the latent space learned by our deep model; we take the output of our first 1024-dimensional hidden fully-connected layer to be the latent space.

Fig. 16. Results on synthetic sketches of mugs. From top to bottom: input sketch, 3D model with sphere template (54 patches), 3D model with category-specific template (32 patches).

Fig. 17. We demonstrate our method's ability to incorporate details from different views of a model into its final prediction. We show our output when given a single view of an airplane as well as the output when given an additional view. The combined model incorporates elements not visible in the original view.

Fig. 18. Results on real human-drawn sketches of airplanes.

While the resulting interpolation is similar to that in patch space, each interpolant better resembles a realistic model due to the priors learned by our network.

We demonstrate both patch-space and latent-space interpolation between two car models in Figure 19.

4.4 Ablation Study

We perform an ablation study of our method. We demonstrate on an airplane model the effect of training without each term in our loss function, as well as the difference between a category-specific template, a 54-patch sphere template, and a lower-resolution 24-patch template. The results are shown in Figure 20.

The ablation study demonstrates the contribution of each component of our system to the final result.


Fig. 19. Linear interpolation in learned latent space (above) and patch parameter space (below) between two car models. The consistent patch placement and the low-dimensional, geometrically meaningful nature of our representation make it possible to interpolate directly in patch parameter space. We obtain even better interpolants, however, when interpolating in the 1024-dimensional latent space learned by our model; each model in the latent space interpolation appears to be a valid car.


Fig. 20. An ablation study of our algorithm, training the network (a) without the normal alignment loss, (b) without collision detection loss, (c) without patch flatness loss, (d) without template normal loss, (e) without symmetry loss, as well as using 24-patch (g) and 54-patch (h) sphere templates, compared to the final result (i).

Omitting the normal loss causes the 3D surface to suffer in smoothness. The patch flatness and template normal losses encourage patch seams to align to sharp features. While both sphere templates capture the geometry, using more patches allows us to capture greater detail, and using a non-generic template further improves the model.
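The exact definitions of these losses are given earlier in the paper. Purely to illustrate the flavor of geometric regularizer involved, the sketch below computes a generic flatness penalty (mean squared distance of a patch's control points from their best-fit plane); this is an assumption-laden stand-in, not our precise patch flatness loss.

```python
import numpy as np

def flatness_penalty(control_points):
    """Generic flatness term: mean squared distance of a patch's control points
    from their best-fit plane, found via SVD. Illustrative only."""
    centered = control_points - control_points.mean(axis=0)
    # The smallest right-singular vector approximates the plane normal.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    normal = vt[-1]
    distances = centered @ normal  # signed distances to the best-fit plane
    return np.mean(distances ** 2)

# Example: a nearly planar set of control points yields a small penalty.
pts = np.array([[0, 0, 0.0], [1, 0, 0.01], [1, 1, -0.02], [0, 1, 0.0]])
print(flatness_penalty(pts))
```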

4.5 Comparisons

In Figure 21, we compare our method to the sketch-based 3D reconstruction methods of Lun et al. [2017] and Delanoy et al. [2018]. Our comparisons are generated using the type of input used to train these two methods, rather than attempting to re-train their models for our input.

Although we train on a different dataset, the visual quality and fidelity of our predictions are comparable to the output of [Lun et al. 2017] and [Delanoy et al. 2018]. Moreover, our method offers some distinct advantages. In particular, we output a 3D representation that sparsely captures smooth and sharp features, independent of resolution. In contrast, Delanoy et al. [2018] produce a 64³ voxel grid—a dense representation at a fixed resolution, which cannot be edited directly and offers no topological guarantees. In Figure 22, we show results of their system evaluated on contours from our dataset. These inputs were not processed with the pencil sketch model, to more closely resemble the data used to train their system. We show their results (orange) on two inputs alongside our results (blue). These results largely demonstrate that our task of reconstructing sketches with a prior on class (airplane) rather than geometric structure (cylinders and cuboids) is misaligned with theirs: since our training data is not well-approximated by CSG models, their method is unable to extract meaningful output.
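To put the two output resolutions side by side (a back-of-the-envelope count, not a number reported by either paper), a dense occupancy grid at that resolution stores

    64³ = 262,144

values at fixed positions, whereas the 54-patch sphere template used in our Pixel2Mesh comparison below is described by 816 continuous parameters.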


Fig. 21. Compared to the previous approaches, [Delanoy et al. 2018] (a) and [Lun et al. 2017] (c), our model (b and d) captures qualitative aspects of the input images despite having been trained on data generated from different 3D models and rendered using a distinct pipeline. See Figure 22 for models produced by the method of Delanoy et al. [2018] on our data. Furthermore, unlike the voxel-based [Delanoy et al. 2018] or smooth mesh-based [Lun et al. 2017] approaches, our models do not depend on resolution and can represent sharp and smooth regions explicitly.

Fig. 22. Comparison to [Delanoy et al. 2018] on inputs from our dataset. Their predictions (generated by the authors) are in orange, and ours are in blue. This experiment demonstrates that their method does not generalize to arbitrary single-view sketches.

Although the method of Lun et al. [2017] ultimately produces a mesh, it is only after a computationally expensive post-processing and fine-tuning procedure, since a forward pass through their network returns a labeled point cloud from which the mesh is extracted. Our method directly outputs the parameters for surface patches with no further optimization or post-processing. Additionally, the final mesh from their technique contains more components (triangles) than our output representation (patches), making it less useful for editing. Finally, their fine-tuning approach is fundamentally incompatible with the goal of parsing human-drawn sketches, since they rely on propagating changes to the 3D mesh back to the raster image. The inherent ambiguity and noise of our input precludes this procedure.

In Figure 23, we compare our method to AtlasNet [Groueix et al. 2018]. Since AtlasNet does not operate on sketch-based input, we retrain our model with the renderings used for AtlasNet. We use the generic 54-face sphere template for fair comparison. While our 3D reconstructions capture the same amount of detail, they do not suffer from the topological defects of AtlasNet's representation. In particular, AtlasNet's reconstruction contains many patch intersections as well as holes in the surface. Extracting a watertight mesh would require significant post-processing. Additionally, each patch in our representation is parameterized sparsely by control points on its boundary. This is in contrast to AtlasNet's patches, which come from a learned latent space and, therefore, must be sampled using a deep decoder network and cannot be easily edited.
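Because a patch is determined entirely by its boundary, it can be evaluated or meshed directly, with no decoder network in the loop. The sketch below shows standard bilinearly blended Coons interpolation from four boundary curves; the straight-line boundaries and sampling density are illustrative assumptions (in practice the boundaries would be the predicted curves defined by the output control points).

```python
import numpy as np

def coons_point(c0, c1, d0, d1, u, v):
    """Bilinearly blended Coons patch: c0(u)=S(u,0), c1(u)=S(u,1),
    d0(v)=S(0,v), d1(v)=S(1,v) are boundary curves returning 3D points."""
    ruled_u = (1 - v) * c0(u) + v * c1(u)
    ruled_v = (1 - u) * d0(v) + u * d1(v)
    bilinear = ((1 - u) * (1 - v) * c0(0) + u * (1 - v) * c0(1)
                + (1 - u) * v * c1(0) + u * v * c1(1))
    return ruled_u + ruled_v - bilinear

# Example boundary curves (straight lines for brevity); any four curves that
# agree at the corners would work, e.g. cubic curves from predicted control points.
c0 = lambda u: np.array([u, 0.0, 0.0])
c1 = lambda u: np.array([u, 1.0, 0.2 * u])
d0 = lambda v: np.array([0.0, v, 0.0])
d1 = lambda v: np.array([1.0, v, 0.2 * v])

# Sample the patch on a regular grid to obtain a quad mesh.
samples = np.array([[coons_point(c0, c1, d0, d1, u, v)
                     for u in np.linspace(0, 1, 8)]
                    for v in np.linspace(0, 1, 8)])
print(samples.shape)  # (8, 8, 3)
```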


Fig. 23. 3D reconstructions using AtlasNet [Groueix et al. 2018] (b) and our method (c) given a single rendering as input (a). Compared to AtlasNet, we produce a result without topological defects (holes and overlaps). Additionally, each of our patch primitives is easily editable and has a low-dimensional, interpretable parameterization.


In Pixel2Mesh, Wang et al. [2018b] output a triangle mesh given a rendering as input. We train models using our method for the airplane and car categories, using the same renders and an identical test-train split as Pixel2Mesh. For fair comparison, we use our generic 54-face sphere template; Pixel2Mesh also initializes its output with a sphere mesh. While the final output of Pixel2Mesh is a mesh containing 2466 vertices, which corresponds to 7398 degrees of freedom, we output 54 patches, corresponding to 816 degrees of freedom, making our representation better suited for editability and interpretability (Figure 2). As shown in Figure 24, the low dimensionality of our 3D models does not come at the expense of expressiveness.

We compare to Pixel2Mesh quantitatively in Table 1. We select 2500 random test set views and compute Chamfer distance using 5000 sampled points. We rescale the Pixel2Mesh meshes to be the same size as our meshes for the comparison. While we obtain comparable Chamfer distance values, our representation is significantly more compact, more easily editable, and less prone to non-manifold artifacts.
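For reference, a minimal version of this evaluation might look like the sketch below. It assumes the 5000 points per shape have already been sampled from the rescaled meshes, and it uses one common convention for the symmetric Chamfer distance (conventions differ, so the absolute values need not match Table 1).

```python
import numpy as np
from scipy.spatial import cKDTree

def chamfer_distance(points_a, points_b):
    """Symmetric Chamfer distance (one common convention): mean squared
    nearest-neighbor distance, summed over both directions."""
    d_ab, _ = cKDTree(points_b).query(points_a)  # nearest point in B for each point in A
    d_ba, _ = cKDTree(points_a).query(points_b)  # nearest point in A for each point in B
    return np.mean(d_ab ** 2) + np.mean(d_ba ** 2)

# Placeholder point sets; in the actual evaluation, 5000 points are sampled
# from the predicted and reference surfaces after rescaling to a common size.
pts_ours = np.random.rand(5000, 3)
pts_p2m = np.random.rand(5000, 3)
print(chamfer_distance(pts_ours, pts_p2m))
```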

Category    CD (P2M)    CD (ours)    DOF (P2M)    DOF (ours)
airplane    0.022       0.025        7398         816
car         0.018       0.022        7398         816

Table 1. Quantitative comparison to Pixel2Mesh [Wang et al. 2018b]. For the airplane and car ShapeNet categories, we report Chamfer distance (CD) and degrees of freedom in the representation (DOF). Although we obtain comparable Chamfer distance, we do so using a representation that is an order of magnitude more compact and without non-manifold artifacts.

5 DISCUSSION AND CONCLUSION

As more and more 3D data becomes readily available, the need for processing, modifying, and generating 3D models in a usable fashion also increases. While the quality of results produced by deep learning systems continues to improve, it is necessary to think carefully about their format, particularly with respect to existing applications and use cases. By carefully designing representations together with compatible learning algorithms, we can truly harness all these data for the purpose of simplifying and automating workflows in design, modeling, and manufacturing.

While many difficult problems remain on the path toward this goal, our system represents a significant step toward practical 3D modeling assisted by deep learning.


Fig. 24. Comparison to Pixel2Mesh [Wang et al. 2018b] on four test set images from each of the car and airplane categories. From left to right in each column: input image, Wang et al. [2018b], ours trained on the generic 54-patch sphere template. While we are able to capture a similar degree of geometric detail in our 3D models, the dimensionality of our patch-based representation is an order of magnitude smaller than the mesh-based representation of Pixel2Mesh, and our results do not suffer from non-manifold artifacts.

Our use of a sparse patch-based representation is closer to what is used in artistic and engineering practice, and we accompany this representation with new geometric regularizers that greatly improve the reconstruction process. Unlike meshes or voxel occupancy functions, this representation can easily be edited and tuned after 3D reconstruction, and it captures a trade-off between smoothness and sharp edges reasonable for man-made shapes. Furthermore, our synthetic sketch data generation pipeline fills a gap in the datasets needed to train modern machine learning systems for this task.

Our work suggests several avenues for future research. Currently, our technique uses pre-trained networks to generate sketch training data; inspired by recent generative adversarial networks (GANs), we could couple together the training of these different pieces to alleviate dependence on matched sketch–3D model pairs. We also could explore coupling with other representations, leveraging the rich literature in computer-aided geometric design (CAGD) to identify other structures amenable to learning with relatively few parameters. Of particular interest are multiresolution representations (e.g., subdivision surfaces), which might enable the system to learn high-level smooth structure and geometric details like filigree independently. It also may be beneficial to incorporate additional modalities, such as photographs, to further regularize our learned output.

Other extensions of our work might be oriented toward the end user. Capturing and learning from the sequence of strokes might be fruitful for disambiguating depth information in 3D reconstruction. Furthermore, we should close the loop between the learning system and the artist, allowing the artist to edit the 3D model or to edit the sketch and have the changes propagate to the other side.

Perhaps the most important challenge remaining from our work—and others, such as [Kanazawa et al. 2018; Smirnov et al. 2020; Wang et al. 2019]—involves inference of the topology of a shape. While our current per-class templates support structural variability and modular parts, scaling this towards a completely learned topology is nontrivial.


Although this limitation is reasonable for the classes of shapes we consider—and likely for parts of shapes, as explored in [Mo et al. 2019]—reconstruction of a sketch of a generic full shape will require algorithms that automatically add and connect patches in a flexible and adaptive fashion.

Even without the improvements above, our system remains an effective means of 3D shape recovery. It can be used as-is as a means of extracting an initial 3D model that can be tuned by an artist or engineer. Moreover, our architecture and loss functions can be incorporated as building blocks into larger pipelines connecting artistic imagery to the 3D world.

6 ACKNOWLEDGEMENTS

We acknowledge the generous support of Army Research Office grant W911NF-12-R-0011, of National Science Foundation grant IIS-1838071, from an Amazon Research Award, from the MIT-IBM Watson AI Laboratory, from the Toyota-CSAIL Joint Research Center, from the Skoltech-MIT Next Generation Program, and of a gift from Adobe Systems. This work was also supported by the National Science Foundation Graduate Research Fellowship under Grant No. 1122374. We acknowledge the support of the Natural Sciences and Engineering Research Council of Canada (NSERC) grant RGPIN-2019-05097 ("Creating Virtual Shapes via Intuitive Input") and from the Fonds de recherche du Québec – Nature et technologies (FRQNT) grant 2020-NC-270087.

REFERENCES

Panos Achlioptas, Olga Diamanti, Ioannis Mitliagkas, and Leonidas J Guibas. 2017. Learning Representations and Generative Models For 3D Point Clouds. arXiv preprint arXiv:1707.02392 (2017).

Timur Bagautdinov, Chenglei Wu, Jason Saragih, Pascal Fua, and Yaser Sheikh. 2018.Modeling Facial Geometry Using Compositional VAEs. In The IEEE Conference onComputer Vision and Pattern Recognition (CVPR).

Pierre Baque, Edoardo Remelli, François Fleuret, and Pascal Fua. 2018. Geodesic convo-lutional shape optimization. arXiv preprint arXiv:1802.04016 (2018).

Heli Ben-Hamu, Haggai Maron, Itay Kezurer, Gal Avineri, and Yaron Lipman. 2018.Multi-chart Generative Surface Modeling. ACM Trans. Graph. 37, 6, Article 215 (Dec.2018), 15 pages. https://doi.org/10.1145/3272127.3275052

Mikhail Bessmeltsev, Will Chang, Nicholas Vining, Alla Sheffer, and Karan Singh. 2015.Modeling Character Canvases from Cartoon Drawings. ACM Trans. Graph. 34, 5,Article 162 (Nov. 2015), 16 pages. https://doi.org/10.1145/2801134

Mikhail Bessmeltsev and Justin Solomon. 2019. Vectorizing Line Drawings via Polyvec-tor Fields. ACM Transactions on Graphics 38, 1 (2019).

Mikhail Bessmeltsev, Nicholas Vining, and Alla Sheffer. 2016. Gesture3D: Posing 3DCharacters via Gesture Drawings. ACM Transactions on Graphics 35, 6 (2016),165:1–165:13. https://doi.org/10.1145/2980179.2980240

Angel X. Chang, Thomas Funkhouser, Leonidas Guibas, Pat Hanrahan, Qixing Huang,Zimo Li, Silvio Savarese, Manolis Savva, Shuran Song, Hao Su, Jianxiong Xiao, Li Yi,and Fisher Yu. 2015. ShapeNet: An Information-Rich 3D Model Repository. TechnicalReport arXiv:1512.03012 [cs.GR]. Stanford University — Princeton University —Toyota Technological Institute at Chicago.

Tao Chen, Zhe Zhu, Ariel Shamir, Shi-Min Hu, and Daniel Cohen-Or. 2013. 3-Sweep.ACM Transactions on Graphics 32, 6 (2013), 1–10. https://doi.org/10.1145/2508363.2508378

Zhiqin Chen and Hao Zhang. 2019. Learning implicit fields for generative shapemodeling. In Proceedings of the IEEE Conference on Computer Vision and PatternRecognition. 5939–5948.

Joseph Jacob Cherlin, Faramarz Samavati, Mario Costa Sousa, and Joaquim A. Jorge.2005. Sketch-based modeling with few strokes. Proceedings of the 21st springconference on Computer graphics - SCCG ’05 1, 212 (2005), 137. https://doi.org/10.1145/1090122.1090145

Christopher B. Choy, Danfei Xu, Jun Young Gwak, Kevin Chen, and Silvio Savarese. 2016.3D-R2N2: A unified approach for single and multi-view 3D object reconstruction.Lecture Notes in Computer Science (including subseries Lecture Notes in ArtificialIntelligence and Lecture Notes in Bioinformatics) 9912 LNCS (2016), 628–644. https://doi.org/10.1007/978-3-319-46484-8_38

Forrester Cole, Aleksey Golovinskiy, Alex Limpaecher, Heather Stoddart Barros, Adam Finkelstein, Thomas Funkhouser, and Szymon Rusinkiewicz. 2012. Where do people draw lines? Commun. ACM 55, 1 (2012), 107. https://doi.org/10.1145/2063176.2063202

S. A. Coons. 1967. Surfaces for Computer-Aided Design of Space Forms. Technical Report. Cambridge, MA, USA.

Johanna Delanoy, Mathieu Aubry, Phillip Isola, Alexei A Efros, and Adrien Bousseau. 2018. 3D sketching using multi-view deep volumetric prediction. Proceedings of the ACM on Computer Graphics and Interactive Techniques 1, 1 (2018), 1–22.

Chao Ding and Ligang Liu. 2016. A Survey of Sketch Based Modeling Systems. Front.Comput. Sci. 10, 6 (Dec. 2016), 985–999. https://doi.org/10.1007/s11704-016-5422-9

Even Entem, Loic Barthe, Marie-Paule Cani, Frederic Cordier, and Michiel van de Panne.2015. Modeling 3D Animals from a Side-view Sketch. Comput. Graph. 46, C (Feb.2015), 221–230. https://doi.org/10.1016/j.cag.2014.09.037

Haoqiang Fan, Hao Su, and Leonidas Guibas. 2017. DeepPointSet : A Point Set Genera-tion Network for 3D Object Reconstruction from a Single Image. In CVPR.

Gerald Farin. 2002. Curves and Surfaces for CAGD: A Practical Guide (5th ed.). MorganKaufmann Publishers Inc., San Francisco, CA, USA.

Jun Gao, Chengcheng Tang, Vignesh Ganapathi-Subramanian, Jiahui Huang, Hao Su,and Leonidas J. Guibas. 2019. DeepSpline: Data-Driven Reconstruction of ParametricCurves and Surfaces. (2019), 1–13. arXiv:1901.03781 http://arxiv.org/abs/1901.03781

Kyle Genova, Forrester Cole, Daniel Vlasic, Aaron Sarna, William T Freeman, andThomas Funkhouser. 2019. Learning shape templates with structured implicitfunctions. In Proceedings of the IEEE International Conference on Computer Vision.7154–7164.

Yotam Gingold, Takeo Igarashi, and Denis Zorin. 2009. Structured annotations for2D-to-3D modeling. ACM Transactions on Graphics 28, 5 (2009), 1. https://doi.org/10.1145/1618452.1618494

Thibault Groueix, Matthew Fisher, Vladimir G. Kim, Bryan C. Russell, and Mathieu Aubry. 2018. AtlasNet: A Papier-Mâché Approach to Learning 3D Surface Generation. Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (2018), 216–224. https://doi.org/10.1109/CVPR.2018.00030

Yulia Gryaditskaya, Mark Sypesteyn, Jan Willem Hoftijzer, Sylvia Pont, Frédo Durand,and Adrien Bousseau. 2019. OpenSketch: A Richly-Annotated Dataset of ProductDesign Sketches. ACM Transactions on Graphics (Proc. SIGGRAPH Asia) 38 (11 2019).

Niv Haim, Nimrod Segol, Heli Ben-Hamu, Haggai Maron, and Yaron Lipman. 2019. Sur-face Networks via General Covers. In Proceedings of the IEEE International Conferenceon Computer Vision. 632–641.

Christian Häne, Shubham Tulsiani, and Jitendra Malik. 2019. Hierarchical SurfacePrediction. TPAMI (2019).

Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. 2016. Deep residual learningfor image recognition. In Proceedings of the IEEE conference on computer vision andpattern recognition. 770–778.

H. Huang, E. Kalogerakis, E. Yumer, and R. Mech. 2017. Shape Synthesis fromSketches via Procedural Models and Convolutional Networks. IEEE Transac-tions on Visualization and Computer Graphics 23, 8 (Aug 2017), 2003–2013. https://doi.org/10.1109/TVCG.2016.2597830

Jingwei Huang, Hao Su, and Leonidas Guibas. 2018. Robust Watertight Manifold Surface Generation Method for ShapeNet Models. arXiv preprint arXiv:1802.01698 (2018).

Takeo Igarashi, Satoshi Matsuoka, and Hidehiko Tanaka. 1999. Teddy: A SketchingInterface for 3D Freeform Design. In Proceedings of the 26th Annual Conference onComputer Graphics and Interactive Techniques (SIGGRAPH ’99). ACM Press/Addison-Wesley Publishing Co., New York, NY, USA, 409–416. https://doi.org/10.1145/311535.311602

Anil K. Jain, Yu Zhong, and Marie-Pierre Dubuisson-Jolly. 1998. Deformable Template Models: A Review. Signal Process. 71, 2 (Dec. 1998), 109–129. https://doi.org/10.1016/S0165-1684(98)00139-X

A Jung, S Hahmann, D Rohmer, A Begault, L Boissieux, and M P Cani. 2015. SketchingFolds: Developable Surfaces from Non-Planar Silhouettes. Acm Transactions onGraphics 34, 5 (2015), 12. https://doi.org/10.1145/2749458

Angjoo Kanazawa, Shubham Tulsiani, Alexei A. Efros, and Jitendra Malik. 2018. Learning Category-Specific Mesh Reconstruction from Image Collections. In ECCV.

Hiroharu Kato, Yoshitaka Ushiku, and Tatsuya Harada. 2018. Neural 3D Mesh Renderer.In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

Diederik P Kingma and Jimmy Ba. 2014. Adam: A method for stochastic optimization.arXiv preprint arXiv:1412.6980 (2014).

Venkat Krishnamurthy and Marc Levoy. 1996. Fitting Smooth Surfaces to Dense Polygon Meshes. In Proceedings of the 23rd Annual Conference on Computer Graphics and Interactive Techniques (SIGGRAPH '96). Association for Computing Machinery, New York, NY, USA, 313–324. https://doi.org/10.1145/237170.237270

Changjian Li, Hao Pan, Yang Liu, Xin Tong, Alla Sheffer, and Wenping Wang. 2018.Robust Flow-Guided Neural Prediction for Sketch-Based Freeform Surface Modeling.ACM Trans. Graph. 37, 6, Article Article 238 (Dec. 2018), 12 pages. https://doi.org/10.1145/3272127.3275051

Changjian Li, Hao Pan, Xin Tong, Alla Sheffer, and Wenping Wang. 2017. BendSketch :Modeling Freeform Surfaces Through 2D Sketching. ACM Trans. Graph 36, 4 (2017).

Minchen Li, Alla Sheffer, Eitan Grinspun, and Nicholas Vining. 2018. Foldsketch: enriching garments with physically reproducible folds. ACM Trans. Graph. 37, 4 (2018), 133:1–133:13.


Or Litany, Alex Bronstein, Michael Bronstein, and Ameesh Makadia. 2018. DeformableShape Completion with Graph Convolutional Autoencoders. CVPR (2018).

Chenxi Liu, Enrique Rosales, and Alla Sheffer. 2018. StrokeAggregator: ConsolidatingRaw Sketches into Artist-Intended Curve Drawings. ACM Transaction on Graphics37, 4 (2018). https://doi.org/10.1145/3197517.3201314

M. Liu, O. Tuzel, A. Veeraraghavan, and R. Chellappa. 2010. Fast directional chamfermatching. In 2010 IEEE Computer Society Conference on Computer Vision and PatternRecognition. 1696–1703. https://doi.org/10.1109/CVPR.2010.5539837

Zhaoliang Lun, Matheus Gadelha, Evangelos Kalogerakis, Subhransu Maji, and Rui Wang. 2017. 3D Shape Reconstruction from Sketches via Multi-view Convolutional Networks. In 2017 International Conference on 3D Vision (3DV).

Priyanka Mandikal, K L Navaneet, Mayank Agarwal, and R Venkatesh Babu. 2018.3D-LMNet: Latent Embedding Matching for Accurate and Diverse 3D Point CloudReconstruction from a Single Image. In Proceedings of the British Machine VisionConference (BMVC).

Haggai Maron, Meirav Galun, Noam Aigerman, Miri Trope, Nadav Dym, Ersin Yumer,Vladimir G. Kim, and Yaron Lipman. 2017. Convolutional Neural Networks onSurfaces via Seamless Toric Covers. ACM Trans. Graph. 36, 4, Article 71 (July 2017),10 pages. https://doi.org/10.1145/3072959.3073616

Lars Mescheder, Michael Oechsle, Michael Niemeyer, Sebastian Nowozin, and AndreasGeiger. 2019. Occupancy Networks: Learning 3D Reconstruction in Function Space.In Proceedings IEEE Conf. on Computer Vision and Pattern Recognition (CVPR).

Kaichun Mo, Shilin Zhu, Angel Chang, Li Yi, Subarna Tripathi, Leonidas Guibas, and Hao Su. 2019. PartNet: A Large-scale Benchmark for Fine-grained and Hierarchical Part-level 3D Object Understanding. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

Andrew Nealen, Takeo Igarashi, Olga Sorkine, and Marc Alexa. 2007. FiberMesh:Designing Freeform Surfaces with 3D Curves. ACM Transactions on Graphics(Proceedings of ACM SIGGRAPH) 26, 3 (2007), article no. 41.

Gen Nishida, Ignacio Garcia-Dorado, Daniel G. Aliaga, Bedrich Benes, and AdrienBousseau. 2016. Interactive Sketching of Urban Procedural Models. ACM Trans.Graph. 35, 4 (2016).

Luke Olsen, Faramarz F. Samavati, Mario Costa Sousa, and Joaquim A. Jorge. 2009.Sketch-based modeling: A survey. Computers & Graphics 33, 1 (feb 2009), 85–103.https://doi.org/10.1016/j.cag.2008.09.013

Jeong Joon Park, Peter Florence, Julian Straub, Richard Newcombe, and Steven Love-grove. 2019. DeepSDF: Learning Continuous Signed Distance Functions for ShapeRepresentation. In The IEEE Conference on Computer Vision and Pattern Recognition(CVPR).

Les Piegl and Wayne Tiller. 1996. The NURBS Book (second ed.). Springer-Verlag, NewYork, NY, USA.

Danilo Jimenez Rezende, SM Ali Eslami, Shakir Mohamed, Peter Battaglia, Max Jader-berg, and Nicolas Heess. 2016. Unsupervised learning of 3d structure from images.In Advances in neural information processing systems. 4996–5004.

C. Robson, R. Maharik, A. Sheffer, and N. Carr. 2011. Context-Aware Garment Modelingfrom Sketches. Computers and Graphics (Proc. SMI 2011) (2011), 604–613.

Tianjia Shao, Dongping Li, Yuliang Rong, Changxi Zheng, and Kun Zhou. 2016. DynamicFurniture Modeling Through Assembly Instructions. ACM Transactions on Graphics(SIGGRAPH Asia 2016) 35, 6 (2016).

A. Shtof, A. Agathos, Y. Gingold, A. Shamir, and D. Cohen-Or. 2013. GeosemanticSnapping for Sketch-Based Modeling. Computer Graphics Forum 32, 2, pt. 2 (may2013), 245–253. https://doi.org/10.1111/cgf.12044

Edgar Simo-Serra, Satoshi Iizuka, and Hiroshi Ishikawa. 2018. Mastering Sketching:Adversarial Augmentation for Structured Prediction. Transactions on Graphics(Presented at SIGGRAPH) 37, 1 (2018).

Ayan Sinha, Jing Bai, and Karthik Ramani. 2016. Deep Learning 3D Shape SurfacesUsing Geometry Images. In ECCV.

Dmitriy Smirnov, Matthew Fisher, Vladimir G. Kim, Richard Zhang, and Justin Solomon. 2020. Deep Parametric Shape Predictions using Distance Fields. In Conference on Computer Vision and Pattern Recognition (CVPR).

Chunyu Sun, Qianfang Zou, Xin Tong, and Yang Liu. 2019. Learning Adaptive Hierar-chical Cuboid Abstractions of 3D Shape Collections. ACM Transactions on Graphics(SIGGRAPH Asia) 38, 6 (2019).

Chiew-Lan Tai, Hongxin Zhang, and Jacky Chun-Kin Fong. 2004. Prototype Modelingfrom Sketched Silhouettes based on Convolution Surfaces. Computer Graphics Forum(2004). https://doi.org/10.1111/j.1467-8659.2004.00006.x

M. Tatarchenko, A. Dosovitskiy, and T. Brox. 2016. Multi-view 3D Models from SingleImages with a Convolutional Network. In European Conference on Computer Vision(ECCV).

Yonglong Tian, Andrew Luo, Xingyuan Sun, Kevin Ellis, William T. Freeman, Joshua B.Tenenbaum, and Jiajun Wu. 2019. Learning to Infer and Execute 3D Shape Programs.In International Conference on Learning Representations.

Shubham Tulsiani, Alexei A Efros, and Jitendra Malik. 2018. Multi-view consistencyas supervisory signal for learning shape and pose prediction. In Proceedings of theIEEE conference on computer vision and pattern recognition. 2897–2905.

Shubham Tulsiani, Hao Su, Leonidas J. Guibas, Alexei A. Efros, and Jitendra Malik. 2017a. Learning Shape Abstractions by Assembling Volumetric Primitives. In Computer Vision and Pattern Recognition (CVPR).

Shubham Tulsiani, Hao Su, Leonidas J. Guibas, Alexei A. Efros, and Jitendra Malik. 2017b. Learning Shape Abstractions by Assembling Volumetric Primitives. In Computer Vision and Pattern Recognition (CVPR).

Shubham Tulsiani, Tinghui Zhou, Alexei A. Efros, and Jitendra Malik. 2017c. Multi-view Supervision for Single-view Reconstruction via Differentiable Ray Consistency.(2017). https://doi.org/10.1109/CVPR.2017.30 arXiv:1704.06254

Emmanuel Turquin, Marie-Paule Cani, and John F. Hughes. 2004. Sketching Garmentsfor Virtual Characters. In Proceedings of the First Eurographics Conference on Sketch-Based Interfaces and Modeling (SBM’04). Eurographics Association, Aire-la-Ville,Switzerland, Switzerland, 175–182. https://doi.org/10.2312/SBM/SBM04/175-182

Lingjing Wang, Jifei Wang, Cheng Qian, and Yi Fang. 2018. Unsupervised learning of 3D model reconstruction from hand-drawn sketches. In MM 2018 - Proceedings of the 2018 ACM Multimedia Conference. Association for Computing Machinery, 1820–1828. https://doi.org/10.1145/3240508.3240699

Nanyang Wang, Yinda Zhang, Zhuwen Li, Yanwei Fu, Wei Liu, and Yu-Gang Jiang. 2018b. Pixel2Mesh: Generating 3D Mesh Models from Single RGB Images. In ECCV.

Peng-Shuai Wang, Yang Liu, Yu-Xiao Guo, Chun-Yu Sun, and Xin Tong. 2017. O-CNN: Octree-based Convolutional Neural Networks for 3D Shape Analysis. ACMTransactions on Graphics (SIGGRAPH) 36, 4 (2017).

Shaoxiong Wang, Jiajun Wu, Xingyuan Sun, Wenzhen Yuan, William T Freeman, Joshua B Tenenbaum, and Edward H Adelson. 2018a. 3D Shape Perception from Monocular Vision, Touch, and Shape Priors. In IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).

Weiyue Wang, Duygu Ceylan, Radomir Mech, and Ulrich Neumann. 2019. 3DN: 3D Deformation Network. In CVPR.

Francis Williams, Teseo Schneider, Claudio Silva, Denis Zorin, Joan Bruna, and DanielePanozzo. 2019. Deep geometric prior for surface reconstruction. In Proceedings ofthe IEEE Conference on Computer Vision and Pattern Recognition. 10130–10139.

Jiajun Wu, Yifan Wang, Tianfan Xue, Xingyuan Sun, William T Freeman, and Joshua BTenenbaum. 2017. MarrNet: 3D Shape Reconstruction via 2.5D Sketches. InAdvancesIn Neural Information Processing Systems.

Jiajun Wu, Tianfan Xue, Joseph J Lim, Yuandong Tian, Joshua B Tenenbaum, AntonioTorralba, and William T Freeman. 2016a. Single Image 3D Interpreter Network. InEuropean Conference on Computer Vision (ECCV).

JiajunWu, Chengkai Zhang, Tianfan Xue,William T Freeman, and Joshua B Tenenbaum.2016b. Learning a probabilistic latent space of object shapes via 3d generative-adversarial modeling. In Advances in Neural Information Processing Systems. 82–90.

Jiajun Wu, Chengkai Zhang, Xiuming Zhang, Zhoutong Zhang, William T Freeman,and Joshua B Tenenbaum. 2018. Learning 3D Shape Priors for Shape Completionand Reconstruction. In European Conference on Computer Vision (ECCV).

Baoxuan Xu, William Chang, Alla Sheffer, Adrien Bousseau, James McCrae, and KaranSingh. 2014. True2Form. ACM Transactions on Graphics 33, 4 (2014), 1–13. https://doi.org/10.1145/2601097.2601128

Xinchen Yan, Jimei Yang, Ersin Yumer, Yijie Guo, and Honglak Lee. 2016. Perspec-tive transformer nets: Learning single-view 3d object reconstruction without 3dsupervision. In Advances in neural information processing systems. 1696–1704.

Linjie Yang, Jianzhuang Liu, and Xiaoou Tang. 2013. Complex 3D general objectreconstruction from line drawings. Proceedings of the IEEE International Conferenceon Computer Vision (2013), 1433–1440. https://doi.org/10.1109/ICCV.2013.181

Yaoqing Yang, Chen Feng, Yiru Shen, and Dong Tian. 2018. FoldingNet: Point CloudAuto-Encoder via Deep Grid Deformation. In The IEEE Conference on ComputerVision and Pattern Recognition (CVPR).

Li Yi, Vladimir G. Kim, Duygu Ceylan, I-Chao Shen, Mengyan Yan, Hao Su, CewuLu, Qixing Huang, Alla Sheffer, and Leonidas Guibas. 2016. A Scalable ActiveFramework for Region Annotation in 3D Shape Collections. SIGGRAPH Asia (2016).

Mehmet Ersin Yumer and Levent Burak Kara. 2012. Surface creation on unstructuredpoint sets using neural networks. Computer-Aided Design 44, 7 (2012), 644 – 656.https://doi.org/10.1016/j.cad.2012.03.002

Xiuming Zhang, Zhoutong Zhang, Chengkai Zhang, Joshua B Tenenbaum, William TFreeman, and JiajunWu. 2018. Learning to Reconstruct Shapes from Unseen Classes.In Advances in Neural Information Processing Systems (NeurIPS).

Zhirong Wu, S. Song, A. Khosla, Fisher Yu, Linguang Zhang, Xiaoou Tang, and J.Xiao. 2015. 3D ShapeNets: A deep representation for volumetric shapes. In 2015IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 1912–1920.https://doi.org/10.1109/CVPR.2015.7298801

L. Zhu, T. Igarashi, and J. Mitani. 2013. Soft Folding. Computer Graphics Forum 32, 7 (2013), 167–176. https://doi.org/10.1111/cgf.12224

Chuhang Zou, Ersin Yumer, Jimei Yang, Duygu Ceylan, and Derek Hoiem. 2017. 3d-prnn: Generating shape primitives with recurrent neural networks. In Proceedingsof the IEEE International Conference on Computer Vision. 900–909.