On the Effectiveness of Weight-Encoded Neural Implicit 3D Shapes
Overfit Neural Networks as a Compact Shape Representation
Thomas Davies (University of Toronto, Canada), Derek Nowrouzezahrai (McGill University, Canada), and Alec Jacobson (University of Toronto, Canada)
[Figure 1 panel labels: Our Neural Implicit; DeepSDF; Original; Uniform Grid]
Figure 1: Geometries can be represented with infinite resolution by their continuous signed distance fields (SDFs). The SDF of a shape can be approximated by storing values on a regular grid. Uniform grids are wasteful for storing values far from the surface, resulting in poor reconstructions for small grid sizes. Comparably, a DeepSDF [PFS*19] model can more effectively encode the original shape, but requires the shape to be consistently aligned and semantically similar (airplanes, cars, boats, etc.) to shapes within the training set, and fails to capture the unique characteristics of the model (i.e., shark fins). Our Neural Implicit format is produced by overfitting a neural network to the single geometry directly, providing a compact representation with far greater accuracy, regardless of class or orientation, compared to uniform grids of the same memory impact (64 kB shown here). Model by gpvillamil (right) under CC BY.
Abstract
Neural networks have proven to be effective approximators of signed distance fields (SDFs) for solid 3D objects. While prior work has focused on the generalization power of such approximations, we instead explore their suitability as a compact – if purposefully overfit – SDF representation of individual shapes. Specifically, we ask whether neural networks can serve as first-class implicit shape representations in computer graphics. We call such overfit networks Neural Implicits. Similar to SDFs stored on a regular grid, Neural Implicits have fixed storage profiles and memory layout, but afford far greater accuracy. At equal storage cost, Neural Implicits consistently match or exceed the accuracy of irregularly-sampled triangle meshes. We achieve this with a combination of a novel loss function, sampling strategy and supervision protocol designed to facilitate robust shape overfitting. We demonstrate the flexibility of our representation on a variety of standard rendering and modeling tasks.
1. Introduction
Signed distance fields (SDFs) are a versatile implicit surface representation, useful throughout computer graphics [BBB*97]. Complex objects represented as SDFs can be authored semi-analytically by (incrementally) composing geometric primitives with space warping, blending operations, and replicating functions (see inset by Inigo Quilez). However, storing an SDF as a long composition of expressions does not scale, especially to shapes with a high level of (non-procedural) detail. What is the best way to store an SDF?

Approximating an SDF by storing values on a regular grid speeds up evaluation at the cost of precomputation (effectively treating the SDF as a 3D table lookup). Shapes stored as SDFs on grids benefit by having fixed storage profiles and memory layouts. Unfortunately, this comes at a cost, as grids wastefully store a dense sampling of the SDF far from the surface where the value is smooth and predictable. While octrees and truncated signed distance functions can store SDFs asymptotically more efficiently, their representation incurs non-uniform computational and memory costs (e.g., the octree leaf nodes of one surface are different from those of another).

Meanwhile, explicit representations are ubiquitous as a data format for distributing 3D models: hundreds of millions of meshes are available online. While easier to animate and texture, explicit representations like meshes are cumbersome for shape modeling and querying tasks common in computer vision, simulation and geometric learning. This raises the question: how do we convert mesh assets into implicit representations? Embedding a mesh in, e.g., a spatial hierarchy to compute point-mesh signed distances is inefficient compared to flat table lookups or evaluation of analytic expressions. Bounding hierarchy distance queries also require divergent computations, e.g., different queries on the same shape can have drastically different computational and memory access costs. Converting to grids or octrees does not avoid their limitations.

We show that overfitting a deep neural network to the SDF of a single solid is effective, and we advocate for its consideration as a first-class implicit representation. We show that these overfit neural networks – which we call Neural Implicits – are a shape representation that inherits the effectively infinite resolution of implicits but with the computational efficiency of coarse meshes and the memory access uniformity of a fixed grid.

© 2020 The Author(s). arXiv [cs.GR], Oct. 2020.
Implicit representations are especially attractive for geometric machine learning. Voxel occupancy in a 3D image (a grid storing inside/outside values per cell) is a homogeneous representation especially amenable to classification and convolution networks [MS15; WSK*15]. Large datasets of 3D meshes (e.g., ShapeNet [CFG*15]) can easily be converted to a voxel grid or grid-based SDF [NLBY18; HSG18; LYF17], so that the network architecture can input a homogeneous image format similar to 2D convolution networks. Learning directly on 3D meshes has been attempted, but architectures become esoteric [HHF*19; WKBS19] and may rely on supplemental handcrafted features (see the longer discussion in [BBL*17]). Conversion to point clouds (an unordered set) sidesteps the homogeneity issue by removing dependence on ordering or explicit/implicit knowledge of the shape's manifold structure entirely [QSMG16]. While most of these representations have proven success at classification and recognition tasks, a much more daunting task is generative modeling. Networks that output an entire occupancy grid [MS15; WWX*17] or even a sparse grid [WSLT18] are ultimately limited to small grids incapable of representing fine detail.

In just the past year, there has been an explosion of work sparked by the groundbreaking success of DeepSDF [PFS*19]. Park, Florence, Straub, et al. [PFS*19] approximate the signed distance to a surface as an evaluation of a deep neural network: like any SDF, the input is a query point in space and the output is a signed distance value at that point. Their goal is to learn a latent space for a large dataset of class-specific shapes. Their network architecture includes a latent code optimized for each shape while the network is trained over the whole dataset. This functional representation has been shown to be powerful for generative modeling [MON*19], shape interpolation [LW19], differentiable rendering [LZP*19; NMOG19; JJHZ19], and surface reconstruction [AL19]. In most cases, networks are trained over a large class of shapes with a latent vector to encode each shape in question. The resulting shapes are impressive from the point of view of generative modeling, but inevitably suffer accuracy reproducing any given shape (Figure 1) in the pursuit of generalization. Prior methods have focused on learning class-priors across large datasets to achieve strong results; unfortunately, most geometries in the wild are not consistently aligned (Figure 11) and do not belong to some easily discerned shape family. The original DeepSDF briefly considers but quickly discards the idea of overfitting its neural network to each shape individually:

“Training a specific neural network for each shape is neither feasible nor very useful.” — Park, Florence, Straub, et al. [PFS*19]

We propose training a specific neural network for each shape and will show that this is both feasible and very useful.
We demonstrate that overfit neural networks, or Neural Implicits, exhibit an interesting combination of the desirable qualities of a shape representation. Overfitting to a single shape is often treated as a test case before attempting generalization over a larger training set. Indeed, if held to the scrutiny of a shape representation for applications in computer graphics, prior overfit results from deep signed distance fields are lacking. We identify issues with prior strategies for defining the training loss, sampling strategies, and supervision. While many previous methods discuss loss functions and sampling independently, we propose a loss function defined by a continuous spatial integral. We discretize this integral using Monte Carlo approximation, resulting in a query-probing supervised sampling strategy for training. While prior works also employ stochastic sampling, our integral formulation affords direct application of importance sampling. We propose a simple yet effective subset rejection importance sampling strategy that samples close to the input shape's surface without biases observed in existing methods. For each query sample, we conduct a supervised stochastic descent step to update the network weights. Supervised training requires accurate ground-truth signed distance evaluation. For surfaces with non-manifold edges, self-intersections and open boundaries, signing methods used in prior works can fail to behave robustly, often introducing simplification error (Figure 8) even before training begins. We propose using fast winding numbers [BDS*18] as a signing proxy that is exactly correct for solid geometries (those perfectly represented as the level set of a signed distance field) and gracefully degrades for messy input shapes (Figure 9).

We demonstrate the effectiveness of overfit neural networks as a solid shape representation for a variety of tasks in computer graphics, starting with rendering. We compare the economical storage of Neural Implicits to existing formats (Figure 14). Compared to decimated meshes (baseline non-uniform memory access), we observe that our fixed memory format has similar surface quality. Compared to SDFs stored on a grid (baseline consistent memory access) we observe far better quality.
2. Method
We introduce OverfitSDF, a neural network architecture trained to overfit to a single shape's signed distance function. Once overfit, the learned parameter set θ can be used as an efficient and lightweight representation of the shape. We call this format a Neural Implicit. The Neural Implicit format of a given shape is the learnt network weights of an OverfitSDF model trained on samples drawn from the shape's signed distance function.
Figure 2: OverfitSDF network architecture. Given point samples of an object's SDF (left), we train a feed-forward neural network (middle) to predict the signed distances (right) of each input point.
A signed distance field is a representation in which, at each point within the field, we can measure the distance from that point to the closest point on any shape within the domain. The sign on the distance field represents the direction to the nearest surface, and indicates whether the point is internal or external to objects in the domain. The signed distance function (SDF) of a surface can be defined by the set Ω of points within the shape, along with a metric d:

$$\mathrm{SDF}(x, \partial\Omega) = \begin{cases} -d(x, \partial\Omega) & x \in \Omega \\ \phantom{-}d(x, \partial\Omega) & x \notin \Omega \end{cases} \tag{1}$$

where $\partial\Omega$ denotes the boundary of $\Omega$, and $d$ can be defined as the distance from the closest point on $\partial\Omega$ to $x$. Our goal is to regress a feed-forward network to approximate the SDF of a given surface ($\partial\Omega$), such that

$$f_\theta(x) \approx \mathrm{SDF}(x, \partial\Omega) \tag{2}$$

Once $f_\theta$ is overfit to a given shape, the parameter set θ can then be used as a first-class implicit representation of the shape.

Our OverfitSDF (Figure 2) is a feed-forward fully connected network with N layers, of hidden size H. Each hidden layer has ReLU non-linearities, while the output layer is activated by TanH. Deep neural networks have a tendency to produce their best results when their depth and width are increased. Unfortunately, this increase in complexity proportionally increases the memory footprint of our representation and the time required to render. The Neural Implicits rendered in Figures 1, 7, 10, 11, 12, and 14 all share a common architecture of just 8 fully connected layers with a hidden size of 32 (resulting in just 7553 weights, or 64 kB in memory). Through experimentation on a subset of 1000 mesh geometries from Zhou and Jacobson [ZJ16]'s Thingi10k dataset, we find that this configuration yields a good balance between reconstruction accuracy, rendering speed, and memory impact (Figure 3). Our chosen architecture has a 99% reduction in the number of parameters and a 93% speedup in time to render the first frame, while still providing acceptable surface quality when compared to the default architecture in DeepSDF [PFS*19] (without latent optimization).

Our OverfitSDF network can, in theory [HSW89], learn to emulate a shape of any arbitrary topology with infinite precision. The network complexity can be increased over our base configuration for smaller surface reconstruction error, or decreased for faster rendering speeds, depending on the application. A sample of geometries produced at a number of configurations can be seen in Figure 4.

Figure 3: We visualize the role that varying the number of network layers and hidden layer sizes plays on: (left to right) average reconstruction error, memory footprint and 1-frame render time. Chosen architecture shown in blue, default DeepSDF [PFS*19] architecture (without latent optimization) shown in red.

Figure 4: As network complexity of the OverfitSDF network increases, the model's reconstruction quality increases while rendering speed decreases. Depending on the application, varying levels of accuracy can be achieved.

Figure 5: Our importance sampling (right) draws a subset of points from a uniform sample set (left) according to their distance to the surface (middle).
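The base architecture described above is small enough to sketch in a few lines. The following is an illustrative NumPy reconstruction (function names are ours, not from the paper's code); reading "8 fully connected layers with a hidden size of 32" as 8 hidden layers of width 32, a 3-D input and a scalar tanh output reproduces the stated parameter count of 7553 exactly.

```python
import numpy as np

def init_overfit_sdf(n_hidden=8, hidden=32, seed=0):
    """Initialize an MLP matching the paper's base configuration:
    8 hidden layers of width 32, 3-D input, scalar output."""
    rng = np.random.default_rng(seed)
    dims = [3] + [hidden] * n_hidden + [1]
    return [(rng.standard_normal((i, o)) * np.sqrt(2.0 / i), np.zeros(o))
            for i, o in zip(dims[:-1], dims[1:])]

def forward(params, x):
    """ReLU on hidden layers, tanh activation on the output layer."""
    h = x
    for W, b in params[:-1]:
        h = np.maximum(h @ W + b, 0.0)   # ReLU
    W, b = params[-1]
    return np.tanh(h @ W + b)            # tanh bounds predicted distances to (-1, 1)

params = init_overfit_sdf()
n_params = sum(W.size + b.size for W, b in params)
print(n_params)  # 7553, matching the paper's base configuration
```

Counting terms: one 3-to-32 layer (128 parameters), seven 32-to-32 layers (7 × 1056 = 7392), and one 32-to-1 output layer (33) sum to 7553.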
We train each OverfitSDF on a set X of points sampled in $\mathbb{R}^3$, along with their corresponding signed distance evaluations, for a given shape. A naïve sampling approach could draw samples uniformly from the bounding sphere. Our setting, however, is unique in that the inside-outside decision boundary of a shape is well represented by the 0-isocontour of the SDF. Learning the signed distance value far from the surface is useful for efficient ray marching (see Section 3.1), but in practice we want to focus training on points close to the shape boundary for high-quality surface reconstruction. As such, instead of employing a random dart-throwing scheme, we can pursue a more principled approach that focuses samples on points more “informative” to the boundary transitions (Figure 5).

Several methods focus a network's capacity on points close to a boundary. [LW19] propose a vertex sampling method that draws samples uniformly from a mesh's vertex positions before perturbing the samples according to an isotropic 3D Gaussian. Sampling based exclusively on vertex positions biases coverage due to the underlying mesh tessellation, leading to unwanted anisotropies due to variations in triangle surface areas, particularly when the Gaussian variance is not carefully set (Figure 6). The state-of-the-art DeepSDF method [PFS*19] improves on this scheme by drawing samples uniformly over triangle surfaces, before similarly perturbing with a Gaussian. Such an approach still introduces a bias due to the non-uniform distribution of nearby mesh faces (Figure 6). Both approaches append an additional set of uniformly sampled spatial points in order to offset their surface bias.

We employ simple least absolute deviations (L1) as our loss function, finding it performs better surface reconstruction when compared to squared error (L2), which is more sensitive to outliers:
$$L = \int_B |\mathrm{SDF}(x) - f_\theta(x)| \, dx \tag{3}$$

where $f_\theta$ is our OverfitSDF function and $\mathrm{SDF}(x)$ is the true signed distance at $x$, drawn from bounding volume $B$. Focusing a network's capacity on specific points of importance can be achieved by weighting the loss function to exaggerate error around points of focus:

$$L_{\mathrm{weighted}} = \int_B |\mathrm{SDF}(x) - f_\theta(x)| \, w(x) \, dx \tag{4}$$

This method simply scales the loss function with respect to some metric of importance $w(x)$, such that low-importance training samples have less effect on the loss. This approach can achieve our goal of biasing toward points close to the decision boundary by employing an exponential importance metric on the distance from the surface for any given point:

$$w(x) = e^{-\beta \, |\mathrm{SDF}(x)|} \tag{5}$$

where $\beta$ can be adjusted from 0 for uniform sampling to $\infty$ for surface point sampling. Unfortunately, in weighting the loss directly, computation is wasted on the forward and backward passes for points that are far from the shape's surface. For example, a point on the edge of the bounding sphere with a $\beta$ of 30 will be scaled by $e^{-30\,|\mathrm{SDF}(x)|}$, having negligible effect on the parameters being optimized, yet the same computational cost as if it did. An effectively equivalent but more efficient approach is to apply the importance metric $w(x)$ to the sampling of points in bounding volume $B$ instead of scaling the loss.

We propose a simple yet effective subset rejection importance sampling strategy that samples points according to their distance to the input shape's surface (Figure 5). We discretize our continuous loss integral (Eqn. 4) using Monte Carlo approximation, resulting in a query-probing supervised sampling strategy for training. Given a set $U$ of $n$ uniformly sampled points, we aim to subsample $m$ points from $U$ to create a set $S$ such that

$$\frac{1}{n} \sum_{x \in U} |\mathrm{SDF}(x) - f_\theta(x)| \, w(x) \;\approx\; \frac{1}{m} \sum_{x \in S} |\mathrm{SDF}(x) - f_\theta(x)| \tag{6}$$

We find that using Equation 5 as a method of biasing samples close to the surface leads to faster convergence and reduced surface reconstruction error compared to uniform sampling (96 epochs with surface error of 0.00231). When compared to DeepSDF's surface sampling scheme, we find similar results for both convergence speed and surface quality (average of 86 epochs each, with surface error of 0.00138 and 0.00131, respectively). In practice we generate set $U$ by sampling 10M points within the unit sphere, and employ our importance rejection strategy to populate subset $S$ with 1M points.

Figure 6: Visualizing the density of samples when drawing 10 points using: (left to right) vertex sampling with $\mathcal{N}(p, \cdot)$ offsets [CZ19], surface sampling with $\mathcal{N}(p, \cdot)$ [PFS*19], and our importance sampling approach with $w = e^{-|\mathrm{SDF}(p)|}$. We color samples according to their density estimated with a Gaussian kernel density, normalized by the most dense region from vertex sampling.

Figure 7: Our importance metric can be additionally weighted by distance from user-specified regions (panels: region-biased sampling; standard sampling; bias points). This weighting allows users to specify regions of interest (center; shown in red), yielding improved reconstruction accuracy (right) where desired.

A major benefit of our importance sampling scheme is the removal of unintended bias presented by previous approaches, and the complete flexibility in introducing targeted bias. The importance metric $w(x)$ can be modified to introduce bias towards regions of high curvature, minimum feature size (emulating [PFS*19] bias), or to specific regions of the mesh for areas of high reconstruction importance (Figure 7). This flexibility allows for greater use of the network's capacity on regions or features important to the user, without increasing overall network complexity.

Figure 8: Our approach to signing allows us to support converting non-manifold meshes without sacrificing the true topology of the mesh. Unlike the visual hull method of [PFS*19] (middle), our method (right) maintains complex internal structures. Panels: original geometry with slice plane; visual hull reconstruction; winding number reconstruction. Model by Virtox (left) under CC BY.

Figure 9: Our conversion process supports non-watertight meshes with open boundaries, self-intersections, and non-manifold edges. Panels: messy input mesh; unsigned distance field; winding number field; robust signed distance field.
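The subset rejection strategy above can be sketched with a toy analytic SDF standing in for the ground-truth signing. This is an illustrative NumPy version (all names are ours): each uniform sample is kept with probability proportional to $w(x) = e^{-\beta|\mathrm{SDF}(x)|}$, so the surviving set concentrates near the 0-isocontour.

```python
import numpy as np

rng = np.random.default_rng(1)

def sphere_sdf(p, r=0.5):
    """Stand-in ground-truth SDF: a sphere of radius r at the origin."""
    return np.linalg.norm(p, axis=-1) - r

def uniform_ball(n):
    """Uniform samples inside the unit sphere, by rejection from the cube."""
    out, total = [], 0
    while total < n:
        c = rng.uniform(-1.0, 1.0, size=(n, 3))
        keep = c[np.linalg.norm(c, axis=1) <= 1.0]
        out.append(keep)
        total += len(keep)
    return np.concatenate(out)[:n]

def importance_subsample(U, sdf, m, beta=30.0):
    """Subset rejection sampling (Eq. 5): keep sample x with probability
    proportional to w(x) = exp(-beta * |SDF(x)|). May return fewer than m
    points when the overall acceptance rate is low."""
    w = np.exp(-beta * np.abs(sdf(U)))
    keep = rng.uniform(size=len(U)) < w / w.max()
    return U[keep][:m]

U = uniform_ball(50_000)
S = importance_subsample(U, sphere_sdf, m=5_000)
# S concentrates near the 0-isocontour: its mean |SDF| is far below U's
```

Raising `beta` tightens the band around the surface (the limiting cases are uniform sampling at 0 and surface-only sampling at infinity, as in the text).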
The most obvious approach for computing the sign of a point in space is to use the shape's surface normals: calculate the dot product of the normal and the direction vector to point x. If the direction vector points the same way as the surface normal, then x is external to the surface, and internal otherwise. Unfortunately, this simple process assumes that the input shape is a watertight (closed, non-intersecting, manifold) mesh.

We instead sign our distances with generalized winding numbers [JKS13], enabling us to process meshes with self-intersections, non-manifold pieces, and open boundaries. Previous approaches for signing distances for learning implicit fields either voxelized the space [MON*19], requiring watertight inputs, or used a computationally expensive visibility hull [PFS*19], significantly reducing model complexity and “closing” off internal structures (Figure 8) before training even begins. In contrast, our method signs distances exactly correctly for solid geometries (those perfectly represented as the level set of a signed distance field) and gracefully degrades for messy input shapes (Figure 9). For fast and efficient sign evaluation we use fast winding numbers [BDS*18], a tree-based algorithm for fast approximation of generalized winding numbers. With fast winding numbers we can generate training set X and overfit our network to even the most problematic meshes (Figure 9) in an average of 90 s.

Neural Implicit File Format

The Neural Implicit file format is designed to be simple to consume and integrate into existing pipelines currently relying on classic SDF representations. For each trained OverfitSDF, the chosen network architecture and geometry transformation matrix (since all geometries are normalized to the unit sphere) are written as the first bytes before encoding the network's learnt parameter set θ into our binary format. Our homogeneous Neural Implicit format allows for a single query implementation regardless of the network architecture used in the overfitting process. The fixed storage profiles and memory layout of our learnt implicit functions provide consistent query and rendering speeds. Once trained, our Neural Implicit format can be treated as any other first-class implicit representation.
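A file with this header-then-weights structure could be written as below. The paper does not specify the exact byte layout, field widths, or float precision, so everything here (field order, little-endian `float32`, function names) is an illustrative assumption, not the authors' actual format.

```python
import struct
import numpy as np

def save_neural_implicit(path, n_layers, hidden, transform, theta):
    """Write a sketch of a Neural Implicit file: a small header carrying the
    architecture (n_layers, hidden) and the 4x4 normalization transform,
    followed by the flattened weight vector theta as little-endian float32.
    Layout is our assumption; the paper only describes header-then-weights."""
    with open(path, "wb") as f:
        f.write(struct.pack("<II", n_layers, hidden))
        f.write(np.asarray(transform, dtype="<f4").tobytes())  # 16 floats
        f.write(np.asarray(theta, dtype="<f4").tobytes())

def load_neural_implicit(path):
    """Read the header, transform, and weight vector back."""
    with open(path, "rb") as f:
        n_layers, hidden = struct.unpack("<II", f.read(8))
        transform = np.frombuffer(f.read(64), dtype="<f4").reshape(4, 4)
        theta = np.frombuffer(f.read(), dtype="<f4")
    return n_layers, hidden, transform, theta
```

Because the architecture is stored in the header, a single loader can reconstruct and query networks of any size, which is what makes the per-shape error-driven sizing discussed later possible.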
Our Neural Implicit representation can be treated as its classical counterpart (an SDF) and rendered efficiently using ray marching. Ray marching [Har96] is a common technique for rendering implicit fields where rays are initialized in the image plane and iteratively “marched” along each ray by a step size equal to the signed distance function value at the point. A single ray is marched until it is sufficiently close to the surface (within a tolerance ε) or it reaches the maximum number of steps. We initialize the starting position of each ray to be its first intersection with the unit sphere, since all Neural Implicits are normalized to lie within it. Rays that do not intersect the unit sphere are pruned from the set before marching begins. As rays of the image converge at different times, we employ a dynamic batching scheme that composes batches of points for inference based on a mask buffer which tracks rays that have converged to the surface or reached the maximum number of steps. Additionally, the dynamic batching method allows us to append additional points when surface normals are required for shading.
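The per-ray marching loop above can be sketched as follows; a toy analytic sphere stands in for a trained Neural Implicit, and the names and the bounding heuristic are ours, not the paper's CUDA implementation.

```python
import numpy as np

def sphere_sdf(p, r=0.5):
    """Toy SDF standing in for a trained Neural Implicit."""
    return float(np.linalg.norm(p) - r)

def sphere_trace(origin, direction, sdf, eps=1e-4, max_steps=64):
    """March one ray: advance by the SDF value each step until within eps of
    the surface, or report a miss once we leave the unit-sphere scene or
    exhaust the step budget."""
    d = direction / np.linalg.norm(direction)
    t = 0.0
    for _ in range(max_steps):
        p = origin + t * d
        dist = sdf(p)
        if dist < eps:
            return p                 # converged: surface hit
        t += dist
        if t > 2.0:                  # a unit-sphere scene fits within t <= 2
            return None
    return None

hit = sphere_trace(np.array([0.0, 0.0, -1.0]), np.array([0.0, 0.0, 1.0]), sphere_sdf)
# the ray fired from (0, 0, -1) toward +z hits the sphere at (0, 0, -0.5)
```

The dynamic batching described in the text amounts to running this loop for many rays at once and masking out rays whose `dist < eps` or step budget has triggered, so every network inference batch stays dense.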
3. Implementation and Results
Our Neural Implicit representation can be used in applications as if it were the true signed distance field of the shape. Here, we demonstrate OverfitSDF's ability to learn a shape's SDF for fast and efficient rendering and as a compressed representation of the original geometry.

We implement OverfitSDF networks in Tensorflow [MAP*15], while point sampling and mesh processing are implemented in libigl [JPS*16]. We train our model for a maximum of 100 epochs and allow early stopping for geometries that converge quickly. We use the ADAM optimizer [KB14] with a fixed learning rate of 10−. These settings generalized well across a wide range of geometries (see Fig. 12 and Fig. 15).

Figure 10: Neural Implicits admit many trivial interactive manipulations; the model's predicted distances can be modified through boolean operations similar to any implicit field. See accompanying video for animation.

Figure 11: DeepSDF [PFS*19] reconstruction quality degrades quickly for geometries not aligned to the default orientation per class (panels: θ = 0°, θ = 5°, θ = 10°; DeepSDF vs. ours). Our method converges to the same quality regardless of orientation. See accompanying video for animation.
We implement our renderer in CUDA as kernels for sphere marching, fragment shading and batch preparation. We use the CUTLASS [KMDT18] CUDA linear algebra library for fast and efficient strided GEMM (general matrix multiplication) calculation, required for inference against OverfitSDF models. We achieve an average frame rate of 34 FPS when rendering at 512 by 512 resolution on an Nvidia P100 GPU across a subset of the Thingi10k dataset. Although not acceptable for real-time rendering applications, this result is a significant improvement over previous learnt implicit rendering pipelines. In [LZP*19] rendering is optimized to 1 FPS by overstepping along all rays by a factor of 50%, increasing the convergence criteria, and implementing a coarse-to-fine strategy. These optimizations could further improve our rendering speed but would reduce the overall quality of the renders, and were not integrated.

As our representation is a learnt representation of the SDF, we inherit all the benefits of traditional implicit functions. Our Neural Implicit can be smoothly interpolated in implicit space, and can be interactively modified with the constructive solid geometry operations shown in Figure 10.
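These constructive solid geometry operations are the standard implicit booleans (union as a pointwise min, intersection as a max, subtraction as max against the negation); the sketch below applies them to analytic SDFs, but a Neural Implicit is interchangeable since the network is just another callable approximating an SDF. Names are ours.

```python
import numpy as np

def sphere(center, radius):
    """Analytic SDF factory for a sphere."""
    c = np.asarray(center, dtype=float)
    return lambda p: np.linalg.norm(p - c, axis=-1) - radius

# Standard implicit booleans over two SDF callables.
def union(a, b):        return lambda p: np.minimum(a(p), b(p))
def intersection(a, b): return lambda p: np.maximum(a(p), b(p))
def subtract(a, b):     return lambda p: np.maximum(a(p), -b(p))

a = sphere([0.0, 0.0, 0.0], 0.5)
b = sphere([0.4, 0.0, 0.0], 0.3)
carved = subtract(a, b)              # sphere a with a bite taken out by b

origin = np.zeros(3)
print(a(origin), carved(origin))     # -0.5 (deep inside a) vs -0.1 (closer to the cut)
```

Because the result is again a callable returning (approximate) signed distances, the composite can be fed straight back into the sphere tracer, which is what makes the interactive edits in Figure 10 cheap.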
Training deep neural networks on large geometric datasets has classically been a cumbersome, time-consuming task. For our Neural Implicit representation to be effective, any geometry must be able to be converted in a reasonable amount of time. Due to the minimal complexity of our base configuration (8 layers of 32 neurons), we find that we can overfit our model to any geometry in an average of 90 s. Additionally, with only 64 kB of memory required, many models can be trained concurrently on modern GPUs without nearing the memory limits. Converting the entirety of the 10,000 models in the Thingi10k dataset [ZJ16] on an Nvidia Titan RTX took just 16 hours on a single GPU, or 4 hours when trained in parallel across 4 Nvidia Titan RTX cards.

Figure 12: Thingi10k models [ZJ16] compressed as Neural Implicits. Our representation gracefully scales to high-quality reconstructions as its footprint increases, using an order of magnitude less memory than alternative representations at equal quality.

Figure 13: Our Neural Implicit format encodes sharp edges with a high degree of accuracy.

Conversion of the Thingi10k dataset from mesh format to Neural Implicit format reduces the overall storage impact from 38.85 GB to just 640 MB. Comparatively, if DeepSDF [PFS*19] could be trained on the same dataset, all geometries would be compressed to just 7 MB. Unfortunately, through our experiments we found that DeepSDF results depend on (a) all geometries belonging to a single class, (b) all geometries being consistently oriented, and (c) the number of geometries in the dataset. We can see in Figure 11 that rotating the input mesh by just a few degrees results in drastic reconstruction errors, while OverfitSDF produces consistent results regardless of orientation. With this experiment, we demonstrate why DeepSDF failed to converge to reasonable results on the Thingi10k dataset, since geometries are arbitrarily oriented and of no specific class. Additionally, even if DeepSDF did not suffer from the aforementioned problems, training on the full 10,000-geometry dataset would be intractable due to the memory required for storing and optimizing the 10,000 shape embeddings (without significantly reducing embedding size, network complexity, or the dataset being converted).
As many of the geometries in the Thingi10k dataset are organic “smooth” shapes, we also verify that our method is capable of maintaining sharp edges in reconstructions. We find that our network is able to recreate sharp edges (Fig. 13 & 16) with a high level of accuracy, despite not being specifically biased to do so.

In our conversion of the Thingi10k dataset, the architecture was fixed, yielding an efficient and constant-memory representation. However, if surface reconstruction quality is a priority, the focus can instead shift to an error-driven surface fitting (similar to classical approaches [OBA*05]), scaling network complexity according to the input geometry. As each OverfitSDF produced encodes its own architecture, this change will result in simple geometries being encoded into minimal parameter configurations (base: 7553) while topologically complex geometries can be represented by higher-resolution configurations. The effect of this error-driven optimization approach can be seen in Figure 16, where a simple grid search was performed until surface error met a target goal. Based on our conversion of the Thingi10k dataset, we find that the majority of models are represented well by our base configuration (Fig. 15), where the tails of the error distribution could be retrained with additional complexity until some target surface error is achieved.
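The error-driven grid search described above could be organized as below. The paper does not give the search loop itself, so this is a sketch under our own assumptions; `train_and_surface_error` is a hypothetical stand-in for "train an OverfitSDF at this size and measure Eq. (7)", stubbed here with purely illustrative numbers so the selection logic can run.

```python
def select_architecture(train_and_surface_error, target=0.003,
                        layer_grid=(8, 12, 16), hidden_grid=(32, 64, 128)):
    """Smallest-first grid search: return the first (layers, hidden, error)
    meeting the target surface error, else the best configuration found."""
    best = None
    for hidden in hidden_grid:               # widen before deepening
        for layers in layer_grid:
            err = train_and_surface_error(layers, hidden)
            if best is None or err < best[0]:
                best = (err, layers, hidden)
            if err <= target:
                return layers, hidden, err
    return best[1], best[2], best[0]

# Hypothetical stub: pretend surface error falls as capacity grows
# (illustrative values only, not measured results).
stub = lambda layers, hidden: 0.01 / (layers * hidden / 256.0)
layers, hidden, err = select_architecture(stub)
```

Ordering the search smallest-first matches the paper's goal: simple geometries stop at the minimal configuration, while only the hard tail of the error distribution pays for extra capacity.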
All images in Figure 12 were rendered from
Neural Implicit gen-erated from our base configuration, resulting in a total parameterset of 7553 tuned to each shape’s implicit function. At just 64 kBof memory we find that our lightweight representation can capturecomplex topologies with relatively high resolution compared to uni-form signed distance grids or decimated mesh if similar memoryfootprints. For the comparison in Figure 14, each geometry wasconverted to a
Neural Implicit in our base configuration of 7553 pa-rameters, and is visualized beside the rendered result of a uniformlysampled SDF grid with 20 samples, along with the original meshadaptively decimated [GH97] down to contain only 7600 floats.Compared to decimated meshes (baseline non-uniform format), weobserve that our homogeneous format has similar surface quality.Compared to SDFs stored on a grid (baseline uniform format) weobserve far better quality. Additionally, we see that our approachbetter captures high frequency surface detail than both representa-tions, often producing results that more closely match the "style" ofthe original shape.We quantify our methods robustness by converting the entirety ofthe Thingi10k [ZJ16] dataset to our Neural Implicit format, measur-ing the average surface error (Eqn. 7) and training loss. The trainingloss reported is the mean of errors between the true and predictedSDF values at points sampled using our importance metric outlinedin Section 2.1.2. The surface error is the sum of errors at pointsalong the shape’s 0-isocontour,Surface Error = N N ∑ i = | f θ ( p i ) | (7)We use these simple metrics in conjunction for their ability to notonly measure error at the surface but also error within the boundingvolume. Errors within the bounding volume manifest as increasedrendering times or holes in the model during rendering, while surfaceerrors are clear when marching cubes over the learnt SDF field. Figure 14: Our learnt Neural Implicit format (right) can be shownto better approximate the original surface (grey, inset) comparedto adaptive decimation of the original triangle mesh [GH97] (left)and uniform signed distance grid (middle) with equal memory im-pact. 
gpvillamil (skull), Makerbot (whale), morenaP (frog), artec3d (dragon), JuliaTruchsess (octopus) under CC BY.

We sample 100,000 surface points for measuring average surface error, and assess converged loss against the training set of 1 M points. The results on the entirety of the Thingi10k dataset are visualized in Figure 15. We find that at this configuration 93% of the 10,000 geometries in the diverse Thingi10k dataset converge to a surface error below 0.003, with no model exceeding 0.01 (worst case of 0.0097 shown in Fig. 16).
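The surface-error metric of Eqn. (7) can be sketched as follows. This is an illustrative stand-in, not the authors' code: a closed-form sphere SDF plays the role of the trained network f_θ, the helper names are invented, and the point count is scaled down from the paper's 100,000 samples.

```python
import numpy as np


def sphere_sdf(p, radius=0.5):
    """Analytic SDF of a sphere; stands in for a trained network f_theta."""
    return np.linalg.norm(p, axis=-1) - radius


def sample_surface_points(n, radius=0.5, rng=None):
    """Draw n points on the sphere's 0-isocontour (normalized Gaussians)."""
    if rng is None:
        rng = np.random.default_rng(0)
    v = rng.normal(size=(n, 3))
    v /= np.linalg.norm(v, axis=-1, keepdims=True)
    return radius * v


def surface_error(f, points):
    """Eqn. (7): mean absolute predicted SDF over surface samples."""
    return np.mean(np.abs(f(points)))


pts = sample_surface_points(1000)
err = surface_error(sphere_sdf, pts)  # near zero for the exact SDF
```

Because every sample lies exactly on the 0-isocontour, any nonzero value reported by `surface_error` is attributable to approximation error in the field itself, which is what makes the metric useful as a convergence check during overfitting.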
4. Limitations and Future Work
In some cases, when a mesh is of exceptional topological complexity, we find that our base configuration of only 7553 parameters simply cannot represent the highly non-linear implicit field. As shown in Figure 16, highly complex geometries cannot be represented well by our limited-resolution network; similarly, the geometry cannot be captured by a mesh decimated to an equivalent memory footprint. These cases are simple to detect

© 2020 The Author(s). T. Davies & D. Nowrouzezahrai & A. Jacobson / Overfit Neural Networks as a Compact Shape Representation
Figure 15: Loss and surface error distributions over the entirety of the Thingi10k [ZJ16] dataset.
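As a sanity check on the base configuration evaluated above, the quoted budget of 7553 parameters is consistent with a small fully connected SDF network. The layout below (a 3D input, eight hidden layers of 32 units, and a scalar output) is an assumed architecture used only to show the arithmetic, not a detail taken from this section:

```python
def mlp_param_count(layer_sizes):
    """Total weights + biases of a fully connected network with the given widths."""
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))


# Assumed layout: 3D point in, eight hidden layers of width 32, scalar SDF out.
sizes = [3] + [32] * 8 + [1]
count = mlp_param_count(sizes)  # (3*32+32) + 7*(32*32+32) + (32*1+1) = 7553
```

At this scale a single forward pass is a handful of small matrix-vector products, which is what keeps inference fast enough to use the network directly as a shape representation.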
(Figure 16 panels, left to right: Original, Decimated, Base, Increased.)
Figure 16: With only 7553 parameters, our base Neural Implicit format can lack the representative power to converge on highly complex geometries (similar to a decimated mesh with the same memory footprint). Increasing the network capacity to equal the memory impact of the original mesh results in near-perfect reconstructions. tbuser (left) under CC BY.

by analyzing the surface error during training, and can easily be rectified by increasing the network resolution.

5. Conclusion

Neural networks have proven to be effective approximators of signed distance fields (SDFs) for solid 3D objects. Existing methods have seen success for generative modelling, shape interpolation, surface reconstruction, and differentiable rendering. However compelling, these methods share a common flaw: they depend on semantically consistent geometries in a fixed orientation. Instead of training a model to generalize across some small subset of shape categories, we overfit networks to single geometries, applying the full network capacity to the reconstruction quality of a single shape. This not only allows for high-quality reconstructions with minimal memory, but also supports arbitrary geometry regardless of orientation or class.

We set out to show that overfitting a neural network to a single signed distance field is not only feasible, but can be useful in providing an efficient, compact representation of the shape. We showed how our
Neural Implicit representation can effectively capture a shape's signed distance function while maintaining a low memory impact and fast inference speeds. Our OverfitSDF networks are a shape representation that inherits the effectively infinite resolution of continuous implicits with the computational efficiency of coarse meshes and the uniform memory patterns of an SDF grid. We hope that our geometrically principled approach to sampling training points and signing distances leads to more robust and extensible research in geometric deep learning.
References

[AL19] Atzmon, Matan and Lipman, Yaron. "SAL: Sign Agnostic Learning of Shapes from Raw Data". arXiv preprint arXiv:1911.10414 (2019) 2.
[BBB*97] Bloomenthal, Jules, Bajaj, Chandrajit, Blinn, Jim, et al. Introduction to Implicit Surfaces. Morgan Kaufmann, 1997 1.
[BBL*17] Bronstein, Michael M., Bruna Estrach, Joan, LeCun, Yann, et al. "Geometric Deep Learning: Going beyond Euclidean data". English (US). IEEE Signal Processing Magazine (2017). ISSN: 1053-5888.
[BDS*18] Barill, Gavin, Dickson, Neil, Schmidt, Ryan, et al. "Fast Winding Numbers for Soups and Clouds". ACM Transactions on Graphics (2018) 2, 5.
[CFG*15] Chang, Angel X., Funkhouser, Thomas, Guibas, Leonidas, et al. "ShapeNet: An Information-Rich 3D Model Repository". arXiv preprint arXiv:1512.03012 (2015) 2.
[CZ19] Chen, Zhiqin and Zhang, Hao. "Learning Implicit Fields for Generative Shape Modeling". Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2019) 4.
[GH97] Garland, Michael and Heckbert, Paul S. "Surface Simplification Using Quadric Error Metrics". Proceedings of the 24th Annual Conference on Computer Graphics and Interactive Techniques. ACM Press/Addison-Wesley Publishing Co., 1997, 209–216 7.
[Har96] Hart, John C. "Sphere Tracing: A Geometric Method for the Antialiased Ray Tracing of Implicit Surfaces". The Visual Computer (1996).
[HHF*19] Hanocka, Rana, Hertz, Amir, Fish, Noa, et al. "MeshCNN: A Network with an Edge". ACM Transactions on Graphics (TOG) (2019).
[HSG18] Huang, Jingwei, Su, Hao, and Guibas, Leonidas J. "Robust Watertight Manifold Surface Generation Method for ShapeNet Models". CoRR abs/1802.01698 (2018) 2.
[HSW89] Hornik, Kurt, Stinchcombe, Maxwell, and White, Halbert. "Multilayer Feedforward Networks Are Universal Approximators". Neural Networks (1989).
[JJHZ19] Jiang, Yue, Ji, Dantong, Han, Zhizhong, and Zwicker, Matthias. SDFDiff: Differentiable Rendering of Signed Distance Fields for 3D Shape Optimization. 2019. arXiv preprint.
[JKSH13] Jacobson, Alec, Kavan, Ladislav, and Sorkine-Hornung, Olga. "Robust Inside-Outside Segmentation Using Generalized Winding Numbers". ACM Transactions on Graphics (TOG) (2013).
[JP*16] Jacobson, Alec, Panozzo, Daniele, Schüller, C., et al. libigl: A Simple C++ Geometry Processing Library. 2016 5.
[KB14] Kingma, Diederik P. and Ba, Jimmy. "Adam: A Method for Stochastic Optimization". arXiv preprint arXiv:1412.6980 (2014) 5.
[KMDT18] Kerr, Andrew, Merrill, Duane, Demouth, Julien, and Tran, John. CUTLASS: Fast Linear Algebra in CUDA C. Sept. 2018. URL: https://devblogs.nvidia.com/cutlass-linear-algebra-cuda/.
[LW19] Littwin, Gidi and Wolf, Lior. "Deep Meta Functionals for Shape Representation". CoRR abs/1908.06277 (2019). URL: http://arxiv.org/abs/1908.06277 2, 3.
[LYF17] Liu, Jerry, Yu, Fisher, and Funkhouser, Thomas A. "Interactive 3D Modeling with a Generative Adversarial Network". Proc. 3DV. 2017, 126–134 2.
[LZP*19] Liu, Shaohui, Zhang, Yinda, Peng, Songyou, et al. DIST: Rendering Deep Implicit Signed Distance Function with Differentiable Sphere Tracing. 2019. arXiv preprint 2, 6.
[MAP*15] Martín Abadi, Ashish Agarwal, Paul Barham, et al. TensorFlow: Large-Scale Machine Learning on Heterogeneous Systems. Software available from tensorflow.org. 2015. URL: http://tensorflow.org/.
[MON*19] Mescheder, Lars, Oechsle, Michael, Niemeyer, Michael, et al. "Occupancy Networks: Learning 3D Reconstruction in Function Space". Proceedings IEEE Conf. on Computer Vision and Pattern Recognition (CVPR). 2019 2, 5.
[MS15] Maturana, Daniel and Scherer, Sebastian. "VoxNet: A 3D Convolutional Neural Network for Real-Time Object Recognition". IEEE/RSJ International Conference on Intelligent Robots and Systems. 2015, 922–928 2.
[NLBY18] Nguyen-Phuoc, Thu, Li, Chuan, Balaban, Stephen, and Yang, Yong-Liang. "RenderNet: A Deep Convolutional Network for Differentiable Rendering from 3D Shapes". Proc. NeurIPS. 2018, 7902–7912 2.
[NMOG19] Niemeyer, Michael, Mescheder, Lars, Oechsle, Michael, and Geiger, Andreas. Differentiable Volumetric Rendering: Learning Implicit 3D Representations without 3D Supervision. 2019. arXiv preprint.
[OBA*05] Ohtake, Yutaka, Belyaev, Alexander, Alexa, Marc, et al. "Multi-level Partition of Unity Implicits". ACM SIGGRAPH 2005 Courses. 2005, 173–es 7.
[PFS*19] Park, Jeong Joon, Florence, Peter, Straub, Julian, et al. "DeepSDF: Learning Continuous Signed Distance Functions for Shape Representation". The IEEE Conference on Computer Vision and Pattern Recognition (CVPR). June 2019 1–6.
[QSMG16] Qi, Charles R., Su, Hao, Mo, Kaichun, and Guibas, Leonidas J. "PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation". arXiv preprint arXiv:1612.00593 (2016) 2.
[WKBS19] Wang, Yu, Kim, Vladimir G., Bronstein, Michael, and Solomon, Justin. "Learning Geometric Operators on Meshes". ICLR Workshop on Representation Learning on Graphs and Manifolds (2019) 2.
[WSK*15] Wu, Zhirong, Song, Shuran, Khosla, Aditya, et al. "3D ShapeNets: A Deep Representation for Volumetric Shape Modeling". IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Boston, USA, June 2015 2.
[WSLT18] Wang, Peng-Shuai, Sun, Chun-Yu, Liu, Yang, and Tong, Xin. "Adaptive O-CNN: A Patch-based Deep Representation of 3D Shapes". ACM Transactions on Graphics (SIGGRAPH Asia) (2018).
[WWX*17] Wu, Jiajun, Wang, Yifan, Xue, Tianfan, et al. "MarrNet: 3D Shape Reconstruction via 2.5D Sketches". Advances in Neural Information Processing Systems. 2017 2.
[ZJ16] Zhou, Qingnan and Jacobson, Alec. "Thingi10k: A Dataset of 10,000 3D-Printing Models". arXiv preprint arXiv:1605.04797 (2016) 3, 6–8.