On the Effectiveness of Weight-Encoded Neural Implicit 3D Shapes
Overfit Neural Networks as a Compact Shape Representation
Thomas Davies (University of Toronto, Canada), Derek Nowrouzezahrai (McGill University, Canada), and Alec Jacobson (University of Toronto, Canada)
[Figure 1 panel labels: Our Neural Implicit; DeepSDF; Original; Uniform Grid]
Figure 1: Geometries can be represented with infinite resolution by their continuous signed distance fields (SDFs). The SDF of a shape can be approximated by storing values on a regular grid. Uniform grids are wasteful for storing values far from the surface, resulting in poor reconstructions for small grid sizes. Comparably, a DeepSDF [PFS*19] model can more effectively encode the original shape, but requires the shape to be consistently aligned and semantically similar (airplanes, cars, boats, etc.) to shapes within the training set, and fails to capture the unique characteristics of the model (i.e., shark fins). Our Neural Implicit format is produced by overfitting a neural network to the single geometry directly, providing a compact representation with far greater accuracy, regardless of class or orientation, compared to uniform grids of the same memory impact (64 kB shown here). Model by gpvillamil (right) under CC BY.
Abstract
Neural networks have proven to be effective approximators of signed distance fields (SDFs) for solid 3D objects. While prior work has focused on the generalization power of such approximations, we instead explore their suitability as a compact – if purposefully overfit – SDF representation of individual shapes. Specifically, we ask whether neural networks can serve as first-class implicit shape representations in computer graphics. We call such overfit networks Neural Implicits. Similar to SDFs stored on a regular grid, Neural Implicits have fixed storage profiles and memory layout, but afford far greater accuracy. At equal storage cost, Neural Implicits consistently match or exceed the accuracy of irregularly-sampled triangle meshes. We achieve this with a combination of a novel loss function, sampling strategy and supervision protocol designed to facilitate robust shape overfitting. We demonstrate the flexibility of our representation on a variety of standard rendering and modeling tasks.
1. Introduction
Signed distance fields (SDFs) are a versatile implicit surface representation, useful throughout computer graphics [BBB*97]. Complex objects represented as SDFs can be authored semi-analytically by (incrementally) composing geometric primitives with space warping, blending operations, and replicating functions (see inset by Inigo Quilez). However, storing an SDF as a long composition of expressions does not scale, especially to shapes with a high level of (non-procedural) detail. What is the best way to store an SDF?

Approximating an SDF by storing values on a regular grid speeds up evaluation at the cost of precomputation (effectively treating the SDF as a 3D table lookup). Shapes stored as SDFs on grids benefit by having fixed storage profiles and memory layouts. Unfortunately, this comes at a cost, as grids wastefully store a dense sampling of the SDF far from the surface where the value is smooth and predictable. While octrees and truncated signed distance functions can store SDFs asymptotically more efficiently, their representation incurs non-uniform computational and memory costs (e.g., the octree leaf nodes of one surface are different from those of another).

Meanwhile, explicit representations are ubiquitous as a data format for distributing 3D models: hundreds of millions of meshes are available online. While easier to animate and texture, explicit representations like meshes are cumbersome for shape modeling and querying tasks common in computer vision, simulation and geometric learning. This raises the question: how do we convert mesh assets into implicit representations? Embedding a mesh in, e.g., a spatial hierarchy to compute point-mesh signed distances is inefficient compared to flat table lookups or evaluation of analytic expressions. Bounding hierarchy distance queries also require divergent computations, e.g., different queries on the same shape can have drastically different computational and memory access costs. Converting to grids or octrees does not avoid their limitations.

We show that overfitting a deep neural network to the SDF of a single solid is effective, and we advocate for its consideration as a first-class implicit representation. We show that these overfit neural networks – which we call Neural Implicits – are a shape representation that inherits the effectively infinite resolution of implicits but with the computational efficiency of coarse meshes and the memory access uniformity of a fixed grid.

© 2020 The Author(s). arXiv [cs.GR], Oct. 2020.
Implicit representations are especially attractive for geometric machine learning. Voxel occupancy in a 3D image (a grid storing inside/outside values per cell) is a homogeneous representation especially amenable to classification and convolution networks [MS15; WSK*15]. Large datasets of 3D meshes (e.g., ShapeNet [CFG*15]) can easily be converted to a voxel grid or grid-based SDF [NLBY18; HSG18; LYF17], so that the network architecture can input a homogeneous image format similar to 2D convolution networks. Learning directly on 3D meshes has been attempted, but architectures become esoteric [HHF*19; WKBS19] and may rely on supplemental handcrafted features (see the longer discussion in [BBL*17]). Conversion to point clouds (an unordered set) sidesteps the homogeneity issue by removing dependence on ordering or explicit/implicit knowledge of the shape's manifold structure entirely [QSMG16]. While most of these representations have proven success at classification and recognition tasks, a much more daunting task is generative modeling. Networks that output an entire occupancy grid [MS15; WWX*17] or even a sparse grid [WSLT18] are ultimately limited to small grids incapable of representing fine detail.

In just the past year, there has been an explosion of work sparked by the groundbreaking success of DeepSDF [PFS*19]. Park, Florence, Straub, et al. [PFS*19] approximate the signed distance to a surface as an evaluation of a deep neural network: like any SDF, the input is a query point in space and the output is a signed distance value at that point. Their goal is to learn a latent space for a large dataset of class-specific shapes. Their network architecture includes a latent code optimized for each shape while the network is trained over the whole dataset. This functional representation has been shown to be powerful for generative modeling [MON*19], shape interpolation [LW19], differentiable rendering [LZP*19; NMOG19; JJHZ19], and surface reconstruction [AL19]. In most cases, networks are trained over a large class of shapes with a latent vector to encode each shape in question. The resulting shapes are impressive from the point of view of generative modeling, but inevitably suffer accuracy reproducing any given shape (Figure 1) in the pursuit of generalization. Prior methods have focused on learning class-priors across large datasets to achieve strong results; unfortunately, most geometries in the wild are not consistently aligned (Figure 11) and do not belong to some easily discerned shape family. The original DeepSDF briefly considers but quickly discards the idea of overfitting its neural network to each shape individually:

“Training a specific neural network for each shape is neither feasible nor very useful.” — Park, Florence, Straub, et al. [PFS*19]

We propose training a specific neural network for each shape and will show that this is both feasible and very useful.
We demonstrate that overfit neural networks, or Neural Implicits, exhibit an interesting combination of the desirable qualities of a shape representation. Overfitting to a single shape is often treated as a test case before attempting generalization over a larger training set. Indeed, if held to the scrutiny of a shape representation for applications in computer graphics, prior overfit results from deep signed distance fields are lacking. We identify issues with prior strategies for defining the training loss, sampling strategies, and supervision. While many previous methods discuss loss functions and sampling independently, we propose a loss function defined by a continuous spatial integral. We discretize this integral using Monte Carlo approximation, resulting in a query-probing supervised sampling strategy for training. While prior works also employ stochastic sampling, our integral formulation affords direct application of importance sampling. We propose a simple yet effective subset rejection importance sampling strategy that samples close to the input shape's surface without biases observed in existing methods. For each query sample, we conduct a supervised stochastic descent step to update the network weights. Supervised training requires accurate ground-truth signed distance evaluation. For surfaces with non-manifold edges, self-intersections and open boundaries, signing methods used in prior works can fail to behave robustly, often introducing simplification error (Figure 8) even before training begins. We propose using fast winding numbers [BDS*18] as a signing proxy that is exactly correct for solid geometries (those perfectly represented as the level set of a signed distance field) and gracefully degrades for messy input shapes (Figure 9).

We demonstrate the effectiveness of overfit neural networks as a solid shape representation for a variety of tasks in computer graphics, starting with rendering. We compare the economical storage of Neural Implicits to existing formats (Figure 14). Compared to decimated meshes (baseline non-uniform memory access), we observe that our fixed memory format has similar surface quality. Compared to SDFs stored on a grid (baseline consistent memory access) we observe far better quality.
2. Method
We introduce OverfitSDF, a neural network architecture trained to overfit to a single shape's signed distance function. Once overfit, the learned parameter set θ can be used as an efficient and lightweight representation of the shape. We call this format a Neural Implicit. The Neural Implicit format of a given shape is the learnt network weights of an OverfitSDF model trained on samples drawn from the shape's signed distance function.
Figure 2: OverfitSDF network architecture. Given point samples of an object's SDF (left), we train a feed-forward neural network (middle) to predict the signed distances (right) of each input point.
A signed distance field is a representation in which, at each point within the field, we can measure the distance from that point to the closest point on any shape within the domain. The sign on the distance field represents the direction to the nearest surface, and indicates whether the point is internal or external to objects in the domain. The signed distance function (SDF) of a surface can be defined by the set Ω of points within the shape, along with a metric d:

$$\mathrm{SDF}(x, \partial\Omega) = \begin{cases} -d(x, \partial\Omega) & x \in \Omega \\ \phantom{-}d(x, \partial\Omega) & x \notin \Omega \end{cases} \tag{1}$$

where $\partial\Omega$ denotes the boundary of $\Omega$, and $d$ can be defined as the distance from the closest point on $\partial\Omega$ to $x$. Our goal is to regress a feed-forward network to approximate the SDF of a given surface ($\partial\Omega$), such that

$$f_\theta(x) \approx \mathrm{SDF}(x, \partial\Omega) \tag{2}$$

Once $f_\theta$ is overfit to a given shape, the parameter set θ can then be used as a first-class implicit representation of the shape.

Our OverfitSDF (Figure 2) is a feed-forward fully connected network with N layers, of hidden size H. Each hidden layer has ReLU non-linearities, while the output layer is activated by TanH. Deep neural networks have a tendency to produce their best results when their depth and width are increased. Unfortunately, this increase in complexity proportionally increases the memory footprint of our representation and the time required to render. The Neural Implicits rendered in Figures 1, 7, 10, 11, 12, and 14 all share a common architecture of just 8 fully connected layers with a hidden size of 32 (resulting in just 7553 weights, or 64 kB in memory). Through experimentation on a subset of 1000 mesh geometries from Zhou and Jacobson [ZJ16]'s Thingi10k dataset, we find that this configuration yields a good balance between reconstruction accuracy, rendering speed, and memory impact (Figure 3). Our chosen architecture has a 99% reduction in the number of parameters and a 93% speedup in time to render the first frame, while still providing acceptable surface quality when compared to the default architecture in DeepSDF [PFS*19] (without latent optimization).

Our OverfitSDF network can, in theory [HSW89], learn to emulate a shape of any arbitrary topology with infinite precision. The network complexity can be increased over our base configuration for smaller surface reconstruction error, or decreased for faster rendering speeds, depending on the application. A sample of geometries produced at a number of configurations can be seen in Figure 4.

Figure 3: We visualize the role that varying the number of network layers and hidden layer sizes plays on: (left to right) average reconstruction error, memory footprint and 1-frame render time. Chosen architecture shown in blue, default DeepSDF [PFS*19] architecture (without latent optimization) shown in red.

Figure 4: As network complexity of the OverfitSDF network increases, the model's reconstruction quality increases while rendering speed decreases. Depending on the application, varying levels of accuracy can be achieved.

Figure 5: Our importance sampling (right) draws a subset of points from a uniform sample set (left) according to their distance to the surface (middle).
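The base architecture described above is small enough to sketch in a few lines. The following is an illustrative NumPy reconstruction (function names are ours, not from the paper's code); reading "8 fully connected layers with a hidden size of 32" as 8 hidden layers of width 32, a 3-D input and a scalar tanh output reproduces the stated parameter count of 7553 exactly.

```python
import numpy as np

def init_overfit_sdf(n_hidden=8, hidden=32, seed=0):
    """Initialize an MLP matching the paper's base configuration:
    8 hidden layers of width 32, 3-D input, scalar output."""
    rng = np.random.default_rng(seed)
    dims = [3] + [hidden] * n_hidden + [1]
    return [(rng.standard_normal((i, o)) * np.sqrt(2.0 / i), np.zeros(o))
            for i, o in zip(dims[:-1], dims[1:])]

def forward(params, x):
    """ReLU on hidden layers, tanh activation on the output layer."""
    h = x
    for W, b in params[:-1]:
        h = np.maximum(h @ W + b, 0.0)   # ReLU
    W, b = params[-1]
    return np.tanh(h @ W + b)            # tanh bounds predicted distances to (-1, 1)

params = init_overfit_sdf()
n_params = sum(W.size + b.size for W, b in params)
print(n_params)  # 7553, matching the paper's base configuration
```

Counting terms: one 3-to-32 layer (128 parameters), seven 32-to-32 layers (7 × 1056 = 7392), and one 32-to-1 output layer (33) sum to 7553.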
We train each OverfitSDF on a set X of points sampled in $\mathbb{R}^3$, along with their corresponding signed distance evaluations, for a given shape. A naïve sampling approach could draw samples uniformly from the bounding sphere. Our setting, however, is unique in that the inside-outside decision boundary of a shape is well represented by the 0-isocontour of the SDF. Learning the signed distance value far from the surface is useful for efficient ray marching (see Section 3.1), but in practice we want to focus training on points close to the shape boundary for high-quality surface reconstruction. As such, instead of employing a random dart-throwing scheme, we can pursue a more principled approach that focuses samples on points more “informative” to the boundary transitions (Figure 5).

Several methods focus a network's capacity on points close to a boundary. [LW19] propose a vertex sampling method that draws samples uniformly from a mesh's vertex positions before perturbing the samples according to an isotropic 3D Gaussian. Sampling based exclusively on vertex positions biases coverage due to the underlying mesh tessellation, leading to unwanted anisotropies due to variations in triangle surface areas, particularly when the Gaussian variance is not carefully set (Figure 6). The state-of-the-art DeepSDF method [PFS*19] improves on this scheme by drawing samples uniformly over triangle surfaces, before similarly perturbing with a Gaussian. Such an approach still introduces a bias due to the non-uniform distribution of nearby mesh faces (Figure 6). Both approaches append an additional set of uniformly sampled spatial points in order to offset their surface bias.

We employ simple least absolute deviations (L1) as our loss function, finding it performs better surface reconstruction when compared to squared error (L2), which is more sensitive to outliers:
$$L = \int_B |\mathrm{SDF}(x) - f_\theta(x)| \, dx \tag{3}$$

where $f_\theta$ is our OverfitSDF function and $\mathrm{SDF}(x)$ is the true signed distance at $x$, drawn from bounding volume $B$. Focusing a network's capacity on specific points of importance can be achieved by weighting the loss function to exaggerate error around points of focus:

$$L_{\mathrm{weighted}} = \int_B |\mathrm{SDF}(x) - f_\theta(x)| \, w(x) \, dx \tag{4}$$

This method simply scales the loss function with respect to some metric of importance $w(x)$, such that low-importance training samples have less effect on the loss. This approach can achieve our goal of biasing toward points close to the decision boundary by employing an exponential importance metric on the distance from the surface for any given point:

$$w(x) = e^{-\beta \, |\mathrm{SDF}(x)|} \tag{5}$$

where $\beta$ can be adjusted from 0 for uniform sampling to $\infty$ for surface point sampling. Unfortunately, in weighting the loss directly, computation is wasted on the forward and backward passes for points that are far from the shape's surface. For example, a point on the edge of the bounding sphere with a $\beta$ of 30 will be scaled by $e^{-30\,|\mathrm{SDF}(x)|}$, having negligible effect on the parameters being optimized, yet the same computational cost as if it did. An effectively equivalent but more efficient approach is to apply the importance metric $w(x)$ to the sampling of points in bounding volume $B$ instead of scaling the loss.

We propose a simple yet effective subset rejection importance sampling strategy that samples points according to their distance to the input shape's surface (Figure 5). We discretize our continuous loss integral (Eqn. 4) using Monte Carlo approximation, resulting in a query-probing supervised sampling strategy for training. Given a set $U$ of $n$ uniformly sampled points, we aim to subsample $m$ points from $U$ to create a set $S$ such that

$$\frac{1}{n} \sum_{x \in U} |\mathrm{SDF}(x) - f_\theta(x)| \, w(x) \;\approx\; \frac{1}{m} \sum_{x \in S} |\mathrm{SDF}(x) - f_\theta(x)| \tag{6}$$

We find that using Equation 5 as a method of biasing samples close to the surface leads to faster convergence and reduced surface reconstruction error compared to uniform sampling (96 epochs with surface error of 0.00231). When compared to DeepSDF's surface sampling scheme, we find similar results for both convergence speed and surface quality (average of 86 epochs each, with surface error of 0.00138 and 0.00131, respectively). In practice we generate set $U$ by sampling 10M points within the unit sphere, and employ our importance rejection strategy to populate subset $S$ with 1M points.

Figure 6: Visualizing the density of samples when drawing 10 points using: (left to right) vertex sampling with $\mathcal{N}(p, \cdot)$ offsets [CZ19], surface sampling with $\mathcal{N}(p, \cdot)$ [PFS*19], and our importance sampling approach with $w = e^{-|\mathrm{SDF}(p)|}$. We color samples according to their density estimated with a Gaussian kernel density, normalized by the most dense region from vertex sampling.

Figure 7: Our importance metric can be additionally weighted by distance from user-specified regions (panels: region-biased sampling; standard sampling; bias points). This weighting allows users to specify regions of interest (center; shown in red), yielding improved reconstruction accuracy (right) where desired.

A major benefit of our importance sampling scheme is the removal of unintended bias presented by previous approaches, and the complete flexibility in introducing targeted bias. The importance metric $w(x)$ can be modified to introduce bias towards regions of high curvature, minimum feature size (emulating [PFS*19] bias), or to specific regions of the mesh for areas of high reconstruction importance (Figure 7). This flexibility allows for greater use of the network's capacity on regions or features important to the user, without increasing overall network complexity.

Figure 8: Our approach to signing allows us to support converting non-manifold meshes without sacrificing the true topology of the mesh. Unlike the visual hull method of [PFS*19] (middle), our method (right) maintains complex internal structures. Panels: original geometry with slice plane; visual hull reconstruction; winding number reconstruction. Model by Virtox (left) under CC BY.

Figure 9: Our conversion process supports non-watertight meshes with open boundaries, self-intersections, and non-manifold edges. Panels: messy input mesh; unsigned distance field; winding number field; robust signed distance field.
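The subset rejection strategy above can be sketched with a toy analytic SDF standing in for the ground-truth signing. This is an illustrative NumPy version (all names are ours): each uniform sample is kept with probability proportional to $w(x) = e^{-\beta|\mathrm{SDF}(x)|}$, so the surviving set concentrates near the 0-isocontour.

```python
import numpy as np

rng = np.random.default_rng(1)

def sphere_sdf(p, r=0.5):
    """Stand-in ground-truth SDF: a sphere of radius r at the origin."""
    return np.linalg.norm(p, axis=-1) - r

def uniform_ball(n):
    """Uniform samples inside the unit sphere, by rejection from the cube."""
    out, total = [], 0
    while total < n:
        c = rng.uniform(-1.0, 1.0, size=(n, 3))
        keep = c[np.linalg.norm(c, axis=1) <= 1.0]
        out.append(keep)
        total += len(keep)
    return np.concatenate(out)[:n]

def importance_subsample(U, sdf, m, beta=30.0):
    """Subset rejection sampling (Eq. 5): keep sample x with probability
    proportional to w(x) = exp(-beta * |SDF(x)|). May return fewer than m
    points when the overall acceptance rate is low."""
    w = np.exp(-beta * np.abs(sdf(U)))
    keep = rng.uniform(size=len(U)) < w / w.max()
    return U[keep][:m]

U = uniform_ball(50_000)
S = importance_subsample(U, sphere_sdf, m=5_000)
# S concentrates near the 0-isocontour: its mean |SDF| is far below U's
```

Raising `beta` tightens the band around the surface (the limiting cases are uniform sampling at 0 and surface-only sampling at infinity, as in the text).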
The most obvious approach for computing the sign of a point in space is to use the shape's surface normals: calculate the dot product of the normal and the direction vector to point x. If the direction vector points the same way as the surface normal, then x is external to the surface, and internal otherwise. Unfortunately, this simple process assumes that the input shape is a watertight (closed, non-intersecting, manifold) mesh.

We instead sign our distances with generalized winding numbers [JKS13], enabling us to process meshes with self-intersections, non-manifold pieces, and open boundaries. Previous approaches for signing distances for learning implicit fields either voxelized the space [MON*19], requiring watertight inputs, or used a computationally expensive visibility hull [PFS*19], significantly reducing model complexity and “closing” off internal structures (Figure 8) before training even begins. In contrast, our method signs distances exactly correctly for solid geometries (those perfectly represented as the level set of a signed distance field) and gracefully degrades for messy input shapes (Figure 9). For fast and efficient sign evaluation we use fast winding numbers [BDS*18], a tree-based algorithm for fast approximation of generalized winding numbers. With fast winding numbers we can generate training set X and overfit our network to even the most problematic meshes (Figure 9) in an average of 90 s.

Neural Implicit File Format

The Neural Implicit file format is designed to be simple to consume and integrate into existing pipelines currently relying on classic SDF representations. For each trained OverfitSDF, the chosen network architecture and geometry transformation matrix (since all geometries are normalized to the unit sphere) are written as the first bytes before encoding the network's learnt parameter set θ into our binary format. Our homogeneous Neural Implicit format allows for a single query implementation regardless of the network architecture used in the overfitting process. The fixed storage profiles and memory layout of our learnt implicit functions provide consistent query and rendering speeds. Once trained, our Neural Implicit format can be treated as any other first-class implicit representation.
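A file with this header-then-weights structure could be written as below. The paper does not specify the exact byte layout, field widths, or float precision, so everything here (field order, little-endian `float32`, function names) is an illustrative assumption, not the authors' actual format.

```python
import struct
import numpy as np

def save_neural_implicit(path, n_layers, hidden, transform, theta):
    """Write a sketch of a Neural Implicit file: a small header carrying the
    architecture (n_layers, hidden) and the 4x4 normalization transform,
    followed by the flattened weight vector theta as little-endian float32.
    Layout is our assumption; the paper only describes header-then-weights."""
    with open(path, "wb") as f:
        f.write(struct.pack("<II", n_layers, hidden))
        f.write(np.asarray(transform, dtype="<f4").tobytes())  # 16 floats
        f.write(np.asarray(theta, dtype="<f4").tobytes())

def load_neural_implicit(path):
    """Read the header, transform, and weight vector back."""
    with open(path, "rb") as f:
        n_layers, hidden = struct.unpack("<II", f.read(8))
        transform = np.frombuffer(f.read(64), dtype="<f4").reshape(4, 4)
        theta = np.frombuffer(f.read(), dtype="<f4")
    return n_layers, hidden, transform, theta
```

Because the architecture is stored in the header, a single loader can reconstruct and query networks of any size, which is what makes the per-shape error-driven sizing discussed later possible.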
Our Neural Implicit representation can be treated as its classical counterpart (an SDF) and rendered efficiently using ray marching. Ray marching [Har96] is a common technique for rendering implicit fields where rays are initialized in the image plane and iteratively “marched” along each ray by a step size equal to the signed distance function value at the point. A single ray is marched until it is sufficiently close to the surface (within a tolerance ε) or it reaches the maximum number of steps. We initialize the starting position of each ray to be its first intersection with the unit sphere, since all Neural Implicits are normalized to lie within it. Rays that do not intersect the unit sphere are pruned from the set before marching begins. As rays of the image converge at different times, we employ a dynamic batching scheme that composes batches of points for inference based on a mask buffer which tracks rays that have converged to the surface or reached the maximum number of steps. Additionally, the dynamic batching method allows us to append additional points when surface normals are required for shading.
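The per-ray marching loop above can be sketched as follows; a toy analytic sphere stands in for a trained Neural Implicit, and the names and the bounding heuristic are ours, not the paper's CUDA implementation.

```python
import numpy as np

def sphere_sdf(p, r=0.5):
    """Toy SDF standing in for a trained Neural Implicit."""
    return float(np.linalg.norm(p) - r)

def sphere_trace(origin, direction, sdf, eps=1e-4, max_steps=64):
    """March one ray: advance by the SDF value each step until within eps of
    the surface, or report a miss once we leave the unit-sphere scene or
    exhaust the step budget."""
    d = direction / np.linalg.norm(direction)
    t = 0.0
    for _ in range(max_steps):
        p = origin + t * d
        dist = sdf(p)
        if dist < eps:
            return p                 # converged: surface hit
        t += dist
        if t > 2.0:                  # a unit-sphere scene fits within t <= 2
            return None
    return None

hit = sphere_trace(np.array([0.0, 0.0, -1.0]), np.array([0.0, 0.0, 1.0]), sphere_sdf)
# the ray fired from (0, 0, -1) toward +z hits the sphere at (0, 0, -0.5)
```

The dynamic batching described in the text amounts to running this loop for many rays at once and masking out rays whose `dist < eps` or step budget has triggered, so every network inference batch stays dense.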
3. Implementation and Results
Our Neural Implicit representation can be used in applications as if it were the true signed distance field of the shape. Here, we demonstrate OverfitSDF's ability to learn a shape's SDF for fast and efficient rendering and as a compressed representation of the original geometry.

We implement OverfitSDF networks in Tensorflow [MAP*15], while point sampling and mesh processing are implemented in libigl [JPS*16]. We train our model for a maximum of 100 epochs and allow early stopping for geometries that converge quickly. We use the ADAM optimizer [KB14] with a fixed learning rate of 10−. These settings generalized well across a wide range of geometries (see Fig. 12 and Fig. 15).

Figure 10: Neural Implicits admit many trivial interactive manipulations; the model's predicted distances can be modified through boolean operations similar to any implicit field. See accompanying video for animation.

Figure 11: DeepSDF [PFS*19] reconstruction quality degrades quickly for geometries not aligned to the default orientation per class (panels: θ = 0°, θ = 5°, θ = 10°; DeepSDF vs. ours). Our method converges to the same quality regardless of orientation. See accompanying video for animation.
We implement our renderer in CUDA as kernels for sphere marching, fragment shading and batch preparation. We use the CUTLASS [KMDT18] CUDA linear algebra library for fast and efficient strided GEMM (general matrix multiplication) calculation, required for inference against OverfitSDF models. We achieve an average frame rate of 34 FPS when rendering at 512 by 512 resolution on an Nvidia P100 GPU across a subset of the Thingi10k dataset. Although not acceptable for real-time rendering applications, this result is a significant improvement over previous learnt implicit rendering pipelines. In [LZP*19] rendering is optimized to 1 FPS by overstepping along all rays by a factor of 50%, increasing the convergence criteria, and implementing a coarse-to-fine strategy. These optimizations could further improve our rendering speed but would reduce the overall quality of the renders, and were not integrated.

As our representation is a learnt representation of the SDF, we inherit all the benefits of traditional implicit functions. Our Neural Implicit can be smoothly interpolated in implicit space, and can be interactively modified with the constructive solid geometry operations shown in Figure 10.
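These constructive solid geometry operations are the standard implicit booleans (union as a pointwise min, intersection as a max, subtraction as max against the negation); the sketch below applies them to analytic SDFs, but a Neural Implicit is interchangeable since the network is just another callable approximating an SDF. Names are ours.

```python
import numpy as np

def sphere(center, radius):
    """Analytic SDF factory for a sphere."""
    c = np.asarray(center, dtype=float)
    return lambda p: np.linalg.norm(p - c, axis=-1) - radius

# Standard implicit booleans over two SDF callables.
def union(a, b):        return lambda p: np.minimum(a(p), b(p))
def intersection(a, b): return lambda p: np.maximum(a(p), b(p))
def subtract(a, b):     return lambda p: np.maximum(a(p), -b(p))

a = sphere([0.0, 0.0, 0.0], 0.5)
b = sphere([0.4, 0.0, 0.0], 0.3)
carved = subtract(a, b)              # sphere a with a bite taken out by b

origin = np.zeros(3)
print(a(origin), carved(origin))     # -0.5 (deep inside a) vs -0.1 (closer to the cut)
```

Because the result is again a callable returning (approximate) signed distances, the composite can be fed straight back into the sphere tracer, which is what makes the interactive edits in Figure 10 cheap.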
Training deep neural networks on large geometric datasets has classically been a cumbersome, time-consuming task. For our Neural Implicit representation to be effective, any geometry must be able to be converted in a reasonable amount of time. Due to the minimal complexity of our base configuration (8 layers of 32 neurons), we find that we can overfit our model to any geometry in an average of 90 s. Additionally, with only 64 kB of memory required, many models can be trained concurrently on modern GPUs without nearing the memory limits. Converting the entirety of the 10,000 models in the Thingi10k dataset [ZJ16] on an Nvidia Titan RTX took just 16 hours on a single GPU, or 4 hours when trained in parallel across 4 Nvidia Titan RTX cards.

Figure 12: Thingi10k models [ZJ16] compressed as Neural Implicits. Our representation gracefully scales to high-quality reconstructions as its footprint increases, using an order of magnitude less memory than alternative representations at equal quality.

Figure 13: Our Neural Implicit format encodes sharp edges with a high degree of accuracy.

Conversion of the Thingi10k dataset from mesh format to Neural Implicit format reduces the overall storage impact from 38.85 GB to just 640 MB. Comparatively, if DeepSDF [PFS*19] could be trained on the same dataset, all geometries would be compressed to just 7 MB. Unfortunately, through our experiments we found that DeepSDF results depend on (a) all geometries belonging to a single class, (b) all geometries being consistently oriented, and (c) the number of geometries in the dataset. We can see in Figure 11 that rotating the input mesh by just a few degrees results in drastic reconstruction errors, while OverfitSDF produces consistent results regardless of orientation. With this experiment, we demonstrate why DeepSDF failed to converge to reasonable results on the Thingi10k dataset, since geometries are arbitrarily oriented and of no specific class. Additionally, even if DeepSDF did not suffer from the aforementioned problems, training on the full 10,000-geometry dataset would be intractable due to the memory required for storing and optimizing the 10,000 shape embeddings (without significantly reducing embedding size, network complexity, or the dataset being converted).
As many of the geometries in the Thingi10k dataset are organic “smooth” shapes, we also verify that our method is capable of maintaining sharp edges in reconstructions. We find that our network is able to recreate sharp edges (Fig. 13 & 16) with a high level of accuracy, despite not being specifically biased to do so.

In our conversion of the Thingi10k dataset, the architecture was fixed, yielding an efficient and constant-memory representation. However, if surface reconstruction quality is a priority, the focus can instead shift to an error-driven surface fitting (similar to classical approaches [OBA*05]), scaling network complexity according to the input geometry. As each OverfitSDF produced encodes its own architecture, this change will result in simple geometries being encoded into minimal parameter configurations (base: 7553) while topologically complex geometries can be represented by higher-resolution configurations. The effect of this error-driven optimization approach can be seen in Figure 16, where a simple grid search was performed until surface error met a target goal. Based on our conversion of the Thingi10k dataset, we find that the majority of models are represented well by our base configuration (Fig. 15), where the tails of the error distribution could be retrained with additional complexity until some target surface error is achieved.
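The error-driven grid search described above could be organized as below. The paper does not give the search loop itself, so this is a sketch under our own assumptions; `train_and_surface_error` is a hypothetical stand-in for "train an OverfitSDF at this size and measure Eq. (7)", stubbed here with purely illustrative numbers so the selection logic can run.

```python
def select_architecture(train_and_surface_error, target=0.003,
                        layer_grid=(8, 12, 16), hidden_grid=(32, 64, 128)):
    """Smallest-first grid search: return the first (layers, hidden, error)
    meeting the target surface error, else the best configuration found."""
    best = None
    for hidden in hidden_grid:               # widen before deepening
        for layers in layer_grid:
            err = train_and_surface_error(layers, hidden)
            if best is None or err < best[0]:
                best = (err, layers, hidden)
            if err <= target:
                return layers, hidden, err
    return best[1], best[2], best[0]

# Hypothetical stub: pretend surface error falls as capacity grows
# (illustrative values only, not measured results).
stub = lambda layers, hidden: 0.01 / (layers * hidden / 256.0)
layers, hidden, err = select_architecture(stub)
```

Ordering the search smallest-first matches the paper's goal: simple geometries stop at the minimal configuration, while only the hard tail of the error distribution pays for extra capacity.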
All images in Figure 12 were rendered from
Neural Implicit gen-erated from our base configuration, resulting in a total parameterset of 7553 tuned to each shape’s implicit function. At just 64 kBof memory we find that our lightweight representation can capturecomplex topologies with relatively high resolution compared to uni-form signed distance grids or decimated mesh if similar memoryfootprints. For the comparison in Figure 14, each geometry wasconverted to a
Neural Implicit in our base configuration of 7553 pa-rameters, and is visualized beside the rendered result of a uniformlysampled SDF grid with 20 samples, along with the original meshadaptively decimated [GH97] down to contain only 7600 floats.Compared to decimated meshes (baseline non-uniform format), weobserve that our homogeneous format has similar surface quality.Compared to SDFs stored on a grid (baseline uniform format) weobserve far better quality. Additionally, we see that our approachbetter captures high frequency surface detail than both representa-tions, often producing results that more closely match the "style" ofthe original shape.We quantify our methods robustness by converting the entirety ofthe Thingi10k [ZJ16] dataset to our Neural Implicit format, measur-ing the average surface error (Eqn. 7) and training loss. The trainingloss reported is the mean of errors between the true and predictedSDF values at points sampled using our importance metric outlinedin Section 2.1.2. The surface error is the sum of errors at pointsalong the shape’s 0-isocontour,Surface Error = N N ∑ i = | f θ ( p i ) | (7)We use these simple metrics in conjunction for their ability to notonly measure error at the surface but also error within the boundingvolume. Errors within the bounding volume manifest as increasedrendering times or holes in the model during rendering, while surfaceerrors are clear when marching cubes over the learnt SDF field. Figure 14: Our learnt Neural Implicit format (right) can be shownto better approximate the original surface (grey, inset) comparedto adaptive decimation of the original triangle mesh [GH97] (left)and uniform signed distance grid (middle) with equal memory im-pact. 
gpvillamil (skull), Makerbot (whale), morenaP (frog), artec3d (dragon), JuliaTruchsess (octopus) under CC BY.

We sample 100,000 surface points for measuring average surface error, and assess converged loss against the training set of 1 M points. The results on the entirety of the Thingi10k dataset are visualized in Figure 15. We find that at this configuration 93% of the 10,000 geometries in the diverse Thingi10k dataset converge to a surface error below 0.003, with no model exceeding 0.01 (worst case of 0.0097 shown in Fig. 16).
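The surface-error metric of Eqn. (7) can be sketched as follows. This is an illustrative stand-in, not the authors' code: a closed-form sphere SDF plays the role of the trained network f_θ, the helper names are invented, and the point count is scaled down from the paper's 100,000 samples.

```python
import numpy as np


def sphere_sdf(p, radius=0.5):
    """Analytic SDF of a sphere; stands in for a trained network f_theta."""
    return np.linalg.norm(p, axis=-1) - radius


def sample_surface_points(n, radius=0.5, rng=None):
    """Draw n points on the sphere's 0-isocontour (normalized Gaussians)."""
    if rng is None:
        rng = np.random.default_rng(0)
    v = rng.normal(size=(n, 3))
    v /= np.linalg.norm(v, axis=-1, keepdims=True)
    return radius * v


def surface_error(f, points):
    """Eqn. (7): mean absolute predicted SDF over surface samples."""
    return np.mean(np.abs(f(points)))


pts = sample_surface_points(1000)
err = surface_error(sphere_sdf, pts)  # near zero for the exact SDF
```

Because every sample lies exactly on the 0-isocontour, any nonzero value reported by `surface_error` is attributable to approximation error in the field itself, which is what makes the metric useful as a convergence check during overfitting.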
4. Limitations and Future Work
In some cases, when a mesh is of exceptional topological complexity, we find that our base configuration of only 7553 parameters simply cannot represent the highly non-linear implicit field. As shown in Figure 16, highly complex geometries cannot be represented well by our limited-resolution network; similarly, the geometry cannot be captured by a mesh decimated to an equivalent memory footprint. These cases are simple to detect

© 2020 The Author(s). T. Davies & D. Nowrouzezahrai & A. Jacobson / Overfit Neural Networks as a Compact Shape Representation
Figure 15: Loss and surface error distributions over the entirety of the Thingi10k [ZJ16] dataset.
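As a sanity check on the base configuration evaluated above, the quoted budget of 7553 parameters is consistent with a small fully connected SDF network. The layout below (a 3D input, eight hidden layers of 32 units, and a scalar output) is an assumed architecture used only to show the arithmetic, not a detail taken from this section:

```python
def mlp_param_count(layer_sizes):
    """Total weights + biases of a fully connected network with the given widths."""
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))


# Assumed layout: 3D point in, eight hidden layers of width 32, scalar SDF out.
sizes = [3] + [32] * 8 + [1]
count = mlp_param_count(sizes)  # (3*32+32) + 7*(32*32+32) + (32*1+1) = 7553
```

At this scale a single forward pass is a handful of small matrix-vector products, which is what keeps inference fast enough to use the network directly as a shape representation.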
(Figure 16 panels, left to right: Original, Decimated, Base, Increased.)
Figure 16: With only 7553 parameters, our base Neural Implicit format can lack the representative power to converge on highly complex geometries (similar to a decimated mesh with the same memory footprint). Increasing the network capacity to equal the memory impact of the original mesh results in near-perfect reconstructions. tbuser (left) under CC BY.

by analyzing the surface error during training, and can easily be rectified by increasing the network resolution.

5. Conclusion

Neural networks have proven to be effective approximators of signed distance fields (SDFs) for solid 3D objects. Existing methods have seen success for generative modelling, shape interpolation, surface reconstruction, and differentiable rendering. However compelling, these methods share a common flaw: they depend on semantically consistent geometries in a fixed orientation. Instead of training a model to generalize across some small subset of shape categories, we overfit networks to single geometries, applying the full network capacity to the reconstruction quality of a single shape. This not only allows for high-quality reconstructions with minimal memory, but also supports arbitrary geometry regardless of orientation or class.

We set out to show that overfitting a neural network to a single signed distance field is not only feasible, but can be useful in providing an efficient, compact representation of the shape. We showed how our
Neural Implicit representation can effectively capture a shape's signed distance function while maintaining a low memory impact and fast inference speeds. Our OverfitSDF networks are a shape representation that inherits the effectively infinite resolution of continuous implicits with the computational efficiency of coarse meshes and the uniform memory patterns of an SDF grid. We hope that our geometrically principled approach to sampling training points and signing distances leads to more robust and extensible research in geometric deep learning.
References

[AL19] Atzmon, Matan and Lipman, Yaron. "SAL: Sign Agnostic Learning of Shapes from Raw Data". arXiv preprint arXiv:1911.10414 (2019) 2.
[BBB*97] Bloomenthal, Jules, Bajaj, Chandrajit, Blinn, Jim, et al. Introduction to Implicit Surfaces. Morgan Kaufmann, 1997 1.
[BBL*17] Bronstein, Michael M., Bruna Estrach, Joan, LeCun, Yann, et al. "Geometric Deep Learning: Going beyond Euclidean data". English (US). IEEE Signal Processing Magazine (2017). ISSN: 1053-5888.
[BDS*18] Barill, Gavin, Dickson, Neil, Schmidt, Ryan, et al. "Fast Winding Numbers for Soups and Clouds". ACM Transactions on Graphics (2018) 2, 5.
[CFG*15] Chang, Angel X., Funkhouser, Thomas, Guibas, Leonidas, et al. "ShapeNet: An Information-Rich 3D Model Repository". arXiv preprint arXiv:1512.03012 (2015) 2.
[CZ19] Chen, Zhiqin and Zhang, Hao. "Learning Implicit Fields for Generative Shape Modeling". Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2019) 4.
[GH97] Garland, Michael and Heckbert, Paul S. "Surface Simplification Using Quadric Error Metrics". Proceedings of the 24th Annual Conference on Computer Graphics and Interactive Techniques. ACM Press/Addison-Wesley Publishing Co., 1997, 209–216 7.
[Har96] Hart, John C. "Sphere Tracing: A Geometric Method for the Antialiased Ray Tracing of Implicit Surfaces". The Visual Computer (1996).
[HHF*19] Hanocka, Rana, Hertz, Amir, Fish, Noa, et al. "MeshCNN: A Network with an Edge". ACM Transactions on Graphics (TOG) (2019).
[HSG18] Huang, Jingwei, Su, Hao, and Guibas, Leonidas J. "Robust Watertight Manifold Surface Generation Method for ShapeNet Models". CoRR abs/1802.01698 (2018) 2.
[HSW89] Hornik, Kurt, Stinchcombe, Maxwell, and White, Halbert. "Multilayer Feedforward Networks Are Universal Approximators". Neural Networks (1989).
[JJHZ19] Jiang, Yue, Ji, Dantong, Han, Zhizhong, and Zwicker, Matthias. SDFDiff: Differentiable Rendering of Signed Distance Fields for 3D Shape Optimization. 2019. arXiv preprint.
[JKSH13] Jacobson, Alec, Kavan, Ladislav, and Sorkine-Hornung, Olga. "Robust Inside-Outside Segmentation Using Generalized Winding Numbers". ACM Transactions on Graphics (TOG) (2013).
[JP*16] Jacobson, Alec, Panozzo, Daniele, Schüller, C., et al. libigl: A Simple C++ Geometry Processing Library. 2016 5.
[KB14] Kingma, Diederik P. and Ba, Jimmy. "Adam: A Method for Stochastic Optimization". arXiv preprint arXiv:1412.6980 (2014) 5.
[KMDT18] Kerr, Andrew, Merrill, Duane, Demouth, Julien, and Tran, John. CUTLASS: Fast Linear Algebra in CUDA C. Sept. 2018. URL: https://devblogs.nvidia.com/cutlass-linear-algebra-cuda/.
[LW19] Littwin, Gidi and Wolf, Lior. "Deep Meta Functionals for Shape Representation". CoRR abs/1908.06277 (2019). URL: http://arxiv.org/abs/1908.06277 2, 3.
[LYF17] Liu, Jerry, Yu, Fisher, and Funkhouser, Thomas A. "Interactive 3D Modeling with a Generative Adversarial Network". Proc. 3DV. 2017, 126–134 2.
[LZP*19] Liu, Shaohui, Zhang, Yinda, Peng, Songyou, et al. DIST: Rendering Deep Implicit Signed Distance Function with Differentiable Sphere Tracing. 2019. arXiv preprint 2, 6.
[MAP*15] Martín Abadi, Ashish Agarwal, Paul Barham, et al. TensorFlow: Large-Scale Machine Learning on Heterogeneous Systems. Software available from tensorflow.org. 2015. URL: http://tensorflow.org/.
[MON*19] Mescheder, Lars, Oechsle, Michael, Niemeyer, Michael, et al. "Occupancy Networks: Learning 3D Reconstruction in Function Space". Proceedings IEEE Conf. on Computer Vision and Pattern Recognition (CVPR). 2019 2, 5.
[MS15] Maturana, Daniel and Scherer, Sebastian. "VoxNet: A 3D Convolutional Neural Network for Real-Time Object Recognition". IEEE/RSJ International Conference on Intelligent Robots and Systems. 2015, 922–928 2.
[NLBY18] Nguyen-Phuoc, Thu, Li, Chuan, Balaban, Stephen, and Yang, Yong-Liang. "RenderNet: A Deep Convolutional Network for Differentiable Rendering from 3D Shapes". Proc. NeurIPS. 2018, 7902–7912 2.
[NMOG19] Niemeyer, Michael, Mescheder, Lars, Oechsle, Michael, and Geiger, Andreas. Differentiable Volumetric Rendering: Learning Implicit 3D Representations without 3D Supervision. 2019. arXiv preprint.
[OBA*05] Ohtake, Yutaka, Belyaev, Alexander, Alexa, Marc, et al. "Multi-level Partition of Unity Implicits". ACM SIGGRAPH 2005 Courses. 2005, 173–es 7.
[PFS*19] Park, Jeong Joon, Florence, Peter, Straub, Julian, et al. "DeepSDF: Learning Continuous Signed Distance Functions for Shape Representation". The IEEE Conference on Computer Vision and Pattern Recognition (CVPR). June 2019 1–6.
[QSMG16] Qi, Charles R., Su, Hao, Mo, Kaichun, and Guibas, Leonidas J. "PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation". arXiv preprint arXiv:1612.00593 (2016) 2.
[WKBS19] Wang, Yu, Kim, Vladimir G., Bronstein, Michael, and Solomon, Justin. "Learning Geometric Operators on Meshes". ICLR Workshop on Representation Learning on Graphs and Manifolds (2019) 2.
[WSK*15] Wu, Zhirong, Song, Shuran, Khosla, Aditya, et al. "3D ShapeNets: A Deep Representation for Volumetric Shape Modeling". IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Boston, USA, June 2015 2.
[WSLT18] Wang, Peng-Shuai, Sun, Chun-Yu, Liu, Yang, and Tong, Xin. "Adaptive O-CNN: A Patch-based Deep Representation of 3D Shapes". ACM Transactions on Graphics (SIGGRAPH Asia) (2018).
[WWX*17] Wu, Jiajun, Wang, Yifan, Xue, Tianfan, et al. "MarrNet: 3D Shape Reconstruction via 2.5D Sketches". Advances in Neural Information Processing Systems. 2017 2.
[ZJ16] Zhou, Qingnan and Jacobson, Alec. "Thingi10k: A Dataset of 10,000 3D-Printing Models". arXiv preprint arXiv:1605.04797 (2016) 3, 6–8.