OMiCroN -- Oblique Multipass Hierarchy Creation while Navigating
Computers & Graphics (2020)
Vinícius da Silva a,∗, Claudio Esperança b, Ricardo Marroquim b,c
a Institute for Pure and Applied Mathematics (IMPA). VISGRAF Lab. Estrada Dona Castorina, 110, Jardim Botânico, Rio de Janeiro - RJ, Brazil, CEP: 22460-320
b Federal University of Rio de Janeiro (UFRJ). Computer Graphics Lab (LCG). Cidade Universitária, Centro de Tecnologia, Block H, Rio de Janeiro - RJ, Brazil, CEP: 21941-972
c Delft University of Technology (TU Delft). Computer Graphics and Visualization Group. Van Mourik Broekmanweg 6, Delft, The Netherlands
ARTICLE INFO
Article history: Received June 25, 2020
ABSTRACT
Rendering large point clouds ordinarily requires building a hierarchical data structure for accessing the points that best represent the object for a given viewing frustum and level-of-detail. The building of such data structures frequently represents a large portion of the cost of the rendering pipeline both in terms of time and space complexity, especially when rendering is done for inspection purposes only. This problem has been addressed in the past by incremental construction approaches, but these either result in low quality hierarchies or in longer construction times. In this work we present OMiCroN – Oblique Multipass Hierarchy Creation while Navigating – which is the first algorithm capable of immediately displaying partial renders of the geometry, provided the cloud is made available sorted in Morton order. OMiCroN is fast, being capable of building the entire data structure in memory in an amount of time comparable to that of just reading the cloud from disk. Thus, there is no need for storing an expensive hierarchy, nor for delaying the rendering until the whole hierarchy is read from disk. In fact, a pipeline coupling OMiCroN with an incremental sorting algorithm running in parallel can start rendering as soon as the first sorted prefix is produced, making this setup very convenient for streamed viewing.
1. Introduction
In recent years, improvements in acquisition devices and techniques have led to the creation of huge point cloud datasets. Direct rendering of such datasets must resort to indexing data structures. These are used for culling portions of the model outside the viewing frustum and for selecting representative point subsets for the portions inside it. In many use cases, the cost of building such structures is not critical, since the resulting hierarchy is stored in secondary memory so it can be reused every time a render session starts. Thus, research focusing on the quality of the render need not justify arbitrarily long preprocessing times (e.g. [1, 2]). In other cases, shortening the time to produce the hierarchy is deemed worthwhile, at the expense of achieving slightly worse balance or render quality. This is particularly useful for applications that must render the point cloud and perform additional tasks or that must handle dynamic data (e.g. collision detection [3]). No published research, to the best of our knowledge, has yet reported a means for rendering point clouds before the hierarchy is completely available.
∗ Corresponding author: Tel.: + e-mail: [email protected] (Vinícius da Silva)
Preprint Submitted for review / Computers & Graphics (2020)
Bottom-up hierarchy building. Strategies for building point cloud hierarchies can be divided into three classes: incremental, top-down and bottom-up. Incremental strategies consist of sequentially inserting points into an incomplete hierarchy. The main limitation of this strategy is that the quality of the results is ultimately dependent on the insertion order [4, 5]. Better quality hierarchies require examining the whole data beforehand. Top-down strategies work by partitioning the input into increasingly smaller groups. On the other hand, bottom-up strategies join small collections of close points into increasingly larger groups. One simple way of producing a sequence of points that in general lie close to each other is to sort them according to some 3D space-filling curve such as that defined by the Morton order, used to organize nodes in octrees.
OMiCroN.
In this paper we introduce OMiCroN (Oblique Multipass Hierarchy Creation while Navigating), a new take on the problem of shortening the delay between point cloud acquisition and its visualization. Its central idea – and main contribution – is to build the hierarchy in memory while allowing a synchronous inspection of all data already loaded. It is important to stress that performing both tasks in parallel involves solving non-trivial synchronization issues. OMiCroN circumvents most of these by combining Morton code ordering, bottom-up construction and the concept of oblique cuts, where the renderable parts of the model are clearly separated from the non-renderable parts by a single delimiting Morton code.
Use cases.
The fact that OMiCroN only requires that the input point cloud be ordered by Morton code allows it to be deployed in several ways. For instance, a pipeline can be built where an unordered point cloud is fed into an incremental sorter and, as soon as the sorted points are produced, they are fed into OMiCroN. Alternatively, one can use a batch sorter to produce a file containing the ordered points, which is later read by OMiCroN. In both cases, the hierarchy produced by OMiCroN can be stored as a file for later reuse (see Figure 1).
[Figure 1 diagram. Boxes: Unsorted points, Morton code sort, Sorted points, OMiCroN, Display, Hierarchy, OMiCroN viewer.]
Fig. 1: The standard OMiCroN pipeline (arrows labeled ) permits inspecting a raw point cloud where the first images are produced just after an incremental Morton code sorter outputs the first points. If the sorted points are already available, OMiCroN starts rendering immediately (arrows labeled ). The hierarchy computed by OMiCroN can be flushed to disk for later reuse. This is the traditional pipeline (arrows labeled ), where rendering starts after the hierarchy is built.
Contributions.
The technical contributions of this work are:
• introduces the concept of Hierarchy Oblique Cuts, which allows parallel data sorting, spatial hierarchy construction and rendering;
• restricts the preprocessing of input data to a very fast and flexible Morton code based partial sort;
• allows for on-the-fly Octree construction for large point clouds;
• following the Morton Order, renders full detail data from the very beginning as a consequence of bottom-up hierarchy construction;
• provides immediate visual feedback of the hierarchy creation process.
This paper is organized in the following manner. In Section 2 we present the necessary background for describing OMiCroN. In Section 3 the related work is presented. In Section 4 we give an overview of our method, while the two central concepts of Hierarchy Oblique Cuts and Oblique Hierarchy Cut Fronts are described in detail in Sections 5 and 6, respectively. In Section 7 we present the parallel version of the OMiCroN algorithm, describing a proof-of-concept application for processing and rendering large point clouds. In Section 8 we describe the experiments to measure the preprocessing, rendering and memory efficiency of the algorithm. Finally, conclusions, limitations and future work directions are presented in Section 9.
2. Background
Our work depends on three major concepts: Morton Order; Hierarchical Spatial Data Structures; and Rendering Fronts. The theory behind them is summarized in this section.
Morton Order.
Morton [6] proposed a linearization of 2D grids, later generalized to n-dimensional grids. It results in a z-shaped space-filling curve, called the Z-order curve. The order in which the grid cells are visited by following this curve is called Morton order or Z-order. The associated Morton code for each cell can be computed directly from the grid coordinates by interleaving their bits. Figure 2 illustrates the concepts above.
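The bit interleaving described above can be sketched as follows. This is a minimal illustration, not the paper's code: the function name mortonEncode and the choice of placing x in the least significant bit of each 3-bit group are assumptions, since the interleaving order used in Figure 2 is not stated in the text.

```cpp
#include <cassert>
#include <cstdint>

// Interleave the lowest `bits` bits of x, y and z into a single Morton code.
// Assumption (not from the paper): x occupies the least significant bit of
// each 3-bit group, followed by y and then z.
std::uint64_t mortonEncode(std::uint32_t x, std::uint32_t y, std::uint32_t z,
                           unsigned bits) {
    assert(bits <= 21);  // 3 * 21 = 63 bits fit in a 64-bit code
    std::uint64_t code = 0;
    for (unsigned i = 0; i < bits; ++i) {
        code |= static_cast<std::uint64_t>((x >> i) & 1u) << (3 * i);
        code |= static_cast<std::uint64_t>((y >> i) & 1u) << (3 * i + 1);
        code |= static_cast<std::uint64_t>((z >> i) & 1u) << (3 * i + 2);
    }
    return code;
}
```

Sorting grid cells by the value returned here visits them along the Z-order curve of the corresponding grid resolution.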
Spatial Data Structures.
Morton codes extend naturally to regular spatial subdivision schemes, thus they are usually used in conjunction with hierarchical spatial data structures such as Octrees and regular Kd-trees (Bintrees). They provide fast data culling and a direct level-of-detail structure, by mapping the n-dimensional structure to a one-dimensional list. Figure 2 illustrates an Octree with an embedded Morton code curve, and its associated hierarchical representation.
Rendering Front.
A Rendering Front, henceforth called simply Front, is a structure to optimize sequential traversals of hierarchies, and has been used in many works [7, 8, 9, 10]. This technique exploits spatial and temporal locality. Instead of starting the traversal at the root node for every new frame, it starts at the nodes where it stopped in the preceding frame. Fronts have two basic operators: prune and branch. The prune operator traverses the hierarchy up, removing a group of sibling nodes from the front and inserting their parent. The branch operator works in the opposite direction, by removing a node from the front and inserting its children. Figure 3 depicts a front and the two operators.
Fig. 2: Z-Order and Morton code illustrative example. (a) Z-order curve. (b) Relationship between Morton order and grid coordinates. (c) Morton order and associated hierarchical representation. Order is indicated inside nodes, coordinates and Morton codes outside them. The Morton code for the n-th child of a parent node with code x is x concatenated with the binary (bit-interleaving) representation of n. Coordinate values and interleaved bits share color. Parent code is between curly brackets and node index between square brackets. A prefix bit is used to avoid ambiguity.
Fig. 3: Rendering Front example. (a) A front and operations to be performed. (b) The front after the prune and branch operations.
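As an illustration of the two front operators, the sketch below keeps the front as a list of Morton codes and derives children and parents by appending or dropping three bits, as in the Figure 2 caption. It is a simplified sketch under stated assumptions: the octree is taken to be full, the root code is 1 (the prefix bit alone), and the names Front, branch and prune are chosen here for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <iterator>
#include <list>

using Code = std::uint64_t;   // Morton code with a leading prefix bit; root = 1
using Front = std::list<Code>;

// branch: replace a front node by its eight children (full octree assumed).
// Returns an iterator to the first child.
Front::iterator branch(Front& front, Front::iterator node) {
    const Code parent = *node;
    Front::iterator first = front.erase(node);  // element after the removed node
    for (int n = 7; n >= 0; --n)                // insert backwards to keep order
        first = front.insert(first, (parent << 3) | static_cast<Code>(n));
    return first;
}

// prune: replace a complete, adjacent sibling group starting at firstSibling
// by its parent. Must not be called on the root. Returns the parent's iterator.
Front::iterator prune(Front& front, Front::iterator firstSibling) {
    const Code parent = *firstSibling >> 3;
    Front::iterator last = firstSibling;
    while (last != front.end() && (*last >> 3) == parent) ++last;
    assert(std::distance(firstSibling, last) == 8);  // siblings are adjacent
    Front::iterator pos = front.erase(firstSibling, last);
    return front.insert(pos, parent);
}
```

A linked list is used so that both operators run without shifting the rest of the front; a real front would also carry node data and handle sparse sibling groups.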
3. Related work
While the use of points as rendering primitives was introduced very early in Computer Graphics [11, 12], their widespread adoption only occurred much later, as discussed in extensive survey literature [13, 14, 15, 16, 17, 18]. Many algorithms were presented from that period on, proposing improved image quality by changes in the kernel logic, better spatial management by the use of multiresolution and LOD structures, and integration of the out-of-core paradigm, resulting in systems that can handle extremely large point clouds. Here we focus the discussion on multiresolution and LOD structures, establishing an argument for why a stream-and-feedback-based algorithm such as OMiCroN is a desirable tool for academia and industry.
QSplat [1] is the seminal reference on large point cloud rendering. It is based on an out-of-core hierarchy of bounding spheres, which is traversed to render the points. Since its main limitation is the extensive CPU usage, QSplat was followed by works focused on loading more work onto the GPU. For example, Sequential Point Trees [19] introduced adaptive rendering completely on the graphics card by defining an octree linearization that can be traversed efficiently using the GPU architecture. Other methods used approaches relying on the out-of-core paradigm, such as XSplat [20] and Instant Points [2]. XSplat proposed a paginated multiresolution point-octree hierarchy with virtual memory mapping, while Instant Points extended Sequential Point Trees by nesting linearized octrees to define an out-of-core system. Layered Point Clouds [21] proposed a binary tree of precomputed object-space point cloud blocks that is traversed to adapt sample densities according to the projected size in the image. Wand et al. [22] presented an out-of-core octree-based renderer capable of editing large point clouds and Bettio et al. [23] implemented a kd-tree-based system for network distribution, exploration and linkage of multimedia layers in large point clouds.
Other works focused on parallelism using multiple machines to speed up large model processing or to render on wall displays using triangles, points, or both [24, 25, 26, 27, 28].
More recently, relatively few works have focused on further improving the rendering of large point clouds, such as the method by Lukac et al. [29]. Instead, more effort has been concentrated on using established techniques in domains that require the visualization of large datasets as a tool for other purposes. For example, city visualization using aerial LIDAR [30, 31], sonar data visualization [32] and, more prominently, virtual reality [33, 34, 35, 36].
An important discussion concerns which approach best exploits parallelism when creating a hierarchy. A good way to address this question is to study GPU algorithms, which must rely on smart problem modeling to achieve a maximum degree of data independence, increasing throughput in a GPU manycore environment. Karras [37] discusses this subject in depth. His major criticism of other methods is that top-down approaches achieve a low degree of parallelism at the top levels of the tree, leaving processing resources underutilized at early stages of hierarchy construction. Bottom-up methods do not suffer from this problem because the number of nodes grows exponentially with the hierarchy depth, providing sufficient data independence and a good degree of parallelism.
While the aforementioned papers present very useful and clever methods to implement or use large point cloud rendering, none of them considers presenting data to the user before the full hierarchy is created. For example, implementors of systems that use large point cloud rendering as a tool could use the visual feedback given by the algorithm in order to check if the data is presented properly, without having to wait for the full hierarchy to be available.
Additionally, in environments where data transfer is a bottleneck, the input data could be transferred and the hierarchy constructed on-the-fly, instead of transferring the full hierarchy, which may be several times larger.
Fig. 4: OMiCroN overview. A renderable hierarchy is maintained while inserting incoming nodes in parallel. This cycle is repeated until the whole hierarchy is constructed. (a) Initial (possibly empty) renderable hierarchy and concatenate operator. (b) The fix operator: node ancestors are inserted into the hierarchy. (c) After the fix operation the renderable hierarchy is expanded.
4. Overview
Rendering a hierarchy while it is under construction is a non-trivial synchronization problem. Since a rendering front can potentially have access to any node in the hierarchy, the use of locks might lead to prohibitive performance. On one hand, using big critical sections by locking whole hierarchy levels results in excessive serialization and poor performance. On the other hand, using smaller critical sections by locking nodes or sibling groups results in a huge memory overhead to maintain lock data. To efficiently address this problem, one should have a strong definition of what is already processed and renderable and what is under construction and still volatile.
We propose to synchronize those tasks using specific Morton Curve and Morton Code properties to classify nodes in all curves composing a hierarchy. This classification is based on an Oblique Hierarchy Cut, a novel data structure to represent hierarchies under construction. Nodes inside an Oblique Cut are guaranteed to be rendered without interference from the construction and vice-versa. An overview of the idea can be seen in Figure 4. It also shows how new nodes are created and inserted using two operators: concatenate and fix. Starting from an initial (possibly empty) renderable hierarchy, nodes from the maximum level are inserted using the concatenate operator. Then, the hierarchy is evaluated in a bottom-up manner, inserting ancestors of the concatenated nodes into the renderable hierarchy using the fix operator.
To evaluate if a node is inside an Oblique Cut we need a methodology that is consistent for all curves at different hierarchy levels. One that makes sense is to consider a node inside the cut if all of its descendants are also inside it. Thus, we need a proper way to relate nodes on Morton Curves at different levels of the hierarchy. For that purpose, let $\mathrm{span}(x)$ be a function that returns the Morton Code of the right-most descendant of a supposedly full subtree rooted by $x$.
With this definition, span has several useful properties. First, it conceptually maps nodes in any hierarchy level to nodes at the deepest level. Thus, it also maps any Morton Curve to the Morton Curve at that level. Moreover, by definition, $\mathrm{span}(y) \le \mathrm{span}(x)$ for any descendant $y$ of $x$. The cut is then defined as a value $m_C$ at the deepest level, and span is used to map any node to its right-most descendant at that level and query whether it is to the left (inside) or to the right (outside) of the cut. This operation is very efficient because, given the Morton Code of a node, calculating the Morton Code of its right-most descendant is equivalent to concatenating a suffix of 1 bits to that value. Figure 5 shows how span works.
[Figure 5 labels: span(0) = 7, span(1) = 15, span(root) = 63, span(57) = 57, $m_C = 7$.]
Fig. 5: span. In the example, the cut is defined by the delimiting Morton Code $m_C = 7$, defined at the deepest level. Each pair of colored squares shows the input and result of span. The blue square case is inside the cut because $\mathrm{span}(0) = 7 \le 7$. The other cases (red, green and black) are outside of the cut because $\mathrm{span}(x) > 7$. It is important to note that the operation is defined for any level of the hierarchy, even for nodes at the deepest level, where $\mathrm{span}(x) = x$.
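The span computation and the inside/outside test can be sketched in a few lines. The version below follows the numbering of Figure 5, where codes carry no prefix bit, so the node level is passed explicitly; the function names and the explicit level and lmax parameters are assumptions made for this illustration.

```cpp
#include <cassert>
#include <cstdint>

// Morton code of the right-most descendant, at level lmax, of the node with
// code `code` at level `level` (octree: 3 bits per level, no prefix bit).
// Equivalent to appending 3 * (lmax - level) bits set to 1.
std::uint64_t span(std::uint64_t code, unsigned level, unsigned lmax) {
    assert(level <= lmax);
    const unsigned shift = 3 * (lmax - level);
    assert(shift < 64);
    return (code << shift) | ((std::uint64_t{1} << shift) - 1);
}

// A node is inside the cut when its span does not exceed the delimiting
// Morton code mC, which is defined at level lmax.
bool insideCut(std::uint64_t code, unsigned level, unsigned lmax,
               std::uint64_t mC) {
    return span(code, level, lmax) <= mC;
}
```

With a hierarchy of depth 2, the four cases of Figure 5 are span(0, 1, 2), span(1, 1, 2), span(0, 0, 2) for the root and span(57, 2, 2).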
5. Oblique Hierarchy Cuts
In this section we describe the Oblique Cuts in detail. Given a conceptual expected hierarchy $H$, with depth $l_{max}$, an Oblique Hierarchy Cut $C$ consists of a delimiting Morton code $m_C$ and a set of lists $L_C = \{L_{C,k}, L_{C,k+1}, \ldots, L_{C,l_{max}}\}$, where $k$ is the shallowest level of the hierarchy present in the cut. Each node $N$ is uniquely identified by its Morton code $m_N$ and these two concepts are interchangeable from now on. $C$ also has the following important invariants (see Figure 6):
1.1 $m_C$ has level $l_{max}$.
1.2 $L_{C,l}$ contains subtrees of $L_C$ rooted by nodes at level $l$.
1.3 All subtrees in $L_C$ are disjoint.
1.4 $L_{C,l}$ is always sorted in Morton order.
1.5 All nodes $N$ with $\mathrm{span}(m_N) \le m_C$ are in one of the subtrees in $L_C$.
Fig. 6: Oblique Hierarchy Cut and operators concatenate and fix. A cut $C$ is defined by a delimiting Morton code $m_C$ and a list of roots per level $L_C$ (a). The green color represents nodes already created and inside the cut. The red color indicates nodes not created yet, which exist only in the conceptual expected hierarchy $H$. The concatenate operator inserts new roots $x_1$ and $x_2$ at the deepest level $l_{max}$, resulting in cut $C'$ (b). Then, operator fix traverses subtrees bottom-up, creating parents until the boundary $S$ is reached.
We now formally define the two operators, concatenate and fix, as well as the important concept of Placeholder nodes.
The operator concatenate is defined as $C' = \mathrm{concatenate}(C, \{x_1, \ldots, x_n\})$ with $m_C < x_1 < \ldots < x_n$. This operator incorporates new $l_{max}$ level leaf nodes $\{x_1, \ldots, x_n\}$ into $C$, resulting in a new cut $C'$. The operator itself is simple and consists of concatenating all new nodes into list $L_{C,l_{max}}$, resulting in $L_{C',l_{max}}$. This operator is illustrated in Figure 6.
In order for $C'$ to be an Oblique Hierarchy Cut, all invariants must hold. Invariant 1.1 can be maintained by letting $m_{C'} = x_n$. Invariant 1.2 holds by the definition of concatenate, since the insertion of the leaf nodes occurs at the correct list $L_{C,l_{max}}$ at level $l_{max}$. Invariants 1.3 and 1.4 are ensured by the fact that $m_C < x_1 < \ldots < x_n$, also established in the definition of concatenate. Invariant 1.5, however, does not hold, since some of the ancestors $A_x$ of the new nodes $\{x_1, \ldots, x_n\}$ may have $m_C < \mathrm{span}(A_x) \le m_{C'}$, but are not in any subtree of $L_{C'}$ after concatenation. In fact, it would be absurd if they were, since all nodes $N_C$ in $C$ have $\mathrm{span}(N_C) \le m_C$ (invariant 1.5), $m_C < m_{C'}$, and the concatenate operator only inserts nodes greater than $m_C$ at level $l_{max}$.
To restore invariant 1.5, we define the $C'' = \mathrm{fix}(C')$ operator, whose purpose is to insert the offending nodes in subtrees of $L_{C'}$, resulting in $L_{C''}$, while maintaining all other invariants intact. To achieve this, fix first defines the set of offending nodes $A^*_x$ as a subset of $A_x$ with $\mathrm{span}(A^*_x) \le m_{C'}$. Second, it identifies all subtree roots in $A^*_x$ whose parents are not in $A^*_x$. Let $S$ be the set of such parent nodes. To identify these subtrees, the lists are processed bottom-up, that is, beginning with $L_{C',l_{max}}$. For each list, its root nodes are visited in Morton order. The evaluation of a list $L_{C',l}$ works in the following manner: identify the sibling root nodes in $L_{C',l}$; check if their parent is in $A^*_x$; create a new subtree rooted by their parent at level $l-1$; and move the subtrees from $L_{C',l}$ to their respective parent subtrees in $L_{C',l-1}$. Note, however, that if the parent is in $S$, neither is the new subtree created nor are its children subtrees moved. The resulting $L_{C''}$ will thus have only subtrees rooted at nodes whose parents are in $S$.
In order to guarantee that fix is correct, all invariants must be checked after the operation. Since no new $l_{max}$ level nodes are inserted by fix, we let $m_{C''} = m_{C'}$ and invariant 1.1 is ensured. Invariant 1.2 holds because the $A^*_x$ nodes are inserted in $L_{C'}$ at the same level they are in $H$. Regarding invariant 1.3, the nodes in $A^*_x$ are unique and they were not in $C'$, since the only nodes $N_{C'}$ that had $\mathrm{span}(N_{C'}) > m_C$ were inserted at level $l_{max}$ by the concatenate operator. Thus, this invariant holds. Since the subtrees inserted by fix are evaluated in Morton order, they are also inserted in this order, maintaining invariant 1.4. Lastly, invariant 1.5 is ensured because the subtrees inserted by fix are rooted by nodes whose parents are in $S$, and $S$ is outside of $A^*_x$. Thus, $m_{C''} < \mathrm{span}(S)$ and $S$ forms a node boundary outside cut $C''$. The fix operator is illustrated in Figure 6.
According to the aforementioned definition of Oblique Hierarchy Cut, $H$ can only have leaves at level $l_{max}$, since the concatenate operator only inserts nodes at this level. Leaves could be inserted into other levels directly, but this would make it difficult for fix to efficiently maintain invariant 1.4, since the lists $L_{C'}$ are independent and evaluated in a bottom-up manner. To address this issue, the concept of placeholder is defined. A placeholder is an empty node at a given level representing a node at a shallower level. More precisely, given a node $N$ at level $l$, its placeholder $P_{N,l+1}$ at level $l+1$ is defined as the right-most possible child of $N$. In other words, the Morton code of $P_{N,l+1}$ is $m_N$ followed by a bitmask of three 1s, as demanded by the degree of the Octree. Note that, with this definition, $P_{N,l_{max}}$ has Morton code $\mathrm{span}(m_N)$, as can be verified by applying the placeholder definition recursively.
A leaf $X$ in $H$ with level $l < l_{max}$ is represented by placeholder $P_{X,i}$ such that $l < i \le l_{max}$ when inserting the subtree of level $i$ at $L_{C',i}$. Placeholders are used as roots of degenerate subtrees, since there is no purpose for them inside subtrees. Even if not meaningful for $H$, placeholders ensure invariant 1.4 in fix until level $l$ is reached. Figure 7 shows the concept of placeholders.
Intuitively, a sequence of Oblique Hierarchy Cuts $C_i$ resulting from sequentially applying operators concatenate and fix until no more leaf nodes or placeholders are left for insertion results in an oblique sweep of $H$, as can be seen in Figure 7. To prove this, let $C_{end}$ be the last cut in this sequence. Because of invariant 1.5, all nodes $N$ in $H$ with $m_N \le m_{C_{end}}$ will be in subtrees in $L_{C_{end}}$ after fix. Since there are no more placeholders or leaf nodes in level $l_{max}$, there are no nodes $N$ with $m_N > m_{C_{end}}$ and, thus, $S$ is composed only of the null node (parent of $H$'s root node). Since there are no other parents outside the subtrees that have roots with parents in $S$, and $S$ has only a single element, $L_{C_{end}}$ is composed of a single subtree, named $T$. Also, $T$'s root has parent equal to the null node. Thus, $T = H$, as intuitively suspected.
Fig. 7: Oblique Hierarchy Cut progression. As operators concatenate and fix are used, the cuts sweep their associated hierarchy $H$. Placeholders are marked with a P and the ones used but removed while processing lists bottom-up are also marked with a red X.
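To make the two operators concrete, the following is a minimal single-threaded sketch of a cut with per-level root lists. It is not the paper's implementation: the type and function names are hypothetical, Morton codes are assumed to carry a leading prefix bit (root = 1), and point data, placeholders and end-of-stream handling are left out.

```cpp
#include <cassert>
#include <cstdint>
#include <deque>
#include <memory>
#include <utility>
#include <vector>

using Code = std::uint64_t;  // Morton code with a leading prefix bit; root = 1

struct Node {
    Code code;
    std::vector<std::unique_ptr<Node>> children;
};
using NodePtr = std::unique_ptr<Node>;

struct Cut {
    unsigned lmax;
    Code mC = 0;                             // delimiting Morton code
    std::vector<std::deque<NodePtr>> roots;  // roots[l]: subtree roots at level l
    explicit Cut(unsigned depth) : lmax(depth), roots(depth + 1) {}
};

unsigned levelOf(Code c) {  // number of 3-bit groups after the prefix bit
    unsigned l = 0;
    while (c > 1) { c >>= 3; ++l; }
    return l;
}

Code span(Code c, unsigned lmax) {
    const unsigned shift = 3 * (lmax - levelOf(c));
    return (c << shift) | ((Code{1} << shift) - 1);
}

// concatenate: append new leaves (in Morton order, all greater than mC) to the
// list of level lmax; the last leaf becomes the new delimiting code.
void concatenate(Cut& cut, const std::vector<Code>& leaves) {
    for (Code x : leaves) {
        assert(x > cut.mC && levelOf(x) == cut.lmax);
        cut.roots[cut.lmax].push_back(NodePtr(new Node{x, {}}));
        cut.mC = x;
    }
}

// fix: process the lists bottom-up, creating every parent whose span is now
// inside the cut and moving its sibling subtrees under it.
void fix(Cut& cut) {
    for (unsigned l = cut.lmax; l > 0; --l) {
        std::deque<NodePtr>& level = cut.roots[l];
        // Lists are in Morton order, so the promotable groups form a prefix.
        while (!level.empty() &&
               span(level.front()->code >> 3, cut.lmax) <= cut.mC) {
            NodePtr parent(new Node{level.front()->code >> 3, {}});
            while (!level.empty() && (level.front()->code >> 3) == parent->code) {
                parent->children.push_back(std::move(level.front()));
                level.pop_front();
            }
            cut.roots[l - 1].push_back(std::move(parent));
        }
    }
}
```

Roots whose parent has a span greater than mC stay in their list; those parents play the role of the boundary S. Appending new parents at the back of the shallower list preserves Morton order because every previously stored root has a span no greater than the earlier delimiting code.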
6. Oblique Hierarchy Cut Front
Concomitantly with the building of $H$ with progressive oblique cuts, a rendering process might be traversing the already processed portions of $H$ with the help of a front (see Figures 4 and 8). Thus, for a given Oblique Hierarchy Cut $C$, the rendering process will adaptively maintain a front $F_C$ restricted to the renderable part of $H$. In order to ensure proper independence of $F_C$ with respect to $C$ and other important properties needed later, we define two invariants:
2.1 If $F_C$ is composed of $n$ nodes, named $F_{C,i}$, with $1 \le i \le n$, then $\mathrm{span}(F_{C,1}) < \ldots < \mathrm{span}(F_{C,i}) < \mathrm{span}(F_{C,i+1}) < \ldots < \mathrm{span}(F_{C,n})$.
2.2 The roots of subtrees in $L_C$ cannot enter the Front.
Invariant 2.1 ensures that sibling nodes will be adjacent in the front, which eliminates searches and simplifies the prune operation. Invariant 2.2 is defined because the roots of subtrees in $L_C$ are being moved among lists by the fix operator in order to create subtrees at other levels and thus are not safe to enter the front. Note that both invariants impose restrictions on the prune operator in order to ensure that all nodes on the front are roots of disjoint subtrees and do not include nodes still being processed. Similarly, placeholders cannot be pruned either, since their parents might not yet be defined.
In summary, the evaluation of an Oblique Hierarchy Cut Front consists of three steps:
1. Concatenate new placeholders into the front.
2. Choose the hierarchy level $l$ where candidates for substituting placeholders in the front are to be sought.
3. Iterate over all front nodes, testing whether they are placeholders that can be substituted, and whether they need to be pruned, branched or rendered.
Leaf insertions and placeholder substitutions will be further described in the next sections. The other aspects of operators prune and branch work as usual. All valid inner nodes are reachable by prune operations from the leaves, ensuring proper rendering capabilities for the cut. An example of a valid Oblique Hierarchy Cut Front is given in Figure 8.
Fig. 8: Example of valid Oblique Hierarchy Cut Front. The direction of the blue arrows indicates the order restriction imposed by invariant 2.1. The fact that no node in the front is a root in $L_C$ ensures invariant 2.2.
Since the root of $H$ is only available after all sequential cuts are evaluated, the usual front initialization is not possible for $F_C$. To insert nodes in the Oblique Hierarchy Cut Front two operators are used: insertPlaceholder and insertLeaf. In order to simplify leaf and placeholder insertion and substitution, all leaves are first inserted in the front as placeholders and saved in a per-level list of leaves to be replaced. One main reason for this duplication is that new nodes are always inserted as roots in $L_{C,l_{max}}$, and cannot be in the front due to invariant 2.2. Thus, placeholders mark their position until the fix operator moves them to other subtrees. The front is, then, continuously checked to see if placeholders can be replaced by leaf nodes. This substitution is detailed in the next section.
The insertPlaceholder operator, in turn, is simple since it can just concatenate placeholders at the end of the front. This maintains the invariants since placeholders are available at level $l_{max}$ and they are processed in Morton order by fix.
Since the leaf lists are organized by level, and the placeholders and leaves are respectively inserted into the front and into the lists in Morton order, a very simple and efficient substitution scheme is proposed. Given a placeholder and a substitution level $l$, it consists in verifying if the first element in the leaf list of level $l$ is an ancestor of the placeholder. If it is, the leaf is removed from the substitution list and replaces the placeholder in the front. Since comparison of Morton codes is a fast $O(1)$ operation, the entire placeholder substitution algorithm is also $O(1)$.
Keeping in mind that for each front evaluation a single level $l$ will be checked for substitution, all leaves at level $l$ are guaranteed to be substituted in a single front evaluation. To verify this, note that if $P_i$ and $P_{i+1}$ are sequential placeholders at the same level and $L_j$ and $L_k$ are their leaf substitutes, then $k = j + 1$. This comes again from the fact that all insertion lists and front nodes at a given level are in Morton order and that a leaf and its placeholder have a one-to-one relationship. Thus, if $P_i$ is substituted and, as a consequence, $L_j$ is removed from the substitution list, then the new first leaf in that list will be $L_k$, resulting in $P_{i+1}$ being the next placeholder to be successfully substituted at that level. Consequently, for each placeholder in the front we need only verify the first leaf of the list, and after one evaluation the list for level $l$ will be emptied.
In order to maximize node substitution, $l$ is chosen as the level with most insertions. This is an obvious choice, since the list will be completely emptied after the evaluation, so we are substituting the maximum number of placeholders in one iteration. The nodes not substituted in the current front evaluation are ignored since their corresponding leaves are not in level $l$. However, the algorithm guarantees that all currently inserted leaves will substitute their placeholders in the next $l_{max}$ −
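The substitution scheme for one level can be sketched as below. This is an illustrative simplification under stated assumptions: the front holds only codes and a placeholder flag, Morton codes carry a leading prefix bit (root = 1), placeholders sit at level $l_{max}$ with the span of the leaf they stand for, and the names FrontNode, isAncestorOrSelf and substitute are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <list>

using Code = std::uint64_t;  // Morton code with a leading prefix bit; root = 1

struct FrontNode {
    Code code;
    bool placeholder;
};

unsigned levelOf(Code c) {
    unsigned l = 0;
    while (c > 1) { c >>= 3; ++l; }
    return l;
}

// True when `a` is `d` itself or one of its ancestors: dropping the extra
// 3-bit groups of `d` must yield `a`.
bool isAncestorOrSelf(Code a, Code d) {
    const unsigned la = levelOf(a), ld = levelOf(d);
    return la <= ld && (d >> (3 * (ld - la))) == a;
}

// One substitution pass for a single level: every placeholder is compared
// only with the first pending leaf of that level. Both containers are in
// Morton order. Returns the number of substitutions performed.
std::size_t substitute(std::list<FrontNode>& front, std::deque<Code>& leavesAtL) {
    std::size_t replaced = 0;
    for (FrontNode& n : front) {
        if (leavesAtL.empty()) break;
        if (n.placeholder && isAncestorOrSelf(leavesAtL.front(), n.code)) {
            n = FrontNode{leavesAtL.front(), false};  // the leaf takes the slot
            leavesAtL.pop_front();
            ++replaced;
        }
    }
    return replaced;
}
```

Each placeholder costs a single comparison against the head of the list, which matches the constant-time check described above; the pass as a whole is linear in the front size.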
7. Sample OMiCroN implementation
We have developed a multi-threaded implementation of the OMiCroN algorithm in C++, where splat rendering is done on the GPU. The implementation follows the algorithms outlined in the previous sections, but a few adaptations are necessary with regard to concurrency control.

A sorted input stream feeds a master thread that organizes worklists for the current level l. They consist of nodes for the fix operator, which are distributed among the master and slave threads for processing. To simplify distribution, the worklists have a fixed size. As a consequence, a sibling group can be split between threads, which might lead to parent node duplication. Thus, after a processing iteration, the master thread checks the first and last nodes of the resulting adjacent worklists at level l − […] L C and their children cannot enter the front.

Since worklist sizes are expected to become smaller and smaller as the hierarchy is traversed bottom-up, the master thread also applies simple load-balancing heuristics by merging worklists as they are tested for duplicates. Once level l is processed, OMiCroN checks the amount of work available at level l − 1. More precisely, it compares the available work at level l − […] l_max to verify whether it is worth continuing the current fix pass or whether it is better to start another fix pass from scratch.

In order to keep main memory usage within a given budget, it is also possible to enable a very simple optimization called Leaf Collapse. This optimization removes all leaves at level l_max that form a chain structure with their parents, i.e., leaves that do not have siblings.

Rendering itself is performed in a separate front tracking thread. The master thread signals this thread when newly processed data is available, which requires synchronization. This drawback is minimized by having a different lock per hierarchy level. Another efficiency tweak consists of segmenting the front evaluation along several frames in order to amortize its cost. A simple rendering approach based on splats [1] is used in our experiments. OMiCroN nodes contain point splats defined by a center point and two tangent vectors u and v. Parent node creation follows a policy that tries to maintain the ratio between the number of points in a parent and its children, where a parent contains a subset of the splats in its children with scaled tangent vectors.

For each frame, the front or front segment is evaluated based on the projection threshold. If the projection of a given node is sufficiently large in comparison to the threshold, the node is branched and its children sibling group is pushed into the rendering queue. Conversely, if it is sufficiently small, the node is pruned and its parent is pushed into the queue. Otherwise, the node stays in the front to be rendered. The rules for branching and pruning are the ones discussed in Section 6. Finally, the splats in the rendering queue are used as input for the traditional two-pass EWA filter described in [38]. Several methods for computing the sizes of the projected splats were tested [38, 39, 40, 41]. The splat bounding box computation algorithm described in [41] resulted in the best trade-off between performance and quality, and all results reported in this paper use it.
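The worklist splitting and duplicate-parent check described earlier in this section can be illustrated with a minimal Python sketch. It is not taken from the OMiCroN source code: the function names (split_worklists, parents_of, merge_boundary_duplicates) are hypothetical, nodes are reduced to bare integer Morton codes, and the parent code is assumed to be the child code with its last three bits (one octree level) dropped.

```python
from itertools import groupby


def split_worklists(codes, size):
    """Split a Morton-sorted list of node codes into fixed-size worklists."""
    return [codes[i:i + size] for i in range(0, len(codes), size)]


def parents_of(worklist):
    """Create one (parent_code, children) pair per sibling group seen in a
    single worklist. Assumption: parent code = child code >> 3."""
    return [(parent, list(group))
            for parent, group in groupby(worklist, key=lambda c: c >> 3)]


def merge_boundary_duplicates(results):
    """Master-side check: if the last parent produced so far and the first
    parent of the next worklist share a code, merge them into one node."""
    merged = []
    for parents in results:
        parents = list(parents)  # work on a copy, leave the input untouched
        if merged and parents and merged[-1][0] == parents[0][0]:
            code, children = parents.pop(0)
            merged[-1] = (code, merged[-1][1] + children)
        merged.extend(parents)
    return merged
```

For instance, splitting the sorted codes 8, 9, 10, 11, 16, 17, 24 into worklists of size 3 separates the sibling group 8 to 11 across two worklists; each worklist then yields its own copy of parent 1, and the boundary check fuses the two copies. Only worklist boundaries need to be inspected, because parents created inside a single worklist are distinct by construction.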
8. Experiments
The prototype implementation was tested using four point cloud datasets obtained from the Digital Michelangelo Project page: David (469M points, 11.2GB), Atlas (255M points, 6.1GB), St. Matthew (187M points, 4.5GB) and Duomo (100M points, 2.4GB). The maximum hierarchy depth was set to 7 to ensure memory footprints compatible with available memory and swap area. Coordinates in all datasets were normalized to the range [0, […]

In order to assess the actual delay from the moment the raw unsorted collection of points is available to the moment rendering actually starts, we must consider the sorting process at some length. The simplest scenario consists of a separate thread that reads the whole collection, sorts it and streams it to OMiCroN. In this case, OMiCroN must wait at least for the whole collection to be read by the sorting application, and for the sort itself. In a more elaborate setup, the sorting process might start feeding OMiCroN as soon as a prefix of the sorted collection becomes available. These two scenarios are variations of pipeline 1 in Fig. 1. In order to measure the gains of the second setup, we conducted a set of experiments. Our testbed consists of a desktop computer with an Intel Core i7-3820 processor, 16GB memory, an NVidia GeForce GTX 750 and a SanDisk 120GB SSD. The same SSD is used for both swap and I/O.

The first experiment consists of consecutively sorting and streaming chunks of the input to OMiCroN. We use the parallel IntroSort available in the Standard Template Library (STL) of the C++ programming language (std::partial_sort() or std::sort()). Parallel rendering and leaf collapse are enabled for these tests. Since rendering starts as soon as the first sorted chunk becomes available, using more chunks allows rendering to start earlier, as shown in Figure 9. In particular, increasing the number of sorting chunks can reduce the time between the moment input finishes and the moment rendering starts by a factor of 5 to 31, depending on the size of the dataset. The price of this early rendering is that hierarchy creation time may increase up to 4 times, also depending on the dataset size. For large datasets, the partial sort can diminish the use of swap during sort and hierarchy creation, resulting in better timings in all aspects, as Figure 9c demonstrates. We also noted that OMiCroN consumes sorted chunks almost as fast as they are produced and streamed, and the hierarchy is finished at most 1 s after the last byte of the sorted stream is read. Another conclusion is that the number of chunks represents a trade-off between the time for starting rendering and the total time to sort the dataset. The exception to this rule is the David dataset.

The second experiment consists of profiling and comparing OMiCroN with parallel rendering activated and deactivated at hierarchy creation time, also evaluating the system core usage while running the algorithm. The purpose of this test is to measure the overhead of parallel rendering and the overall usage of resources. The input for this test consists of datasets sorted in Morton order, and the data is streamed directly from disk (pipeline 2 in Fig. 1). Leaf collapse is disabled. Figure 10 shows the results. The overhead imposed is between 20% (David) and 34% (St. Matthew), which is evidence that the overhead impact decreases as the dataset size increases. This is a desirable property for an algorithm designed to handle large datasets. The final observation from this experiment is that OMiCroN maintains the usage of all 8 logical cores near 90%, with peaks of 100%, for the entire hierarchy creation procedure, with parallel rendering enabled or disabled. This fact explains OMiCroN's fast hierarchy creation times.

The purpose of the third experiment is to generate data for better understanding the progression of hierarchy creation over time. It consists of measuring the time needed to achieve percentile milestones of hierarchy creation. The best scenario is a linear progression over time, so that new data can be presented smoothly to the user while the hierarchy is being constructed. For this test, the sorted data is streamed directly from disk, parallel rendering is enabled and leaf collapse is disabled unless noted otherwise. The results are presented in Figure 11. We can conclude that the hierarchy construction has the expected linear progression. The exception is the David dataset with leaf collapse disabled. This behavior is caused by the hierarchy size, which exceeds available memory, forcing the use of swap area and degrading performance. When leaf collapse is enabled, swap is avoided and the behavior is again linear, as Figure 11 also demonstrates.

Table 1: Relationship between the algorithm reconstruction parameters (leaf collapse, parent to children ratio) and memory footprint, total hierarchy creation times, and average CPU usage per frame.

Model   Coll   Ratio   Mem     Creation   CPU
David   On     0.2     8.5GB   146.3s     7.6ms
David   On     0.25    9.9GB   151.2s     8.8ms
David   Off    […]
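The chunked sorting used in the first experiment can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the pipeline used in the paper: the experiments rely on the C++ STL sort routines, whereas the sketch substitutes a binary heap so that each sorted chunk can be emitted before the rest of the input is ordered. The names morton3d and sorted_chunks are hypothetical, coordinates are assumed to lie in [0, 1], and the choice of x as the most significant interleaved bit is arbitrary.

```python
import heapq


def morton3d(x, y, z, level):
    """Interleave the top `level` bits of x, y and z into one Morton code.
    Assumption: coordinates are normalized to [0, 1]."""
    scale = 1 << level
    ix, iy, iz = (min(int(v * scale), scale - 1) for v in (x, y, z))
    code = 0
    for bit in range(level - 1, -1, -1):
        code = (code << 3) \
            | (((ix >> bit) & 1) << 2) \
            | (((iy >> bit) & 1) << 1) \
            | ((iz >> bit) & 1)
    return code


def sorted_chunks(points, level, n_chunks):
    """Yield (code, point) pairs in Morton order, one sorted chunk at a time.

    Building the heap is linear in the input size; each chunk is then
    extracted on demand, so a consumer can start on the first sorted prefix
    while the remaining points are still unordered."""
    heap = [(morton3d(*p, level), p) for p in points]
    heapq.heapify(heap)
    chunk_size = -(-len(heap) // n_chunks)  # ceiling division
    while heap:
        k = min(chunk_size, len(heap))
        yield [heapq.heappop(heap) for _ in range(k)]
```

A consumer such as a hierarchy builder would iterate over sorted_chunks(points, 7, n_chunks) and process each chunk as it arrives. Using more chunks shortens the wait for the first sorted prefix, which mirrors the trade-off reported for Figure 9.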
A second set of experiments was conducted to assess OMiCroN's behavior in terms of memory usage and performance. All experiments in this set read a sorted dataset directly from disk. The test system had an Intel Core i7-6700 CPU, 16GB memory, an NVidia GeForce GTX 1070 graphics card, and secondary SSD storage with roughly 130 MB/s reading speed.

Two main parameters impact OMiCroN's memory footprint: the Leaf Collapse optimization and the parent to children point ratio, as shown in Table 1. These also impact the reconstruction quality of the algorithm, as can be seen in Figure 12.

Fig. 9: Impact of the number of sort chunks for (a) St. Matthew, (b) Atlas and (c) David. The vertical axis of each chart is time (min). After a constant amount of time spent reading the input (blue), the first chunk is sorted (red), starting the parallel hierarchy creation and rendering (orange). The first column in all charts corresponds to the case where all input is sorted before the hierarchy creation begins.

[Figure 10 plot: OMiCroN - Hierarchy creation time comparison; vertical axis: time (min)]
Fig. 10: Comparison of hierarchy creation with and without parallel rendering. Sorted data is streamed directly from disk. The overhead imposed by parallel rendering is between 20% (David) and 34% (St. Matthew).
[Figure 11 plot: OMiCroN - Time x Hierarchy construction; axes: time, construction; legend: sorted data]
Fig. 11: Hierarchy creation over time. Sorted data is streamed directly from disk, parallel rendering is enabled and leaf collapse is disabled unless noted otherwise.
Even though limited to datasets that fit in RAM unless swap space is used, OMiCroN can be set up to fit a broad range of memory budgets. For example, David originally occupies 11.2 GB on disk, while its maximum size in memory when using Leaf Collapse is 8.5 or 9.9 GB, for parent to children point ratios of 0.2 and 0.25 respectively. In this case, a hierarchy with a 0.2 ratio has a memory usage of roughly 76% of the original dataset size on disk. Values smaller than these are possible, since the reconstruction results shown in Figure 12 are still acceptable. It is also important to note that the algorithm does not compress the point or Morton code data in any way. The use of such techniques would reduce memory consumption even further.

Table 1 also shows that the total hierarchy creation times and the average CPU usage per frame are affected by the Leaf Collapse optimization. The CPU times were obtained during a rendering session where the camera is constantly moving, trying to focus on the parts of the model being read from disk. For the David dataset, for example, it takes 88.2s to read the data from disk, while OMiCroN imposes an overhead ranging from 0.66 to 1.6 times that reading time in the tested scenarios. We also notice that CPU times are probably affected by the Leaf Collapse optimization because the hierarchy is simplified when leaf nodes are removed, resulting in smaller hierarchy fronts.

The worklist size is the parameter that controls the work granularity in the hierarchy creation. In other words, it controls the throughput of new nodes available for the hierarchy creation threads to process. Table 2 shows the relationship between the worklist size and attributes that are expected to be directly affected by it. It also shows that the front insertion delay scales linearly with the worklist size. As a consequence, larger worklists impose a longer delay for the user to see new parts of the cloud while navigating. Additionally, the optimal worklist size regarding front size is between 32 and 64. Since nodes are processed in a bottom-up manner and smaller fronts are expected to have nodes from shallower parts of the hierarchy, setups with smaller fronts are also expected to have processed more nodes from deeper levels than setups with larger fronts, given the same time spent in processing. As a consequence, hierarchy construction time is reduced in setups with smaller fronts, as Table 2 also indicates. Similarly, front evaluation benefits directly from smaller front sizes, resulting in less CPU overhead.
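The figures quoted in this discussion can be rechecked with a few lines of arithmetic. The Python sketch below is only a sanity check on the numbers printed in Tables 1 and 2 for the David dataset. It assumes that the reported overhead is the hierarchy creation time divided by the disk reading time, minus one, an interpretation that reproduces the 0.66 lower bound. All function and constant names are hypothetical.

```python
DAVID_DISK_GB = 11.2   # size of the David dataset on disk
DAVID_READ_S = 88.2    # time needed to read David from disk

# Table 1, leaf collapse on: point ratio -> (memory in GB, creation time in s)
TABLE1_DAVID = {0.2: (8.5, 146.3), 0.25: (9.9, 151.2)}

# Table 2: worklist size -> front insertion delay in ms
TABLE2_DELAY_MS = {8: 127, 16: 212, 32: 399, 64: 831, 128: 1646}


def memory_fraction(memory_gb, disk_gb=DAVID_DISK_GB):
    """In-memory hierarchy size as a fraction of the dataset size on disk."""
    return memory_gb / disk_gb


def relative_overhead(creation_s, read_s=DAVID_READ_S):
    """Assumed definition: extra time relative to just reading the input."""
    return creation_s / read_s - 1.0


def linear_fit(xs, ys):
    """Ordinary least-squares line y = slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def r_squared(xs, ys, slope, intercept):
    """Coefficient of determination of the fitted line."""
    my = sum(ys) / len(ys)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot
```

Evaluating memory_fraction(8.5) and relative_overhead(146.3) gives values that round to 0.76 and 0.66, matching the text. Fitting the insertion delays of Table 2 against the worklist sizes yields a coefficient of determination very close to 1, which is consistent with the claim that the delay scales linearly with the worklist size.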
We are also interested in evaluating OMiCroN's flexibility. To that end, we compared pipelines 2 and 3 from Figure 1, i.e., constructing the hierarchy on-the-fly from a sorted point stream and reading a previously computed complete hierarchy file. The experiments were performed on the same machine as that used for the rendering latency tests (Section 8.1), but with more recent versions of the dependency libraries and operating system. Pipeline 3 corresponds to the traditional approach, in which the hierarchy is read top-down in breadth-first order. This use case supports incremental visualization of the entire model from the beginning, starting with a coarse overview and progressively showing more details as the hierarchy is loaded.

Fig. 12: Rendering comparison of hierarchies with different leaf collapse and parent to children point ratio parameters. Panels: (a) David, leaf collapse on, 0.2 point ratio; (b) David, leaf collapse on, 0.25 point ratio; (c) David, leaf collapse off, 0.25 point ratio; (d) Atlas, leaf collapse on, 0.2 point ratio; (e) Atlas, leaf collapse on, 0.25 point ratio; (f) Atlas, leaf collapse off, 0.25 point ratio; (g) St. Matthew, leaf collapse on, 0.2 point ratio; (h) St. Matthew, leaf collapse on, 0.25 point ratio; (i) St. Matthew, leaf collapse off, 0.25 point ratio; (j) Duomo, leaf collapse on, 0.2 point ratio; (k) Duomo, leaf collapse on, 0.25 point ratio; (l) Duomo, leaf collapse off, 0.25 point ratio. As can be seen from items (a) to (i), the final reconstructions are very detailed even at close range, and the differences when leaf collapse is turned on are almost imperceptible for the David, Atlas and St. Matthew datasets. The hierarchy for Duomo suffers from lack of density when leaf collapse is turned on because the dataset itself has smaller density in comparison with the others.

Table 2: Relationship between the worklist size and performance indicators: front insertion delay, front size, hierarchy construction time and average CPU usage per frame. Numbers refer to the David dataset, no leaf collapse and point ratio 0.25.

Worklist   Insertion   Front   Hierarchy   CPU
8          127ms       529     274.8s      19.5ms
16         212ms       439     259.8s      17.8ms
32         399ms       401     248.6s      16.0ms
64         831ms       500     258.0s      20.8ms
128        1646ms      506     255.7s      19.7ms

Figure 13 compares the input file sizes, whereas Figure 14 compares the time needed to load a hierarchy file with the time needed to build the hierarchy from sorted point streams, as reported in Table 1. Parallel rendering is enabled in all cases.

The use case with the best performance depends on the hierarchy file size. For example, reading a hierarchy file with leaf collapse off for David has a significant performance penalty because the same disk is used to read a large file and for swap. The sorted list pipeline for this same case amortizes the swap overhead.

Even though reading the sorted list is generally slower than reading the hierarchy, there are important benefits to be taken into consideration, and the choice between one or the other depends on the application scenario. First, the same sorted list can be used for creating the hierarchy with or without leaf collapse, and the leaf collapse parameters (compression level) can be chosen on demand. The second advantage is the reduced size of the input data, which leads to better performance for large clouds such as the David dataset without leaf collapse, as the swap overhead is reduced. The sorted list occupies roughly 50% of the size of the hierarchy file. While this is less significant for a fast SSD, it could be very beneficial in other scenarios, such as network streaming. Moreover, in the case of the hierarchy file, every different configuration generates a new file on disk. In other words, storing different hierarchies (e.g., with and without leaf collapse) is more wasteful than storing a single sorted list file. Of course, we could also store different sorted lists after performing leaf collapse, thus boosting performance and reducing the space on disk, but for these tests we opted for storing the whole sorted list, as we believe the extra flexibility is an important contribution.

[Figure 13 plot: Input comparison: file size; vertical axis: size (GB)]
Fig. 13: Use case input file comparison. LC stands for Leaf Collapse. Saving a complete hierarchy file demands more storage than reading a sorted input cloud file and creating the hierarchy on-the-fly. The performance implications are depicted in Figure 14.
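Because leaf collapse can be decided on demand from the same sorted list, it is worth sketching how the optimization might look on a stream of leaf Morton codes. The Python fragment below is an illustrative assumption rather than OMiCroN's implementation: the name leaf_collapse is hypothetical, leaves are bare integer codes at level l_max, and the parent of a leaf is taken to be its code shifted right by three bits. Following the definition given in the implementation section, a leaf is removed when it has no siblings, so that its parent becomes the new leaf of that branch.

```python
from collections import Counter


def leaf_collapse(leaf_codes):
    """Split Morton-sorted leaf codes into the leaves that are kept and the
    parents that replace collapsed (sibling-less) leaves.

    Assumption: parent code = leaf code >> 3."""
    group_size = Counter(code >> 3 for code in leaf_codes)
    kept = [c for c in leaf_codes if group_size[c >> 3] > 1]
    collapsed_parents = sorted(
        {c >> 3 for c in leaf_codes if group_size[c >> 3] == 1}
    )
    return kept, collapsed_parents
```

With the leaf codes 8, 9, 16, 24, 25, 26, the leaf 16 is the only child of parent 2 and is therefore removed, while the sibling groups under parents 1 and 3 are kept. Running the same sorted input through the builder with or without this step is what allows a single sorted file to serve both configurations.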
We also found it useful to compare OMiCroN with other algorithms that create hierarchies for large datasets. To this end, we evaluated the hierarchy creation algorithm used in the large point cloud renderer Potree [42]. The methodology was to compare the best cases in Figures 9a, 9b and 9c, which include input, sorting, hierarchy creation and rendering, with the timings reported by Potree, which include input and hierarchy creation. All tests created hierarchies with depth 7.

Figure 15 shows the results for St. Matthew, Atlas and David. OMiCroN is more than 2 times faster for David and more than 4 times faster for St. Matthew and Atlas. An important detail is that Potree reports creating a hierarchy with only 68% of the input points for David, whereas St. Matthew and Atlas result in 100% of input points usage. OMiCroN is not only significantly faster, but its parallel renderer provides support for dataset inspection during the process, something that Potree cannot do. Please note that the desktop version of Potree was used for all comparisons.

[Figure 14 plot: Input comparison: hierarchy file vs sorted point cloud file timings]

Fig. 14: Use case performance comparison. LC stands for Leaf Collapse. The values indicate the time needed to have the complete hierarchy in memory. Depending on the hierarchy file size, it is better to build the hierarchy on-the-fly instead of loading it from disk, because the same disk is used for reading and swap.

[Figure 15 plot: OMiCroN x Potree - Hierarchy creation; vertical axis: time (min)]

Fig. 15: OMiCroN and Potree [42] comparison. OMiCroN is more than 2 times faster for David and more than 4 times faster for St. Matthew and Atlas.

We also performed a comparison with the voxelization algorithm for large meshes described in [43]. It should be noted that the algorithm operates on triangle meshes, and thus the input datasets are roughly twice as big as those containing only the vertices. However, since a voxelization is an abrupt simplification of the original dataset, the difference in input is compensated by the fact that Octree nodes handled by OMiCroN are populated with thousands or millions of points, while the Octree nodes in the voxelization are boolean values, resulting in extremely compact Octrees of just a few KBytes. For example, the Octree generated by OMiCroN for David without leaf collapse has more than 22GB. In our tests, [43] was given a memory quota of 16GB and set to a grid size of 128, which is equivalent to a hierarchy of depth 7. OMiCroN finishes building the hierarchy 3 to 5 times faster than [43], which indicates that, even in a traditional setup where preprocessing precedes rendering, OMiCroN is still very competitive.

Finally, we performed a qualitative comparison with the incremental BVH construction algorithm proposed in [5]. The paper presents four versions of the algorithm: with or without global updates and with parallel searches or block parallel construction.
Based on the presented results, we concluded that the BVH quality depends on three aspects: the version of the algorithm used, the stream ordering and the structure of the data. The same setup that results in a good quality BVH for a given dataset can result in a bad quality BVH for another one. Another conclusion is that it tends to perform better than non-incremental builders for simpler datasets with many planar surfaces (Sibenik, Conference, Soda Hall, Pompeii, San Miguel and Power Plant) and worse for more complex, biological ones (Armadillo, Hairball and Happy Buddha). In the best case scenario (best algorithm choice, stream ordering and structure) the BVH quality can be up to 24% better than that of top-down builders. In the worst case scenario (worst algorithm choice, stream ordering and structure) the quality can be up to 70% worse.

To sum up, one should run the algorithm with several different setups to ensure a good quality BVH. Even in this case, the BVH can be worse than one created using a non-incremental builder because of the structure of the data. An incremental BVH construction algorithm could probably be changed to support parallel rendering, but the potential of generating a bad quality hierarchy could make this option prohibitive. On the other hand, a non-incremental algorithm such as OMiCroN would construct a high quality octree regardless, providing rendering feedback in the process.
9. Final remarks
In this work, we presented OMiCroN, a flexible and generic algorithm for rendering large point clouds. We know of no other method that can render incomplete hierarchies with full detail in parallel with its construction and data sorting. Rather, the vast majority of algorithms in this category rely on heavy preprocessing, whose cost largely outweighs the time complexity of the rendering algorithm proper. OMiCroN, on the other hand, needs only a sorted prefix of the input geometry in Morton code order to start rendering. In practice, the sort can be adapted so that rendering starts almost as soon as the input has been read. OMiCroN's feedback-based design allows construction of Octrees on-the-fly and can help implementors with accurate rendering feedback of the construction process. We also defined the novel idea of Hierarchy Oblique Cut, a strong concept that can be used to apply sweeps on hierarchies.

Additionally, OMiCroN opens the path for new workflows based on streaming of spatially sorted data. Supposing that large scans could be streamed directly in Morton order, the data could be rendered without any delays at all, enabling earlier detection of acquisition problems. Another advantage is that the hierarchical nature of Morton order can be exploited: datasets are sorted only once using a deep Morton code level, but can be rendered by OMiCroN using a hierarchy with any level less than or equal to the sorting level. This property renders the algorithm even more flexible, since a single sorted dataset can be used with many hierarchy setups.
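The property that a dataset sorted at a deep Morton level remains sorted at every shallower level follows from the fact that truncating a code to a coarser level is a right shift, which never reverses the order of two integers. The short Python check below illustrates this under the assumption that codes are plain integers with three bits per level. The names truncate, is_sorted and check_prefix_property are hypothetical, and the random codes merely stand in for a real point cloud.

```python
import random


def truncate(code, sort_level, hierarchy_level):
    """Ancestor code at hierarchy_level of a code given at sort_level.
    Assumption: 3 bits per octree level, no sentinel bits."""
    if hierarchy_level > sort_level:
        raise ValueError("hierarchy level must not exceed the sorting level")
    return code >> (3 * (sort_level - hierarchy_level))


def is_sorted(codes):
    """True when the sequence is in non-decreasing order."""
    return all(a <= b for a, b in zip(codes, codes[1:]))


def check_prefix_property(sort_level=10, n=1000, seed=0):
    """Sort random codes once at sort_level and verify that the truncated
    sequence is still ordered at every level up to sort_level."""
    rng = random.Random(seed)
    codes = sorted(rng.randrange(1 << (3 * sort_level)) for _ in range(n))
    return all(
        is_sorted([truncate(c, sort_level, level) for c in codes])
        for level in range(sort_level + 1)
    )
```

In this setting, a file sorted once at level 10 could feed a depth 7 hierarchy simply by truncating each code with truncate(code, 10, 7) as it is read, with no additional sorting pass.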
Even though in this work we have concentrated on describing our Oblique Cuts driven data structure, there are some possible extensions that would generalize the method. An important improvement is to allow for an incremental version of OMiCroN. Briefly, in addition to oblique cuts, we could also incorporate into OMiCroN horizontal cuts that define depth intervals to be used as loading units. Each horizontal cut would be constructed by the current version of OMiCroN, fed by an individual stream of points. Those streams could be constructed by sorting the input point cloud once at a deep hierarchy level and sampling the data with different granularities, since data ordered in a Morton curve of level l is also ordered in any curve with level less than l.

Each horizontal cut could be restricted to a level interval by limiting the bottom-up evaluation of the fix operator to a minimum predefined level. The entire structure would have a single rendering front, and the horizontal cuts could be linked to leaves at front evaluation time, using an approach similar to placeholder substitution. This extension would also make OMiCroN out-of-core by definition, since horizontal cuts could be released and constructed on-the-fly as needed.

Regarding future directions, OMiCroN has several possible paths to follow. In addition to the implementation of the extended version, the splat renderer can be improved. It currently uses parameters set manually during the experiments, since it was not the focus of this work; we concentrated our efforts on the hierarchy construction and high-level rendering management instead. It could be further improved by developing methods to automatically find the optimal parameters, such as the initial u and v vectors, and a better hierarchical representation of the splats [44]. Moreover, in theory, OMiCroN's deepest abstraction layer could be modified to use the algorithm in other Computer Graphics problems involving the use of Morton-ordered hierarchical structures, such as ray tracing, voxelization and reconstruction.

OMiCroN's public source code repository can be found at https://github.com/dsilvavinicius/OMiCroN.
This research was supported by the National Council for Scientific and Technological Development (CNPq).
References

[1] Rusinkiewicz, S, Levoy, M. Qsplat: A multiresolution point rendering system for large meshes. In: Proceedings of the 27th Annual Conference on Computer Graphics and Interactive Techniques. SIGGRAPH '00; New York, NY, USA: ACM Press/Addison-Wesley Publishing Co. ISBN 1-58113-208-5; 2000, p. 343–352. URL: http://dx.doi.org/10.1145/344779.344940.
[2] Wimmer, M, Scheiblauer, C. Instant points: Fast rendering of unprocessed point clouds. In: Proceedings of the 3rd Eurographics/IEEE VGTC Conference on Point-Based Graphics. SPBG'06; Aire-la-Ville, Switzerland: Eurographics Association. ISBN 3-905673-32-0; 2006, p. 129–137. URL: http://dx.doi.org/10.2312/SPBG/SPBG06/129-136.
[3] Klein, J, Zachmann, G. Point cloud collision detection. In: Computer Graphics Forum; vol. 23. Wiley Online Library; 2004, p. 567–576.
[4] Ericson, C. Real-Time Collision Detection. Boca Raton, FL, USA: CRC Press, Inc.; 2004. ISBN 1558607323, 9781558607323.
[5] Bittner, J, Hapala, M, Havran, V. Incremental bvh construction for ray tracing. Computers & Graphics 2015;47:135–144.
[6] Morton, . A computer oriented geodetic data base and a new technique in file sequencing. Tech. Rep. Ottawa, Ontario, Canada; IBM Ltd.; 1966.
[7] Klosowski, JT, Held, M, Mitchell, JSB, Sowizral, H, Zikan, K. Efficient collision detection using bounding volume hierarchies of k-dops. IEEE Transactions on Visualization and Computer Graphics 1998;4(1):21–36. URL: http://dx.doi.org/10.1109/2945.675649.
[8] Ehmann, SA, Lin, MC. Accurate and fast proximity queries between polyhedra using convex surface decomposition. Computer Graphics Forum 2001;20(3):500–511.
[9] Lauterbach, C, Mo, Q, Manocha, D. gproximity: Hierarchical gpu-based operations for collision and distance queries. Comput Graph Forum 2010;29(2):419–428.
[10] Argudo, O, Besora, I, Brunet, P, Creus, C, Hermosilla, P, Navazo, I, et al. Interactive inspection of complex multi-object industrial assemblies. Computer-Aided Design 2016;79:48–59. doi: https://doi.org/10.1016/j.cad.2016.06.005.
[11] Levoy, M, Whitted, T. The use of points as a display primitive. Chapel Hill, NC, USA: University of North Carolina, Department of Computer Science; 1985.
[12] Grossman, JP, Dally, WJ. Point sample rendering. In: Drettakis, G, Max, N, editors. Rendering Techniques '98. Vienna: Springer Vienna. ISBN 978-3-7091-6453-2; 1998, p. 181–192.
[13] Sainz, M, Pajarola, R. Point-based rendering techniques. Computers & Graphics 2004;28(6):869–879.
[14] Kobbelt, L, Botsch, M. A survey of point-based techniques in computer graphics. Computers & Graphics 2004;28(6):801–814.
[15] Alexa, M, Gross, M, Pauly, M, Pfister, H, Stamminger, M, Zwicker, M. Point-based computer graphics. In: ACM SIGGRAPH 2004 Course Notes. ACM; 2004, p. 7.
[16] Gross, M. Getting to the point...? IEEE computer graphics and applications 2006;26(5):96–99.
[17] Gross, M, Pfister, H. Point-Based Graphics. San Francisco, CA, USA: Morgan Kaufmann Publishers Inc.; 2011. ISBN 0123706041, 9780080548821.
[18] Ramos, F, Huerta, J, Benitez, F. Characterization of multiresolution models for real-time rendering in gpu-limited environments. In: International Conference on Articulated Motion and Deformable Objects. Springer; 2016, p. 157–167.
[19] Dachsbacher, C, Vogelgsang, C, Stamminger, M. Sequential point trees. In: ACM Transactions on Graphics (TOG); vol. 22. ACM; 2003, p. 657–662.
[20] Pajarola, R, Sainz, M, Lario, R. Xsplat: External memory multiresolution point visualization. In: Proceedings IASTED International Conference on Visualization, Imaging and Image Processing. 2005, p. 628–633.
[21] Gobbetti, E, Marton, F. Layered point clouds: A simple and efficient multiresolution structure for distributing and rendering gigantic point-sampled models. Comput Graph 2004;28(6):815–826. URL: http://dx.doi.org/10.1016/j.cag.2004.08.010.
[22] Wand, M, Berner, A, Bokeloh, M, Fleck, A, Hoffmann, M, Jenke, P, et al. Interactive editing of large point clouds. In: SPBG. 2007, p. 37–45.
[23] Bettio, F, Gobbetti, E, Marton, F, Tinti, A, Merella, E, Combet, R. A point-based system for local and remote exploration of dense 3d scanned models. In: VAST. 2009, p. 25–32.
[24] Hubo, E, Bekaert, P. A data distribution strategy for parallel point-based rendering. 2005.
[25] Corrêa, WT, Fleishman, S, Silva, CT. Towards point-based acquisition and rendering of large real-world environments. In: Computer Graphics and Image Processing, 2002. Proceedings. XV Brazilian Symposium on. IEEE; 2002, p. 59–66.
[26] Corrêa, WT, Klosowski, JT, Silva, CT. Out-of-core sort-first parallel rendering for cluster-based tiled displays. Parallel Computing 2003;29(3):325–338.
[27] Goswami, P, Makhinya, M, Bösch, J, Pajarola, R. Scalable parallel out-of-core terrain rendering. In: EGPGV. 2010, p. 63–71.
[28] Goswami, P, Erol, F, Mukhi, R, Pajarola, R, Gobbetti, E. An efficient multi-resolution framework for high quality interactive rendering of massive point clouds using multi-way kd-trees. The Visual Computer 2013;29(1):69–83. URL: http://dx.doi.org/10.1007/s00371-012-0675-2.
[29] Lukac, N, et al. Hybrid visualization of sparse point-based data using gpgpu. In: Computing and Networking (CANDAR), 2014 Second International Symposium on. IEEE; 2014, p. 178–184.
[30] Gao, Z, Nocera, L, Wang, M, Neumann, U. Visualizing aerial lidar cities with hierarchical hybrid point-polygon structures. In: Proceedings of Graphics Interface 2014. GI '14; Toronto, Ont., Canada: Canadian Information Processing Society. ISBN 978-1-4822-6003-8; 2014, p. 137–144. URL: http://dl-acm-org.ez29.capes.proxy.ufrj.br/citation.cfm?id=2619648.2619672.
[31] Richter, R, Discher, S, Döllner, J. Out-of-core visualization of classified 3d point clouds. In: 3D Geoinformation Science. Springer; 2015, p. 227–242.
[32] Febretti, A, Richmond, K, Doran, P, Johnson, A. Parallel processing and immersive visualization of sonar point clouds. In: Large Data Analysis and Visualization (LDAV), 2014 IEEE 4th Symposium on. IEEE; 2014, p. 111–112.
[33] Potenziani, M, Callieri, M, Dellepiane, M, Corsini, M, Ponchio, F, Scopigno, R. 3dhop: 3d heritage online presenter. Computers & Graphics 2015;52:129–141.
[34] Tredinnick, R, Broecker, M, Ponto, K. Experiencing interior environments: New approaches for the immersive display of large-scale point cloud data. In: Virtual Reality (VR), 2015 IEEE. IEEE; 2015, p. 297–298.
[35] Okamoto, H, Masuda, H. A point-based virtual reality system for supporting product development. In: ASME 2016 International Design Engineering Technical Conferences and Computers and Information in Engineering Conference. American Society of Mechanical Engineers; 2016, p. V01BT02A052–V01BT02A052.
[36] Ponto, K, Tredinnick, R, Casper, G. Simulating the experience of home environments. In: Virtual Rehabilitation (ICVR), 2017 International Conference on. IEEE; 2017, p. 1–9.
[37] Karras, T. Maximizing parallelism in the construction of bvhs, octrees, and k-d trees. In: Proceedings of the Fourth ACM SIGGRAPH/Eurographics conference on High-Performance Graphics. Eurographics Association; 2012, p. 33–37.
[38] Zwicker, M, Pfister, H, van Baar, J, Gross, M. Surface splatting. In: Proceedings of the 28th Annual Conference on Computer Graphics and Interactive Techniques. SIGGRAPH '01; New York, NY, USA: ACM. ISBN 1-58113-374-X; 2001, p. 371–378. URL: http://doi.acm.org/10.1145/383259.383300.
[39] Zwicker, M, Räsänen, J, Botsch, M, Dachsbacher, C, Pauly, M. Perspective accurate splatting. In: Proceedings of Graphics Interface 2004. GI '04; School of Computer Science, University of Waterloo, Waterloo, Ontario, Canada: Canadian Human-Computer Communications Society. ISBN 1-56881-227-2; 2004, p. 247–254. URL: http://dl.acm.org/citation.cfm?id=1006058.1006088.
[40] Botsch, M, Spernat, M, Kobbelt, L. Phong splatting. In: Proceedings of the First Eurographics Conference on Point-Based Graphics. SPBG'04; Aire-la-Ville, Switzerland: Eurographics Association. ISBN 3-905673-09-6; 2004, p. 25–32. URL: http://dx.doi.org/10.2312/SPBG/SPBG04/025-032.
[41] Weyrich, T, Heinzle, S, Aila, T, Fasnacht, DB, Oetiker, S, Botsch, M, et al. A hardware architecture for surface splatting. In: ACM SIGGRAPH 2007 Papers. SIGGRAPH '07; New York, NY, USA: ACM; 2007. URL: http://doi.acm.org/10.1145/1275808.1276490.
[42] Schütz, M. Potree: Rendering large point clouds in web browsers. Technische Universität Wien, Wieden 2016.
[43] Baert, J, Lagae, A, Dutré, P. Out-of-core construction of sparse voxel octrees. Computer Graphics Forum 2014;33(6):220–227. URL: http://dx.doi.org/10.1111/cgf.12345.
[44] Wu, J, Zhang, Z, Kobbelt, L. Progressive splatting. In: Proceedings Eurographics/IEEE VGTC Symposium Point-Based Graphics, 2005. 2005, p. 25–142. doi:10.1109/PBG.2005.194060