Comparing Hierarchical Data Structures for Sparse Volume Rendering with Empty Space Skipping

Abstract

Empty space skipping can be efficiently implemented with hierarchical data structures such as k-d trees and bounding volume hierarchies. This paper compares several recently published hierarchical data structures with regard to construction and rendering performance. The papers that form our prior work have primarily focused on interactively building the data structures and only showed that rendering performance is superior to using simple acceleration data structures such as uniform grids with macro cells. In the area of surface ray tracing, there exists a trade-off between construction and rendering performance of hierarchical data structures. In this paper we present performance comparisons for several empty space skipping data structures in order to determine if such a trade-off also exists for volume rendering with uniform data topologies.


Technical Report (2019)

Stefan Zellmann †

1. Introduction

Ever growing data set sizes resulting from scientific simulations also result in an increased demand for more efficient rendering algorithms for volumetric data sets. While strategies like adaptive mesh refinement (AMR) [WWW*19] help to reduce the memory space required for the data domain, simulations to a large degree still use uniform grid topologies, so that volume rendering with structured grids is still an active area of research. Structured grids also result from medical imaging techniques such as computed tomography (CT) or magnetic resonance imaging (MRI). Data sets from the clinical context in practice often have low resolution (for such reasons as to avoid unnecessarily exposing patients to radiation), but on the other hand often consist of multiple modalities like fiber tracks or blood vessels that were derived or segmented from the original CT or MRI data and manifest as separate volume channels. Those derived volume channels are often sparse in nature.

In all the described settings, it is beneficial if the user can explore the 3-d data set by means of interaction. A part of this interaction is adjusting the color and alpha transfer function that maps scalar input from the volume field to (RGB) color and opacity. When the alpha transfer function changes, so does the overall amount of empty space as well as the spatial arrangement of non-empty voxels.

A number of recent publications from our research group [ZSL18, ZHL19, ZSL19, ZML19] have therefore concentrated on the construction performance of spatial indices for direct volume rendering and proposed several data structures that can, depending on the size of the data set, be built at interactive rates. As the construction times reported in the papers vary significantly, in this paper we present a thorough performance comparison of the techniques.

The construction algorithm from [ZHL19] is based on the linear bounding volume hierarchy (LBVH) algorithm [LGS*09] and shows significantly improved construction rates compared to the other more involved data structures. In the field of real-time ray tracing with surface geometry, the LBVH data structure is generally known for its inferior culling properties but superior construction performance. The data structures from [ZSL18] and [ZSL19] in particular employ very intricate construction schemes that use a cost function comparable to the surface area heuristic (SAH) [Wal07] known from surface ray tracing. In that field, it is generally accepted that the quality obtained with SAH is usually higher than that of LBVH, which uses the middle split heuristic. The construction time for hierarchies built with SAH, however, is generally higher than the construction time required when using a simpler heuristic.

While the various papers from our prior work individually demonstrated the effectiveness of the respective data structures in general, they only compared them with naïve ray marching without empty space skipping, or with simple data structures like structured grids with macro cells. This paper tries to fill the gap in this respect and provides a comparison of the various data structures against each other. This is done by testing the data structures using different combinations of data sets and transfer functions.

This paper shall be understood as an addendum to our prior work: the benchmark results that constitute the main contribution of this paper are meant to serve as a guideline and to give further insight into when to use which of the data structures we proposed in our recent papers on empty space skipping.

† Department of Computer Science, University of Cologne

2. Related Work

Interactivity is an important property for scientific visualizations and can be aided in several ways, such as via remote rendering [ZAL12, SH15], level-of-detail techniques [LCCK02] or adaptive sampling of the data domain [MUWP19]. Empty space skipping is one of the traditional optimization strategies for direct volume rendering (DVR) algorithms and is often implemented using hierarchical data structures [LMK03, HBJP12, LCDP13]. A general overview of empty space skipping and other optimization techniques can e.g. be found in the state-of-the-art report by Beyer et al. [BHP14]. Notable systems or system approaches are those by Hadwiger et al. [HBJP12, HAAB*18] or the grid-based solution that can be found in OSPRay [WJA*17]. […] [LGS*09, PL10, MB18], interest from scientific visualization researchers has only recently started to focus on this topic.

Our group has recently proposed a number of empty space skipping data structures [ZSL18, ZHL19, ZSL19, ZML19] that are primarily aimed at fast spatial index reconstruction and are explained in detail in the following section. While our approach is focused on rebuilding the hierarchy whenever the data or the transfer function changes, another approach is to reuse or adapt an already existing acceleration data structure. This can e.g. be achieved by means of min-max range queries [WFKH07, KTW*11, Wal19]. An alternative approach to update an existing data structure is the one by Schneider et al. [SR17], who use Fenwick trees, which bear similarities to the summed area tables that our techniques use as auxiliary data structures.

3. Construction Algorithms for Empty Space Skipping Hierarchies

In this section we briefly summarize the various construction algorithms from our recent papers. While the linear bounding volume hierarchy algorithm is solely targeted towards GPUs, the k-d tree construction algorithm targets both multi-core and GPU systems. The hybrid grid construction algorithm is based on the multi-core CPU variant of the k-d tree construction algorithm but could generally also be implemented on the basis of the GPU variant.

The LBVH algorithm was initially introduced by Lauterbach et al. [LGS*09] […] and then run a CUDA kernel where each thread is responsible for one voxel. Each thread determines if the voxel is visible w.r.t. the current transfer function. The threads responsible for one brick then vote if the whole brick is visible by atomically updating a flag in on-device shared memory. This per-voxel operation is followed by a number of operations with O(n) work complexity, where n is the number of bricks, each of which is carried out in a single CUDA kernel. We first perform compaction since we are only interested in the non-empty bricks. Then we assign 30-bit 3-d Morton codes to the non-empty bricks, which we use to sort the bricks with a parallel O(n) GPU algorithm from the Thrust library [BH11]. The Morton order implicitly defines a hierarchy over the bricks that just spatially splits the respective child nodes in the middle. The split positions can be read from the bit codes of the Morton indices and can be efficiently found using Karras' algorithm [Kar12]. A final CUDA kernel traverses the hierarchy from each leaf node up to the root node and assembles the respective axis-aligned bounding box of each inner node encountered along that path.

k-d trees

The k-d tree construction algorithms from [ZSL18] and [ZSL19] are loosely based on original work by Vidal et al. [VMD08], which employs a summed area table to quickly determine the occupancy (i.e. the number of non-empty voxels) inside a volumetric region bounded by an axis-aligned bounding box.
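The occupancy query on a three-dimensional summed area table can be sketched as follows. This is a minimal sequential Python illustration, not the implementation from [VMD08] or [ZSL18]; the function names `build_svt` and `occupancy`, the zero-padded table layout, and the half-open box convention are assumptions made for this example.

```python
def build_svt(occ):
    """Build a zero-padded 3-d summed area table from occ[z][y][x] in {0, 1}.

    svt[z][y][x] holds the number of non-empty voxels in the half-open
    box [0, x) x [0, y) x [0, z).
    """
    nz, ny, nx = len(occ), len(occ[0]), len(occ[0][0])
    svt = [[[0] * (nx + 1) for _ in range(ny + 1)] for _ in range(nz + 1)]
    for z in range(nz):
        for y in range(ny):
            for x in range(nx):
                # 3-d inclusion-exclusion over the already computed neighbors
                svt[z + 1][y + 1][x + 1] = (
                    occ[z][y][x]
                    + svt[z][y + 1][x + 1] + svt[z + 1][y][x + 1] + svt[z + 1][y + 1][x]
                    - svt[z][y][x + 1] - svt[z][y + 1][x] - svt[z + 1][y][x]
                    + svt[z][y][x]
                )
    return svt


def occupancy(svt, lo, hi):
    """Count non-empty voxels in the half-open box [lo, hi), given as (x, y, z)."""
    x0, y0, z0 = lo
    x1, y1, z1 = hi
    # Eight table lookups, independent of the size of the box
    return (
        svt[z1][y1][x1]
        - svt[z0][y1][x1] - svt[z1][y0][x1] - svt[z1][y1][x0]
        + svt[z0][y0][x1] + svt[z0][y1][x0] + svt[z1][y0][x0]
        - svt[z0][y0][x0]
    )
```

Once the table is built, every occupancy query costs eight lookups regardless of the box size, which is what makes repeated queries during bounding box shrinking and plane sweeping affordable.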
We first introduce the multi-core parallel variant of the construction algorithm, which produces exactly the same results as the algorithm by Vidal et al. (apart from certain parameters such as halting criteria that we might set differently), and then briefly describe the adaptation of this algorithm to the GPU.

With the x64 multi-core CPU implementation that was first presented in [ZSL18] and later refined in [ZSL19], the volume is first decomposed into bricks of size 32³. Whenever the transfer function changes, a preclassified version is derived from that in which each voxel is only associated with a flag telling whether it is empty or not. We then build a three-dimensional summed area table (a "summed volume table", SVT) over the binary preclassification for each brick. By choosing a brick size of 32³, an SVT will fit exactly into the L1 cache of an x64 CPU. The respective SVTs for each brick are built in parallel.

This SVT construction phase is followed by a second algorithmic phase where a k-d tree is built in a top-down fashion. We first determine a tight bounding box for the root node by shrinking the axis-aligned bounding box of the whole volume to tightly fit the non-empty voxels according to the current transfer function. This can be done by querying the SVTs to determine the occupancy inside the boxes. Occupancy queries are performed in parallel for each SVT that the bounding box overlaps. Therefore, local bounding boxes are computed in parallel per SVT and the results are later combined to form the overall bounding box. The shrinking procedure just compares the occupancy inside smaller boxes to the occupancy of the parent bounding box; if the occupancy is the same, shrinking was valid. A cost function is used to determine an optimal splitting plane by sweeping and by inspecting the volume of the (tight) bounding boxes to the left and the right of the candidate planes.
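The shrinking test and the plane sweep described above can be sketched as follows. This is a simplified, sequential illustration: the callable `count` stands in for the SVT occupancy queries, and the cost (the sum of the volumes of the two tight child boxes) is an assumption for this example rather than the exact cost function from the papers. All function names are hypothetical.

```python
def make_counter(voxels):
    """Stand-in for SVT queries: count non-empty voxels in the half-open box [lo, hi)."""
    def count(lo, hi):
        return sum(all(lo[a] <= v[a] < hi[a] for a in range(3)) for v in voxels)
    return count


def shrink(lo, hi, count):
    """Shrink [lo, hi) face by face as long as the occupancy stays unchanged."""
    lo, hi = list(lo), list(hi)
    total = count(lo, hi)
    if total == 0:
        return None  # nothing to bound
    for axis in range(3):
        while hi[axis] - lo[axis] > 1:
            trial = lo.copy()
            trial[axis] += 1
            if count(trial, hi) != total:
                break  # moving this face further would cut off a non-empty voxel
            lo = trial
        while hi[axis] - lo[axis] > 1:
            trial = hi.copy()
            trial[axis] -= 1
            if count(lo, trial) != total:
                break
            hi = trial
    return tuple(lo), tuple(hi)


def volume(box):
    if box is None:
        return 0
    lo, hi = box
    return (hi[0] - lo[0]) * (hi[1] - lo[1]) * (hi[2] - lo[2])


def best_split(lo, hi, count):
    """Sweep all axis-aligned planes inside a tight box [lo, hi).

    Returns (cost, axis, position) of the first plane with minimal cost.
    """
    best = None
    for axis in range(3):
        for pos in range(lo[axis] + 1, hi[axis]):
            left_hi = list(hi)
            left_hi[axis] = pos
            right_lo = list(lo)
            right_lo[axis] = pos
            cost = (volume(shrink(lo, left_hi, count))
                    + volume(shrink(right_lo, hi, count)))
            if best is None or cost < best[0]:
                best = (cost, axis, pos)
    return best
```

For two small clusters in opposite corners of an 8³ region, any plane between the clusters yields two tight boxes of volume 8 each, whereas a plane cutting through a cluster leaves one child that still spans most of the region.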
Certain halting criteria favor either shallow or deep trees, both of which we evaluate in Section 4, and can be set as parameters to the construction algorithm. One can expect the targeted size of the hierarchy (height; number of nodes) to influence both construction time and rendering performance.

The CPU construction algorithm, if ported without modifications, is not well suited for execution on GPUs, at least not if the whole algorithm were carried out in a single GPU kernel. In [ZSL19] we therefore proposed an adapted version of the algorithm that performs the plane sweeping and top-down construction phase of the algorithm on the CPU, while the procedure that finds tight bounding boxes around non-empty voxels is offloaded to the GPU.

The plane sweeping procedure thus involves launching two GPU kernels, for the left and the right half-space, for each candidate plane. To reduce the number of kernel calls, we use a binning approach. Binning has another crucial advantage: by aligning the bin boundaries with the raster imposed by the SVTs, we never consider plane candidates that would split an SVT into two halves and can thus precompute the tight bounding box inside that macro cell just once when the transfer function changes (as opposed to each time that we consider another plane candidate). We do this by initially computing SVTs, which are immediately discarded as soon as the local bounding box has been determined. In contrast to the CPU, where we optimized for the L1 cache, on the GPU we explicitly perform the computations in shared memory, so that a macro cell will have a size of 8³.

We order the resulting bounding boxes on a z-order Morton curve. In order to determine a tight bounding box for a spatial region spanning multiple macro cells during splitting, we perform a parallel reduction on the GPU using Thrust.
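The Morton ordering and the bounding box reduction can be illustrated with a short sequential sketch. The paper performs the reduction in parallel on the GPU with Thrust; the Python below only mirrors the logic. The names `morton3d` and `reduce_boxes`, the bit-loop encoding, and the use of binary search to narrow the candidate range are assumptions of this example.

```python
from bisect import bisect_left, bisect_right


def morton3d(x, y, z, bits=10):
    """Interleave the low `bits` bits of x, y, z into a 3-d Morton code (30 bits by default)."""
    code = 0
    for i in range(bits):
        code |= ((x >> i) & 1) << (3 * i)
        code |= ((y >> i) & 1) << (3 * i + 1)
        code |= ((z >> i) & 1) << (3 * i + 2)
    return code


def reduce_boxes(sorted_cells, region_lo, region_hi):
    """Union of the boxes of all macro cells inside a region.

    sorted_cells: list of (code, cell, (box_lo, box_hi)) sorted by Morton code.
    region_lo, region_hi: inclusive region bounds in macro cell coordinates.
    """
    codes = [entry[0] for entry in sorted_cells]
    # Morton codes are monotone in each coordinate, so every cell inside the
    # region has a code between those of the two region corners.
    first = bisect_left(codes, morton3d(*region_lo))
    last = bisect_right(codes, morton3d(*region_hi))
    lo = hi = None
    for _, cell, (box_lo, box_hi) in sorted_cells[first:last]:
        # The code range is only conservative; cells outside the region
        # can still fall into it and have to be rejected explicitly.
        if all(region_lo[a] <= cell[a] <= region_hi[a] for a in range(3)):
            lo = box_lo if lo is None else tuple(map(min, lo, box_lo))
            hi = box_hi if hi is None else tuple(map(max, hi, box_hi))
    return None if lo is None else (lo, hi)
```

The explicit inside test is needed because the range of Morton codes between the two region corners is a superset of the cells that actually lie inside the region.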
Since this reduction is a 1-d operation over the Morton-ordered list, we need to check if the macro cells we consider are actually inside the volumetric region of interest. To minimize the number of macro cells to test, we first conservatively determine the first and last cell in the list that can fall inside the region of interest by using the Morton code order of the list. Although the algorithm is per se not very well suited for GPUs due to the top-down recursion, we still achieve good GPU utilization during splitting as we perform the reduction twice for each candidate plane, and also because the smaller macro cells result in a larger number of cells to reduce over.

Our benchmarks suggested that shallow k-d trees built with the original halting criteria proposed by Vidal et al. would result in effective space skipping data structures, but that on contemporary hardware, deeper trees with a leaf node size of approximately 8³ would perform even better. The approach by Vidal et al. was originally intended to be used in a way where macro blocks would be rendered using an outer loop, which calls a volume rendering shader per block. In [ZML19] we evaluated the outer loop approach against full ray traversal on the GPU; we found that full ray traversal with relatively deep trees would outperform the outer loop approach during rendering, but also that the construction times for deep trees were significantly higher than those for shallow trees.

We therefore proposed an alternative rendering strategy, which combines a shallow k-d tree with only relatively few leaves with a global grid of macro cells. As the k-d tree construction algorithm effectively uses macro cells (each SVT can be thought of as a macro cell that stores occupancy information about its volumetric region), deriving a grid from that and sending it over to the GPU is straightforward and comes at no noticeable storage overhead.
Instead of the original size of 32³ that the construction algorithm uses, we found macro cells of size 16³ in general to perform better and thus resample the grid to that size, which can still be done in constant time using the SVTs. We adapted the ray marching algorithm to perform full traversal per ray until we find a k-d tree leaf node, and inside that one use the global grid to skip over empty space with finer granularity. Our benchmarks suggest that the data structure is helpful in certain cases, specifically when the data sets are large (i.e. near the amount of available texture memory), or when empty space manifests as holes inside the volume. Construction performance, however, was identical to that for shallow k-d trees.
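The grid part of this hybrid strategy, i.e. skipping over empty macro cells while marching through a single k-d tree leaf, can be sketched as follows. This is a simplified CPU illustration under several assumptions: the leaf is given by the ray interval [t_enter, t_exit), the macro cell grid is represented as a set of non-empty cell coordinates, and the function name `march_with_grid` is hypothetical. The actual renderer runs in CUDA.

```python
import math


def march_with_grid(origin, direction, t_enter, t_exit, dt, cell_size, occupied):
    """Return the ray parameters t at which the volume would be sampled.

    Samples lie on the raster t_enter + k * dt; samples that fall into
    empty macro cells are skipped in a single step per cell.
    """
    samples = []
    t = t_enter
    while t < t_exit:
        pos = [origin[a] + t * direction[a] for a in range(3)]
        cell = tuple(math.floor(pos[a] / cell_size) for a in range(3))
        if cell in occupied:
            samples.append(t)  # a real renderer would sample and composite here
            t += dt
            continue
        # Ray parameter at which the ray leaves the current (empty) macro cell
        t_out = t_exit
        for a in range(3):
            d = direction[a]
            if d > 0:
                t_out = min(t_out, ((cell[a] + 1) * cell_size - origin[a]) / d)
            elif d < 0:
                t_out = min(t_out, (cell[a] * cell_size - origin[a]) / d)
        # Resume at the first raster sample at or behind the cell exit;
        # max() guarantees progress when the exit coincides with t.
        steps = math.ceil((t_out - t_enter) / dt)
        t = max(t + dt, t_enter + steps * dt)
    return samples
```

Keeping the samples on the same raster as naive ray marching means that skipping does not change which samples contribute to the image, only how many empty ones are visited.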

4. Performance Comparisons

The main focus of this paper is a thorough comparison of the various data structures w.r.t. construction and rendering performance. To achieve this, we integrated them into the Virvo volume rendering library [SWWL01] and use the Visionaray library [ZWL17] to implement the low level ray tracing algorithms like k-d tree or BVH traversal.

Our test system consists of an eight-core Intel Xeon Gold 5122 CPU system with 3.60 GHz clock frequency and an NVIDIA Titan V graphics card with 12 GB GDDR video memory. The system is equipped with 96 GB DDR memory. For the k-d tree and hybrid grid construction algorithms, we deactivate simultaneous multithreading (marketed by Intel under the name "hyperthreading") and assign a single core to each thread using the numactl tool that comes with the Linux distribution installed on our test system.

For the evaluation we use the data sets and transfer function settings from Figure 1. Note that Figure 1 does not depict the exact view points we used for the evaluation. Instead, we use an orthographic camera and a view point that is zoomed in so that the volume fills the whole rendering window. We use a CUDA-based renderer with the absorption plus emission model and post-classification transfer function lookups; for our tests we deactivate gradient shading and early-ray termination. For the benchmarks we use a rotating camera animation to account for and average out effects like unfavorable offsets and strides when accessing 3-d textures from certain viewing angles. We render images into viewports of size 1024 × […]

[Figure 1 panels: Aneurism (256³ voxels, occupancy 1.01 %); Xmas Tree (512 × … × 512 voxels, occupancy 2.90 %); Magnetic Reconnection Simulation (512³ voxels); Stag Beetle (832 × … × 494 voxels); Kingsnake (1024 × … × 795 voxels, occupancy 0.39 %); Menger Sponge (1024³ voxels); Richtmyer-Meshkov Instability (2048 × … × … voxels); N-Body (256³ / 512³ / 1024³ / 2048³ voxels).]

Figure 1: Data sets with spatial dimensions and occupancy (percentage of voxels that are visible given a certain transfer function) used to evaluate our method. We pick a number of different settings with transfer functions that favor different types of spatial arrangements of the visible voxels.

[Table 1 columns: Data Set, LBVH, CPU Shallow, CPU MLS=32, CPU MLS=128, GPU Shallow, GPU MLS=32, GPU MLS=128.]

Table 1: Statistics for the various data sets we use for the evaluation. We report the number of nodes and the height of the respective tree structures: linear bounding volume hierarchies with leaves of size 8³; shallow k-d trees where a leaf is created when the volume of the node gets below 10 % of the root node's volume during splitting; k-d trees with leaf volume less than or equal to 8³ and a maximum leaf node size of 32³; k-d trees with leaf volume less than or equal to 8³ and a maximum leaf node size of 128³. The k-d trees are built with different algorithms depending on whether they are built on the CPU or the GPU; on the GPU we use four bins to determine candidate split planes (cf. Section 3.2.2).

Figure 2: Enforcing a maximum leaf size to avoid unfavorable local optima. Top row, left: Xmas tree data set with bounding box overlay. Building a deep k-d tree for this data set and transfer function combination will cause the splitting heuristic to accept a local optimum resulting in a single very large leaf node (middle). Enforcing a maximum leaf node size can mitigate this issue (right). If the volume consists of regions that are non-empty (bottom row, left), enforcing a maximum leaf size can however have the undesirable effect of the hierarchy growing exceptionally deep and wide and splitting regions containing no empty space (middle). A sensible choice for the maximum leaf size depends on the data set and the transfer function (right). We consider investigating this issue and potentially finding better solutions interesting future work.

We use a variety of different data set and transfer function combinations that are depicted in Figure 1. We deliberately choose combinations with a varying number of non-empty voxels. Since all algorithms are in-core, the largest data sets we test with have a size of 2048³ voxels, which amounts to 8 GB when voxels are stored with 8 bit precision. The Richtmyer-Meshkov instability specifically has two prominent large regions that are either empty, or not empty but totally homogeneous. We expect the different data structures to adapt differently to the large amount of empty space. We consider the Menger Sponge data set a hard case for most empty space skipping data structures, as the empty space is contained inside the volume, a spatial arrangement which is known to be particularly ill suited for typical k-d tree builders.

We consider three different types of k-d trees. We build shallow trees where we only split a node if its volume is above 10 % of the root node's volume (that is the original halting criterion proposed by Vidal et al. [VMD08]). This setup will create large volume chunks which will potentially contain lots of empty space. The two remaining setups employ a halting criterion where a node is split whenever its volume is above 8³ and will result in deeper trees. The greedy heuristic will in certain cases accept a local optimum and thus stop the recursion due to the cost function (cf. Figure 2), even though the resulting leaf's extent is quite large and still bounds a substantial amount of empty space. That in particular happens when empty space is contained inside the volume and was already pointed out by Vidal et al. [VMD08]. This generally undesirable behavior can be mitigated by enforcing a split when the leaf node would otherwise exceed a certain size. We perform benchmarks for configurations where we enforce a maximum leaf node size of either 32³ or 128³; when the cost function reports a leaf that exceeds […]

[Figure 3 plots: bar charts; y-axis: frames per second; x-axis: Aneurism, Xmas Tree, Magnetic, Stag Beetle, Kingsnake, Menger Sponge, Richtmyer-Meshkov, N-Body 256, N-Body 512, N-Body 1024, N-Body 2048; series: Naive, LBVH, CPU Deep (Max Leaf Size = 32), CPU Deep (Max Leaf Size = 128), GPU Deep (Max Leaf Size = 32), GPU Deep (Max Leaf Size = 128), Hybrid.]

Figure 3: Rendering performance obtained with our benchmarks. We render rotating orthographic views with a 1024 × […]

Our benchmarks suggest that construction with the LBVH algorithm will be very fast, but since this algorithm makes the least informed decision about where to perform the split, it will generally result in inferior spatial indices. Among the data structures that use SVTs to find tight bounding boxes, we see a clear correlation between the depth of the resulting tree and the construction time, but also observe that deeper trees will generally be superior to more shallow ones.

The problem with the algorithms accepting local optima can only partially be mitigated using a maximum leaf size. In particular, this strategy might fail when the data set is huge and extra splits of homogeneous space that would otherwise have been bounded by a single leaf cause the resulting spatial index to be deeper and contain far more leaf nodes than without using this strategy. This behavior can be seen in Figure 2 for the Richtmyer-Meshkov data set, where the homogeneous space at the bottom of the data set is split excessively and to no avail. In the future, we intend to investigate alternative strategies to circumvent this problem.

[Figure 4 plots: same data sets and series as Figure 3 (Naive: n/a); logarithmic y-axis.]

Figure 4: Build rates for the various data sets and hierarchy construction algorithms. Note the logarithmic scale of the y-axis. For the absolute timing results cf. Table 2.

Our benchmarks also give an overview of how effective the algorithms are given a certain amount of empty space and specific spatial arrangements. We already noted that empty space inside nodes is particularly hard to find for the construction algorithms. The Menger Sponge data set, for example, suffers from this problem, and one can see that hardly any of the algorithms is effective at skipping empty space in this situation.

While the hybrid grid algorithm will usually not outperform the deep k-d tree construction algorithms, its construction rate is in general superior to the latter. For larger data sets, hybrid grids enable frame rates that fall in between those of shallow and deep k-d trees. For that reason, we consider them an interesting data structure. As the shallow k-d tree construction algorithm is in most cases almost as fast as the LBVH algorithm, it might be interesting to combine this construction algorithm with the hybrid grid algorithm instead of using k-d tree construction on the CPU.
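The interplay of the halting criteria compared in this section can be summarized in a small decision function. This is a hypothetical sketch: the function name `decide`, its parameters, and the order in which the checks are applied are assumptions, and volumes are measured in voxels. The forced split corresponds to the middle split mentioned in the acknowledgements.

```python
def decide(node_volume, root_volume, cost_prefers_split, deep, max_leaf_volume=None):
    """Return "leaf", "split" or "forced split" for a node during top-down construction.

    deep=False: shallow trees, split only while the node exceeds 10 % of the root volume.
    deep=True:  deep trees, split while the node volume exceeds 8^3 voxels.
    max_leaf_volume: optional cap (e.g. 32**3 or 128**3) that overrides a
    local optimum accepted by the greedy cost function.
    """
    threshold = 8 ** 3 if deep else 0.10 * root_volume
    if node_volume <= threshold:
        return "leaf"
    if cost_prefers_split:
        return "split"  # split plane chosen by the cost function
    if max_leaf_volume is not None and node_volume > max_leaf_volume:
        return "forced split"  # e.g. a middle split
    return "leaf"  # local optimum accepted by the greedy heuristic
```

With a cap of 32³, a node that still covers a 1024³ volume can no longer terminate as a single leaf; without the cap it does. This is the situation shown in Figure 2.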

5. Conclusion

In this paper, we thoroughly compared the various algorithms to construct spatial indices for empty space skipping and structured volumes that we proposed in our prior work. We showed that the construction algorithms that make a more informed decision than just splitting in the middle along one axis result in superior space skipping hierarchies. The paper may serve as a guideline for practitioners to decide which algorithm to choose depending on the specific problem. A general limitation of the top-down construction algorithms is that the greedy heuristic may accept local optima, which can be partially mitigated by enforcing a maximum leaf node size. We showed, however, that this strategy might fail in certain situations and consider an alternative solution to that problem interesting future work.

Acknowledgements

We thank Ingo Wald for helpful discussion and in particular for pointing out the trick to enforce a maximum leaf node size by incorporating a middle split.


References

[BH11] Bell N., Hoberock J.: Thrust: A productivity-oriented library for CUDA. In GPU Computing Gems Jade Edition, Hwu W.-m. W., (Ed.). Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 2011, ch. 26, pp. 359–373.

[BHP14] Beyer J., Hadwiger M., Pfister H.: A Survey of GPU-Based Large-Scale Volume Visualization. In EuroVis - STARs (2014), Borgo R., Maciejewski R., Viola I., (Eds.), The Eurographics Association.

[GHFB13] Gu Y., He Y., Fatahalian K., Blelloch G.: Efficient BVH construction via approximate agglomerative clustering. In Proceedings of the 5th High-Performance Graphics Conference (New York, NY, USA, 2013), HPG '13, ACM, pp. 81–88.

[GM19] Ganter D., Manzke M.: An Analysis of Region Clustered BVH Volume Rendering on GPU. Computer Graphics Forum (2019).

[HAAB*18] Hadwiger M., Al-Awami A. K., Beyer J., Agos M., Pfister H.: SparseLeap: Efficient empty space skipping for large-scale volume rendering. IEEE Transactions on Visualization and Computer Graphics (2018).

[HBJP12] Hadwiger M., Beyer J., Jeong W.-K., Pfister H.: Interactive volume exploration of petascale microscopy data streams using a visualization-driven virtual memory approach. IEEE Trans. Vis. Comput. Graph. 18, 12 (2012), 2285–2294.

[Kar12] Karras T.: Maximizing parallelism in the construction of BVHs, octrees, and k-d trees. In Proceedings of the Fourth ACM SIGGRAPH / Eurographics Conference on High-Performance Graphics (Goslar, Germany, 2012), EGGH-HPG'12, Eurographics Association, pp. 33–37.

[KTW*11] Knoll A., Thelen S., Wald I., Hansen C., Hagen H., Papka M.: Full-resolution interactive CPU volume rendering with coherent BVH traversal. In Proceedings of IEEE Pacific Visualization 2011 (2011), pp. 3–10.

[LCCK02] Leven J., Corso J., Cohen J., Kumar S.: Interactive visualization of unstructured grids using hierarchical 3d textures. In Proceedings of the 2002 IEEE Symposium on Volume Visualization and Graphics (Piscataway, NJ, USA, 2002), VVS '02, IEEE Press, pp. 37–44.

[LCDP13] Liu B., Clapworthy G. J., Dong F., Prakash E. C.: Octree rasterization: Accelerating high-quality out-of-core GPU volume rendering. IEEE Transactions on Visualization and Computer Graphics 19, 10 (Oct 2013), 1732–1745.

[LGS*09] Lauterbach C., Garland M., Sengupta S., Luebke D., Manocha D.: Fast BVH construction on GPUs. Computer Graphics Forum (2009).

[LMK03] Li W., Mueller K., Kaufman A.: Empty space skipping and occlusion clipping for texture-based volume rendering. In IEEE Visualization, 2003. VIS 2003. (Oct 2003), pp. 317–324.

[MB18] Meister D., Bittner J.: Parallel locally-ordered clustering for bounding volume hierarchy construction. IEEE Transactions on Visualization and Computer Graphics 24, 03 (Mar 2018), 1345–1353.

[MUWP19] Morrical N., Usher W., Wald I., Pascucci V.: Efficient Space Skipping and Adaptive Sampling of Unstructured Volumes Using Hardware Accelerated Ray Tracing. In IEEE VIS 2019 - Short Papers (2019).

[PL10] Pantaleoni J., Luebke D.: HLBVH: Hierarchical LBVH construction for real-time ray tracing of dynamic geometry. In Proceedings of the Conference on High Performance Graphics (Aire-la-Ville, Switzerland, 2010), HPG '10, Eurographics Association, pp. 87–95.

[SH15] Shi S., Hsu C.-H.: A survey of interactive remote rendering systems. ACM Comput. Surv. 47, 4 (May 2015), 57:1–57:29.

[SR17] Schneider J., Rautek P.: A versatile and efficient GPU data structure for spatial indexing. IEEE Transactions on Visualization and Computer Graphics 23, 1 (Jan 2017), 911–920.

[SWWL01] Schulze J., Woessner U., Walz S., Lang U.: Volume rendering in a virtual environment. Immersive Projection Technology and Virtual Environments 2001: proceedings of the Eurographics Workshop in Stuttgart, Germany, May 16-18, 2001 (2001), 187.

[VMD08] Vidal V., Mei X., Decaudin P.: Simple empty-space removal for interactive volume rendering. Journal of Graphics Tools 13, 2 (2008), 21–36.

[Wal07] Wald I.: On fast construction of SAH-based bounding volume hierarchies. In Proceedings of the 2007 IEEE Symposium on Interactive Ray Tracing (Washington, DC, USA, 2007), RT '07, IEEE Computer Society, pp. 33–40.

[Wal19] Wald I.: Computing minima and maxima of subarrays. In Ray Tracing Gems: High-Quality and Real-Time Rendering with DXR and Other APIs, Haines E., Akenine-Möller T., (Eds.). Apress, Berkeley, CA, 2019, pp. 61–70.

[WFKH07] Wald I., Friedrich H., Knoll A., Hansen C. D.: Interactive isosurface ray tracing of time-varying tetrahedral volumes. IEEE Transactions on Visualization and Computer Graphics 13, 6 (Nov 2007), 1727–1734.

[WH06] Wald I., Havran V.: On building fast kd-trees for ray tracing, and on doing that in O(N log N). In IEEE Symposium on Interactive Ray Tracing 2006 (RT) (09 2006), vol. 00, pp. 61–69.

[WJA*17] Wald I., Johnson G., Amstutz J., Brownlee C., Knoll A., Jeffers J., Günther J., Navratil P.: OSPRay - a CPU ray tracing framework for scientific visualization. IEEE Transactions on Visualization and Computer Graphics 23, 1 (Jan 2017), 931–940.

[WWW*19] Wang F., Wald I., Wu Q., Usher W., Johnson C. R.: CPU Isosurface Ray Tracing of Adaptive Mesh Refinement Data. IEEE Transactions on Visualization and Computer Graphics (2019).

[ZAL12] Zellmann S., Aumüller M., Lang U.: Image-Based Remote Real-Time Volume Rendering - Decoupling Rendering from View Point Updates. In Proceedings of the ASME 2012 International Design Engineering Technical Conferences & Computers and Information in Engineering Conference (12-15 Aug. 2012), ASME.

[ZHL17] Zellmann S., Hoevels M., Lang U.: Ray traced volume clipping using multi-hit BVH traversal. In Proceedings of Visualization and Data Analysis (VDA) (2017), IS&T.

[ZHL19] Zellmann S., Hellmann M., Lang U.: A linear time BVH construction algorithm for sparse volumes. In Proceedings of the 12th IEEE Pacific Visualization Symposium (2019), IEEE.

[ZML19] Zellmann S., Meurer D., Lang U.: Hybrid grids for sparse volume rendering. In IEEE VIS 2019 - Short Papers (2019).

[ZSL18] Zellmann S., Schulze J. P., Lang U.: Rapid k-d tree construction for sparse volume data. In Eurographics Symposium on Parallel Graphics and Visualization (2018), Childs H., Cucchietti F., (Eds.), The Eurographics Association.

[ZSL19] Zellmann S., Schulze J. P., Lang U.: Binned k-d tree construction for sparse volume data on multi-core and GPU systems. IEEE Transactions on Visualization and Computer Graphics (2019), 1–1.

[ZWL17] Zellmann S., Wickeroth D., Lang U.: Visionaray: A cross-platform ray tracing template library. In Proceedings of the 10th Workshop on Software Engineering and Architectures for Realtime Interactive Systems (IEEE SEARIS 2017) (in press, 2017), IEEE.

Table 2: Rendering performance (fps) and absolute build timings (build) for the various data sets and data structures.

| Data Set | Naive fps | LBVH build | LBVH fps | CPU Shallow build | CPU Shallow fps | CPU, MLS=32 build | CPU, MLS=32 fps | CPU, MLS=128 build | CPU, MLS=128 fps | GPU Shallow build | GPU Shallow fps | GPU, MLS=32 build | GPU, MLS=32 fps | GPU, MLS=128 build | GPU, MLS=128 fps | Hybrid build | Hybrid fps |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Aneurism | 163. | 0.001 | 156. | 0.031 | 222. | 0.153 | 263. | 0.132 | 256. | 0.003 | 222. | 0.442 | 263. | 0.004 | 263. | 0.031 | 172. |
| Xmas Tree | 97.1 | 0.004 | 69. | 0.191 | 103. | 0.689 | 112. | 0.450 | 116. | 0.009 | 98.0 | 2.163 | 115. | 0.009 | 116. | 0.191 | 86.2 |
| Magnetic | 105. | 0.004 | 103. | 0.182 | 192. | 0.504 | 158. | 0.326 | 185. | 0.003 | 192. | 1.758 | 156. | 0.006 | 182. | 0.182 | 141. |
| Beetle | 82.7 | 0.009 | 164. | 0.423 | 144. | 0.910 | 208. | 0.667 | 227. | 0.030 | 135. | 1.985 | 200. | 0.004 | 233. | 0.423 | 196. |
| Snake | 65.4 | 0.012 | 133. | 0.884 | 102. | 1.987 | 212. | 1.938 | 196. | 0.043 | 89.3 | 2.533 | 213. | 0.004 | 227. | 0.884 | 145. |
| Sponge | 55.6 | 0.029 | 23.0 | 1.116 | 55.6 | 8.480 | 42.6 | 3.713 | 66.2 | 0.050 | 55.6 | 25.07 | 37.9 | 0.018 | 55.2 | 1.116 | 42.6 |
| Richtmyer | 12.9 | 0.252 | 6.73 | 35.82 | 19.9 | 95.14 | 10.4 | 72.91 | 17.0 | 0.396 | 20.0 | 413.9 | 7.83 | 29.80 | 15.0 | 35.82 | 14.3 |
| N-Body 256 | 166. | 0.001 | 204. | 0.004 | 278. | 0.069 | 286. | 0.059 | 294. | 0.002 | 270. | 0.142 | 303. | 0.562 | 303. | 0.004 | 227. |
| N-Body 512 | 100. | 0.003 | 144. | 0.133 | 185. | 0.331 | 227. | 0.276 | 238. | 0.008 | 175. | 1.168 | 233. | 0.562 | 233. | 0.133 | 164. |
| N-Body 1K | 55.0 | 0.016 | 122. | 1.426 | 118. | 2.627 | 217. | 2.279 | 200. | 0.054 | 111. | 3.637 | 233. | 1.959 | 189. | 1.426 | 130. |
| N-Body 2K | 38.8 | 0.104 | 110. | 17.07 | 51.5 | 24.26 | 213. | 22.53 | 204. | 0.427 | 84.0 | 12.28 | 233. | 10.90 | 175. | 17.07 | 98.0 |