Efficient Space Skipping and Adaptive Sampling of Unstructured Volumes Using Hardware Accelerated Ray Tracing
Nate Morrical, Will Usher, Ingo Wald, Valerio Pascucci
SCI Institute, University of Utah; NVIDIA
Figure 1: Performance improvement of our method on the 278 million tetrahedra Japan Earthquake data set. (a) A reference volume ray marcher without our method, at 0.9 FPS (1024² pixels) on an NVIDIA RTX 8000 GPU. (b) A heat map of relative cost per pixel in (a). (c) and (d), the same, but now with our space skipping and adaptive sampling method, running at 7 FPS (7× faster).

ABSTRACT
Sample based ray marching is an effective method for direct volume rendering of unstructured meshes. However, sampling such meshes remains expensive, and strategies to reduce the number of samples taken have received relatively little attention. In this paper, we introduce a method for rendering unstructured meshes using a combination of a coarse spatial acceleration structure and hardware-accelerated ray tracing. Our approach enables efficient empty space skipping and adaptive sampling of unstructured meshes, and outperforms a reference ray marcher by up to 7×.

Keywords:
Volume rendering, space skipping, adaptive sampling
1 INTRODUCTION
Direct volume rendering (DVR) is widely used in the scientific visualization community, enabling scientists to interactively explore their data and form hypotheses. A standard approach for DVR is raycasting, where rays are cast through the volume for each pixel and the color and opacity of the volume is sampled and integrated along each ray to compute an image of the volume. Interactive raycasting techniques have been demonstrated for both structured [15, 23] and unstructured [21, 22, 25, 32, 34] volumes, and map well to the parallel hardware available on modern CPUs [23, 25, 31] and GPUs [15, 21, 22, 32, 34].

However, when the volume becomes expensive to sample, the cost per ray increases, limiting interactivity. For unstructured data, the cost of these samples can be reduced, as demonstrated by Wald et al. [32], by leveraging the ray tracing cores available on NVIDIA's Turing GPUs. When used in a naive volume raycaster, their approach improved frame rates by roughly 1–3×. To further improve performance, the number of samples taken per ray must be reduced. Numerous methods have been proposed for regular grid volumes, which roughly fall into two categories: empty space skipping, which avoids sampling fully transparent regions; and adaptive sampling, which takes fewer samples in regions containing less interesting data values. Prior work has employed a range of acceleration structures to enable these optimizations, e.g., macrocells [15, 23], octrees [1, 11, 16, 17, 26, 35], KD-trees [29, 30], and BVHs [14].

When considering an additional acceleration structure for DVR, the performance overhead introduced by building and traversing the structure is of key concern. If the overhead incurred by the structure is too high, it can overshadow the performance gained from the reduced number of samples taken. To reduce this overhead, Hadwiger et al.
[13] proposed SparseLeap, which leverages triangle rasterization hardware to compute per-pixel lists of active ray segments by rendering occupancy geometry. These segments act as the space skipping acceleration structure in a subsequent render pass. Ganter et al. [8] recently extended SparseLeap to leverage OptiX [24] and NVIDIA's ray tracing cores to improve acceleration structure build time and use hardware accelerated BVH traversal to reduce overhead. However, both methods must rebuild the structure on transfer function changes, do not consider adaptive sampling, and rely on either an octree or a summed area table to build the occupancy geometry, which can result in poor adaptivity to an unstructured mesh.

In the context of unstructured meshes, relatively little prior work has investigated object space adaptive sampling or empty space skipping. The bulk of methods for rendering such volumes focus on rasterization [3, 5, 19, 20, 27], requiring either dynamic level-of-detail strategies [4, 6] or a volume simplification pre-processing step [9, 18, 28] to achieve adaptive sampling. Standard approaches for ray tracing such data [2, 10, 12, 22] step from cell to cell along the ray, providing an accurate image at significant cost. Nelson et al. [22] skip individual empty mesh elements, but do not support skipping larger regions at once. Prior work has also proposed rasterizing proxy meshes to perform ray tracing on the GPU [21, 34], but encounters similar adaptive sampling challenges with regard to which proxy geometry to dispatch. Sample based ray casting methods [25] have shown performance improvements over both rasterization and cell iteration methods, though have not investigated empty space skipping or adaptive sampling.

In this work, we propose new strategies for empty space skipping and adaptive sampling for sample-based raycasting of unstructured meshes.
In contrast to prior work, our empty space skipping structure allows skipping larger regions of space and adapts well to the underlying mesh. Moreover, we formulate our space skipping structure in a manner suitable for hardware acceleration, without requiring a rasterizer. Finally, we propose an intuitive adaptive sampling approach for occupancy geometry methods without requiring additional preprocessing. Our contributions are:

• An extension of "occupancy geometry" to unstructured data;
• A lightweight empty space skipping method for unstructured data, which leverages GPU ray tracing hardware; and
• An adaptive sampling approach which provides a bound on error using three intuitive parameters.
Figure 2: An illustration of our method: (a) Given an unstructured mesh, (b) we build a coarse spatial subdivision over the mesh elements to partition them into a set of convex, disjoint regions. (c) These partitions are then shrunk to tightly fit the contained elements. For each region we compute the min/max of the scalar field and the transfer function maximum opacity and color variance. (d) We use hardware accelerated ray tracing to iterate rays through the "active" partitions, skipping transparent ones and unoccupied space. (e) Within each partition we use the local variance to adapt the sampling rate to the underlying data variation, thereby taking fewer samples in more uniform regions of the data.
2 METHOD
Unstructured meshes pose unique challenges which must be overcome to implement an effective empty space skipping and adaptive sampling strategy. Such meshes are often refined by the simulation in areas of interest, and the method must adapt to this non-uniformity, where the size and distribution of the elements is not known a priori. Furthermore, as the mesh is not guaranteed to be convex, the method must account for three categories of coarser regions: unoccupied regions containing no elements, regions containing entirely transparent elements, and regions where the data value does not vary significantly across the elements.

In our approach, we first partition the volume into a set of convex, disjoint regions (Section 2.1), which are shrunk to tightly fit the contained mesh elements (Section 2.2). We then compute metadata about the elements contained in these partitions, which we use to guide both the empty space skipping (Section 2.3) and adaptive sampling (Section 2.4). Finally, we use the Turing GPU's new ray tracing hardware to accelerate traversal of these partitions for empty space skipping, and our adaptive sampling method to reduce the number of samples taken in each partition. An overview of our method is shown in Figure 2.
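As a rough illustration, the median-split partitioning described in Section 2.1 could be sketched as follows. This is a hypothetical CPU sketch, not the paper's implementation: it splits element ids by the median centroid along the widest axis, so an element straddling a split plane lands in only one leaf here, whereas the spatial KD-tree described below places such an element in every leaf it overlaps.

```python
# Hypothetical sketch: recursively median-split element ids by centroid along
# the widest axis; the leaves become the coarse partitions. The `centroid`
# callback and `max_leaf_size` parameter are illustrative names.

def build_kd_partitions(elements, centroid, max_leaf_size=4):
    """elements: list of element ids; centroid: fn id -> (x, y, z).
    Returns a list of leaves, each a list of element ids."""
    if len(elements) <= max_leaf_size:
        return [elements]
    # Split along the axis with the largest centroid extent.
    pts = [centroid(e) for e in elements]
    axis = max(range(3),
               key=lambda a: max(p[a] for p in pts) - min(p[a] for p in pts))
    order = sorted(elements, key=lambda e: centroid(e)[axis])
    mid = len(order) // 2  # median split keeps leaf costs roughly even
    return (build_kd_partitions(order[:mid], centroid, max_leaf_size)
            + build_kd_partitions(order[mid:], centroid, max_leaf_size))
```

The median split is what gives the roughly even distribution of rendering cost over partitions that the paper argues grids and octrees lack.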
2.1 Partitioning the Mesh

To partition the mesh elements, we use the leaves of a spatial KD-tree (Figure 2b), which are convex, disjoint, and adaptive. The disjoint, non-overlapping property ensures that a ray will exit one partition before entering the next, and the convexity property ensures that the ray will enter each partition only once. As the elements in the mesh are unlikely to be uniform in size or distribution, the adaptivity in partition size provided by the KD-tree is desirable over a more fixed structure (e.g., grids [8], octrees [13]) to ensure a roughly even distribution of rendering cost for the generated partitions. We note that it is possible for a tetrahedron (or other mesh element) to appear in more than one leaf node, and thus partition. In this case, rays entering a given partition will only sample the portion of the element contained inside the partition's bounds.

For each partition we then compute and store the scalar field value range for the contained elements. For those elements which are only partly contained in the partition, we still include the value range of the entire element for simplicity. When the transfer function is changed, we apply it across the stored range of field values contained in the partition to determine the maximum opacity and color variance of the partition. The complexity of the opacity and variance computation is linear in the number of values in the transfer function, and is independent of the number of elements contained in the partition. As there are likely far fewer values in the transfer function than elements in the mesh, this provides better responsiveness to user changes to the transfer function. The variance computation is parallelized across the partitions, allowing for faster updates. The per-partition variance values are then normalized relative to the minimum and maximum variances over all partitions, to allow computing consistent per-partition sampling rates.
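The per-partition transfer function pass, linear in transfer-function entries rather than in elements, might look like the following sketch. Function and parameter names are illustrative, and the summed per-channel color variance is an assumed definition; the paper does not spell out its exact variance formula.

```python
# Hypothetical sketch: given a partition's stored scalar range, scan only the
# transfer-function entries covering that range to get the partition's maximum
# opacity and a color variance. Cost is linear in the number of TF entries
# and independent of how many elements the partition contains.

def partition_metadata(value_range, tf_values, tf_colors, tf_opacities):
    """tf_values: sorted scalar positions of transfer-function entries;
    tf_colors: RGB tuple per entry; tf_opacities: opacity per entry."""
    lo, hi = value_range
    idx = [i for i, v in enumerate(tf_values) if lo <= v <= hi]
    if not idx:
        return 0.0, 0.0  # no entry falls in the range; treated as empty here
    max_opacity = max(tf_opacities[i] for i in idx)
    # Assumed variance definition: per-channel variance of the covered
    # colors, summed over RGB.
    n = len(idx)
    variance = 0.0
    for c in range(3):
        mean = sum(tf_colors[i][c] for i in idx) / n
        variance += sum((tf_colors[i][c] - mean) ** 2 for i in idx) / n
    return max_opacity, variance
```

A partition whose `max_opacity` comes back zero is fully transparent and can be skipped outright; the (normalized) variance drives the adaptive step size of Section 2.4.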
2.2 Shrinking the Partitions

Although the KD-tree leaves provide the desired convex and disjoint partitioning of the volume, the bounds of a leaf do not necessarily tightly bound the contained elements (see Figure 2b). As such, the initially computed partitions may contain large regions of unoccupied space, especially in the case of non-convex, hollow, or non-grid-aligned meshes. To provide tighter bounds, we shrink the bounding box of each partition to fit the bounding box of the contained elements (Figure 2c). The shrunk bounds are the intersection of the bounds of the contained elements and the original leaf node bounds, ensuring the partition does not expand out of the original leaf node.
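A minimal sketch of this shrinking step, assuming axis-aligned bounding boxes for both leaves and elements (names illustrative, not the paper's code):

```python
# Hypothetical sketch of partition shrinking: take the union of the contained
# elements' AABBs, then clip it against the original KD-tree leaf bounds so
# the shrunk box never grows past the leaf (elements may straddle leaf planes).

def shrink_partition(leaf_lo, leaf_hi, element_boxes):
    """element_boxes: list of (lo, hi) AABBs for the contained elements."""
    if not element_boxes:
        return None  # unoccupied leaf: emit no partition geometry at all
    lo = [min(b[0][a] for b in element_boxes) for a in range(3)]
    hi = [max(b[1][a] for b in element_boxes) for a in range(3)]
    # Intersect with the leaf so the shrunk bounds stay inside it.
    lo = [max(lo[a], leaf_lo[a]) for a in range(3)]
    hi = [min(hi[a], leaf_hi[a]) for a in range(3)]
    return tuple(lo), tuple(hi)
```

Returning nothing for an unoccupied leaf is what lets the traversal in Section 2.3 skip unoccupied space for free: no geometry is ever built there.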
2.3 Hardware-Accelerated Empty Space Skipping

Given the bounds of the partitions, we can leverage hardware accelerated ray tracing to accelerate intersection tests against the partitions to find ray entry and exit points. The RT cores available in Turing GPUs support both hardware accelerated BVH traversal and ray-triangle intersection tests. To fully utilize the available hardware, we first tessellate the partition bounding boxes, then construct an OptiX [24] BVH over the generated triangles. This BVH need only be rebuilt when the underlying partition geometry changes, and is not tied to the transfer function.

To find the entry and exit points of the ray in some partition, we use OptiX to trace rays against the partition geometry; first with back-face culling enabled to find the entry point, then from the entry point with front-face culling enabled to find the exit point. The t range along the ray between t_enter and t_exit is thus the range over which to integrate the volume when sampling the partition. If the ray intersects a completely transparent partition, it is skipped and we advance the ray to find the next partition. If no partition is found, the ray is terminated and the computed color and opacity is composited with the background and written to the framebuffer.

To advance to the next partition we set the ray's t_min to t_exit − ε. We apply a small offset back by ε to allow for intersection with potentially coplanar partition boundary faces. From this new start point we then find the ray's entry and exit points with the next partition as before.

2.4 Adaptive Sampling

When ray marching through a partition, we want to reduce the number of samples taken to better match the local partition variance. In partitions with relatively similar colors, the variance is low, and a correspondingly low number of samples can be taken. Regions with wider color variation require more samples to preserve accuracy. To adaptively sample each partition, we use the transfer function variance for the partition computed in Section 2.1 to select the step size for the ray marching process. The step size is computed using an intuitive equation which allows users to place an upper bound on the tolerable maximum step size, and thus error, and to control how quickly the algorithm transitions from high to low quality sampling based on the partition's variance. Given a minimum step size s_min to use for the highest quality sampling and a maximum step size s_max to use for the lowest quality sampling, we compute the step size s for a partition with normalized variance σ as

s = max(s_min + (s_max − s_min) · |min(σ, 1) − 1|^p, s_min).

The final user controllable parameter is p, referred to as the adaptive power, which allows the user to tune how quickly the algorithm will transition to lower quality sampling in medium variance partitions. We restrict p ≥ 1, as this lower bound corresponds to a simple linear interpolation between s_min and s_max.

With this equation the user can easily tune the sampling quality to produce an acceptable image at some desired frame rate. If the user wants a lower quality image at a higher frame rate, they can increase s_min and s_max; for a more expensive but higher-quality image, they can decrease both values. If desired, the adaptive sampling can be disabled entirely by setting s_min = s_max. If the user finds too few samples are taken in partitions with medium variance, they can increase p to bias s towards s_min in these partitions. Similarly, to improve frame rate at the cost of error in medium variance partitions, p can be decreased to bias s towards s_max.

Given the sampling step size for a partition, we integrate the ray front to back through the partition, using the rtx-shared-faces point query kernel described by Wald et al. [32] to sample at points along the ray. To ensure correct opacity when compositing partitions integrated at different step sizes, we use an opacity correction term [7]. Given the current sample's opacity α and the step size s, we compute the corrected opacity as α̃ = 1 − (1 − α)^(s / s_min). Finally, we perform early ray termination if the ray becomes opaque.
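The traversal and sampling loop of Sections 2.3 and 2.4 can be sketched on the CPU as follows. This is an illustrative reconstruction, not the paper's GPU implementation: `next_partition` stands in for the two OptiX traces with back/front-face culling, `sample` stands in for the hardware-accelerated point query of Wald et al. [32], the scalar "color" replaces RGB to keep the sketch short, and the 0.99 opacity threshold is an assumed early-termination cutoff.

```python
# Hypothetical CPU sketch of the per-ray loop. `next_partition(t_min)` returns
# (t_enter, t_exit, partition) for the next partition hit past t_min, or None;
# partitions carry the max_opacity and normalized variance of Section 2.1.

EPS = 1e-4  # offset back past potentially coplanar partition faces

def step_size(sigma, s_min, s_max, p):
    """Adaptive step size: s_min at high variance, s_max at low variance."""
    return max(s_min + (s_max - s_min) * abs(min(sigma, 1.0) - 1.0) ** p, s_min)

def march(next_partition, sample, s_min, s_max, p):
    color, alpha = 0.0, 0.0
    t_min = 0.0
    while alpha < 0.99:                  # early ray termination when opaque
        hit = next_partition(t_min)
        if hit is None:
            break                        # no partition left: composite background
        t_enter, t_exit, part = hit
        t_min = t_exit - EPS             # advance even if this partition is skipped
        if part.max_opacity == 0.0:
            continue                     # skip fully transparent partition
        s = step_size(part.variance, s_min, s_max, p)
        t = t_enter
        while t < t_exit and alpha < 0.99:
            c, a = sample(t)
            a = 1.0 - (1.0 - a) ** (s / s_min)  # opacity correction [7]
            color += (1.0 - alpha) * a * c      # front-to-back compositing
            alpha += (1.0 - alpha) * a
            t += s
    return color, alpha
```

Note how σ = 0 yields s = s_max and σ ≥ 1 yields s = s_min, with p shaping the transition in between, matching the tuning behavior described above.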
3 EVALUATION

We evaluate our approach using four tetrahedral mesh volumes (Figure 3) covering a range of data set sizes, on an NVIDIA RTX 8000 GPU, chosen primarily for its large memory capacity. Our renderer is implemented using OptiX 6 and CUDA 10. For the Jets, Agulhas Current, and Japan Earthquake datasets we set p = 2; on the Deep Water Asteroid Impact we set p = 6. We first evaluate the performance gains provided by our empty space skipping method (Section 3.1), after which we combine it with our adaptive sampling method and evaluate the two in combination (Section 3.2). Finally, we measure the overhead incurred by the two methods in Section 3.3.
3.1 Empty Space Skipping

Empty space skipping is able to reduce the samples taken per ray in two ways: by skipping regions outside the volume, and by skipping 100% transparent partitions. These regions can be skipped, since they do not contribute to the final image (Figure 5b).

With regard to the former, we observe a negligible performance improvement when only skipping unoccupied space compared to naively taking samples potentially outside the volume. The volumes we conduct our evaluation on are relatively dense, providing little unoccupied space to skip in the first place. Moreover, it is likely that the hardware accelerated BVH traversal performed when querying a point is able to quickly determine the point is outside the volume and terminate, incurring little cost per sample.

The performance improvement provided in the latter case, skipping 100% transparent partitions, is highly dependent on the transfer function chosen by the user (Figure 6). When using a relatively "binary" transfer function, where background regions are made entirely transparent, a large number of partitions can be discarded, yielding a significant performance improvement. However, if these background regions are made slightly opaque, it is no longer possible to discard them, and relatively few empty partitions can be skipped, thereby limiting the performance improvement which can be achieved.
3.2 Adaptive Sampling

Our adaptive sampling approach can provide significant performance improvements by reducing samples taken in low variance partitions (Figure 5c), with little sacrifice in image quality. When combined with our empty space skipping approach, adaptive sampling improves performance in semitransparent low-variance regions (see Figure 6). For high-quality rendering using both methods in combination we find significant performance improvements. On the Jets, Agulhas Current, and Deep Water Asteroid Impact we achieve a roughly 3.5× improvement in rendering performance; on the Japan Earthquake we achieve a 7.8× improvement; in all cases SSIM ≥ 0.97.

As s_max is increased, the adaptive sampling can take larger steps in low and medium variance regions, reducing samples taken and thereby improving performance, at the cost of image quality (Figure 4). At the extreme end we find that even when taking 1/[…] the number of samples taken is likely to be even greater.

Figure 3: Quality and performance comparisons of our method against a reference volume ray marcher [32], using representative views and transfer functions. (Panels: Jets, 12M tets, vertex centered data: reference 4.8 FPS, adaptive 16.7 FPS, SSIM 0.997. Agulhas Current, 35M tets, cell centered data: reference 14 FPS, adaptive 48 FPS, SSIM 0.98. Japan Earthquake, 278M tets, vertex centered data: reference 0.9 FPS, adaptive 7 FPS, SSIM 0.98. Deep Water Asteroid Impact, 366M tets, cell centered data: reference 4 FPS, adaptive 14 FPS, SSIM 0.97.) For comparable image quality our method performs roughly 3–7× faster, achieving its greatest speedup in the most irregular data set (Japan Earthquake). More aggressive quality settings can yield higher speedups, at the cost of image quality. For larger images, please see the supplemental material.

Figure 4: The effect of increasing the maximum step size (tolerable error) on rendering performance, samples taken, and image quality for each data set. As expected, image quality decreases as the maximum step size is increased, though remains high-quality (SSIM ≥ […]).

Figure 5: A heatmap of the samples taken per pixel compared to the reference (a), when using (b) just space skipping, and (c) space skipping plus adaptive sampling. (a) The reference takes a large number of samples for most pixels, except a few where early ray termination occurs. (b) Space skipping avoids unoccupied and fully transparent partitions, though takes many samples in visible partitions. (c) Adaptive sampling reduces samples taken in visible low-variance regions, reducing samples while providing similar image quality.
3.3 Overhead

In our evaluation we found the added work of intersecting rays with the partition boundaries is negligible. With relatively few generated partitions, even for large data sets, and hardware accelerated ray tracing used to intersect these partitions, there is little cost incurred to traverse them. For example, on the Japan Earthquake, which contains the largest number of partitions (4725), we find tracing rays through these partitions takes just 3 ms.
4 LIMITATIONS, FUTURE WORK, AND CONCLUSION
We have presented new strategies for leveraging empty space skipping and adaptive sampling in the context of volume rendering unstructured meshes. Our method significantly reduces the number of samples required per pixel, improving performance without sacrificing image quality. Our adaptive sampling method exposes an intuitive set of parameters to users, allowing them to easily control the trade-off between performance and image quality. Furthermore, our approach is able to leverage the new ray tracing hardware available on recent GPUs and incurs little overhead as a result.

Although we have demonstrated significant performance improvements when using our method, it is not without limitations. First, as with other adaptive sampling approaches, we find diminishing returns as the sampling rate becomes very low. Second, our adaptive sampling is only based on the transfer function, and would require additional metadata per partition to account for gradient shading or scattering. Third, our occupancy geometry must be rebuilt on mesh geometry changes, which would impact performance on both time series data as well as runtime level-of-detail strategies. On large datasets such as the Deep Water Asteroid Impact, we encountered numerical precision issues when traversing the partitions. These
precision issues may be addressed by adapting our epsilon offset to account for the data set size, or by using a custom higher-precision intersection test for the partition geometry. Finally, although the median split KD-tree provides a reasonable spatial partitioning, taking into account the underlying scalar field may provide better results. For example, on the Deep Water Asteroid Impact the water is split into multiple partitions; however, the field is uniform across the region, and a single partition would suffice.

In future work, it would be valuable to explore a method for automatically selecting the adaptive sampling parameters based on some runtime refinement to provide a desired image quality, instead of requiring users to manually tune the sampling parameters. Although we have evaluated our method in a GPU raycaster to leverage hardware accelerated ray tracing, it could translate well onto the CPU using Embree [33]. While we have only evaluated our method on linearly interpolated tetrahedral meshes, a similar approach may work well for other unstructured volumes, adaptive mesh refinement (AMR) volumes, and higher order interpolants.

Figure 6: Empty space skipping works best for "binary" transfer functions, where parts of the volume are 100% transparent (left: reference 4 FPS, space skipping only 20 FPS, adaptive only 10 FPS, together 24 FPS); but on its own it breaks down if regions are not 100% transparent (right: reference 4 FPS, space skipping only 5 FPS, adaptive only 12 FPS, together 13 FPS). Adaptive sampling is able to reduce the sampling rate in these semitransparent low-variance regions, improving performance for such cases.
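One way the size-adaptive epsilon offset suggested above could look, as a sketch under the assumption of a single relative scale factor; `scene_epsilon` and `rel` are illustrative names, not from the paper:

```python
# Hypothetical sketch: scale the traversal offset epsilon with the scene
# extent, so that on very large data sets the fixed offset is not lost to
# floating-point rounding at large t values.

def scene_epsilon(bounds_lo, bounds_hi, rel=1e-5):
    """Relative epsilon: a fixed fraction of the bounding-box diagonal."""
    diag_sq = sum((hi - lo) ** 2 for lo, hi in zip(bounds_lo, bounds_hi))
    return rel * diag_sq ** 0.5
```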
ACKNOWLEDGMENTS

The Agulhas Current is courtesy of Dr. Niklas Röber (DKRZ); the Japan Earthquake is courtesy of Carsten Burstedde, Omar Ghattas, James R. Martin, Georg Stadler, and Lucas C. Wilcox (ICES, UT Austin) and Paul Navrátil and Greg Abram (TACC); the Deep Water Asteroid Impact is courtesy of John Patchett and Galen Gisler of LANL. Hardware for development and testing was graciously provided by NVIDIA Corp. This work is supported in part by NSF:CGV Award 1314896, NSF:IIP Award 1602127, NSF:ACI Award 1649923, DOE/SciDAC DESC0007446, CCMSC DE-NA0002375, and NSF:OAC Award 1842042.
REFERENCES

[1] I. Boada, I. Navazo, and R. Scopigno. Multiresolution Volume Visualization with a Texture-Based Octree. The Visual Computer, 2001.
[2] P. Bunyk, A. Kaufman, and C. T. Silva. Simple, Fast, and Robust Ray Casting of Irregular Grids. In Scientific Visualization Conference (Dagstuhl '97), 1997.
[3] S. Callahan, M. Ikits, J. Comba, and C. Silva. Hardware-Assisted Visibility Sorting for Unstructured Volume Rendering. IEEE Transactions on Visualization and Computer Graphics, 2005.
[4] S. P. Callahan. The k-buffer and its applications to volume rendering, 2005.
[5] S. P. Callahan. Adaptive visualization of dynamic unstructured meshes, 2008.
[6] R. Cook, N. Max, C. T. Silva, and P. L. Williams. Image-space visibility ordering for cell projection volume rendering of unstructured data. IEEE Transactions on Visualization and Computer Graphics, 2004.
[7] K. Engel, M. Hadwiger, J. Kniss, C. Rezk-Salama, and D. Weiskopf. Real-Time Volume Graphics. 2006.
[8] D. Ganter and M. Manzke. An Analysis of Region Clustered BVH Volume Rendering on GPU. Computer Graphics Forum, 2019.
[9] M. Garland and Y. Zhou. Quadric-based simplification in any dimension. ACM Trans. Graph., 2005.
[10] M. P. Garrity. Raytracing irregular volume data. ACM SIGGRAPH Computer Graphics, 1990.
[11] E. Gobbetti, F. Marton, and J. A. Iglesias Guitián. A Single-Pass GPU Ray Casting Framework for Interactive Out-of-Core Rendering of Massive Volumetric Datasets. The Visual Computer, 2008.
[12] G. Gu and D. Kim. Accurate and Memory-Efficient GPU Ray-Casting Algorithm for Volume Rendering Unstructured Grid Data. In J. Madeiras Pereira and R. G. Raidou, eds., EuroVis 2019 - Posters, 2019.
[13] M. Hadwiger, A. K. Al-Awami, J. Beyer, M. Agus, and H. Pfister. SparseLeap: Efficient empty space skipping for large-scale volume rendering. IEEE Transactions on Visualization and Computer Graphics, 2017.
[14] A. Knoll, S. Thelen, I. Wald, C. D. Hansen, H. Hagen, and M. E. Papka. Full-resolution interactive CPU volume rendering with coherent BVH traversal. In Visualization Symposium (PacificVis), 2011.
[15] J. Krüger and R. Westermann. Acceleration techniques for GPU-based volume rendering. In Proceedings of the 14th IEEE Visualization 2003 (VIS '03), 2003.
[16] M. Labschütz, S. Bruckner, M. E. Gröller, M. Hadwiger, and P. Rautek. JiTTree: A Just-in-Time Compiled Sparse GPU Volume Data Structure. IEEE Transactions on Visualization and Computer Graphics, 2016.
[17] E. LaMar, B. Hamann, and K. I. Joy. Multiresolution Techniques for Interactive Texture-Based Volume Visualization. In VIS '99 Proceedings of the Conference on Visualization '99: Celebrating Ten Years, 1999.
[18] J. Leven, J. Corso, J. Cohen, and S. Kumar. Interactive visualization of unstructured grids using hierarchical 3D textures. In Symposium on Volume Visualization and Graphics, IEEE / ACM SIGGRAPH, 2002.
[19] A. Maximo, R. Marroquim, and R. Farias. Hardware-Assisted Projected Tetrahedra. Computer Graphics Forum, 2010.
[20] K. Moreland and E. Angel. A fast high accuracy volume renderer for unstructured data. 2004.
[21] P. Muigg, M. Hadwiger, H. Doleisch, and E. Gröller. Interactive Volume Visualization of General Polyhedral Grids. IEEE Transactions on Visualization and Computer Graphics, 2011.
[22] B. Nelson and R. M. Kirby. Ray-tracing polymorphic multidomain spectral/hp elements for isosurface rendering. IEEE Transactions on Visualization and Computer Graphics, 2006.
[23] S. Parker, M. Parker, Y. Livnat, P.-P. Sloan, and C. Hansen. Interactive Ray Tracing for Volume Visualization. IEEE Transactions on Visualization and Computer Graphics, 1999.
[24] S. G. Parker, J. Bigler, A. Dietrich, H. Friedrich, J. Hoberock, D. Luebke, D. McAllister, M. McGuire, K. Morley, and A. Robison. OptiX: A General Purpose Ray Tracing Engine. ACM Transactions on Graphics (Proceedings of ACM SIGGRAPH), 2010.
[25] B. Rathke, I. Wald, K. Chiu, and C. Brownlee. SIMD Parallel Ray Tracing of Homogeneous Polyhedral Grids. In Eurographics Symposium on Parallel Graphics and Visualization, 2015.
[26] F. Reichl, M. Treib, and R. Westermann. Visualization of Big SPH Simulations via Compressed Octree Grids. 2013.
[27] P. Shirley and A. Tuchman. A Polygonal Approximation to Direct Scalar Volume Rendering. In Proceedings of the 1990 Workshop on Volume Visualization, 1990.
[28] C. Silva, J. Comba, S. P. Callahan, and F. F. Bernardon. A survey of GPU-based volume rendering of unstructured grids. RITA, 12, 2005.
[29] K. R. Subramanian and D. S. Fussell. Applying space subdivision techniques to volume rendering. In VIS '90 Proceedings of the 1st Conference on Visualization, 1990.
[30] V. Vidal, X. Mei, and P. Decaudin. Simple Empty-Space Removal for Interactive Volume Rendering. Journal of Graphics Tools, 2008.
[31] I. Wald, G. P. Johnson, J. Amstutz, C. Brownlee, A. Knoll, J. Jeffers, J. Günther, and P. Navrátil. OSPRay – A CPU Ray Tracing Framework for Scientific Visualization. IEEE Transactions on Visualization and Computer Graphics, 2017.
[32] I. Wald, W. Usher, N. Morrical, L. Lediaev, and V. Pascucci. RTX Beyond Ray Tracing: Exploring the Use of Hardware Ray Tracing Cores for Tet-Mesh Point Location. In Proceedings of High Performance Graphics, 2019. (To appear.)
[33] I. Wald, S. Woop, C. Benthin, G. S. Johnson, and M. Ernst. Embree: A Kernel Framework for Efficient CPU Ray Tracing. ACM Transactions on Graphics, 2014.
[34] M. Weiler, M. Kraus, M. Merz, and T. Ertl. Hardware-based ray casting for tetrahedral meshes. In Proceedings of the 14th IEEE Visualization 2003 (VIS '03), 2003.
[35] K. Zimmermann, R. Westermann, T. Ertl, C. Hansen, and M. Weiler. Level-of-Detail Volume Rendering via 3D Textures. 2000.