A comparative evaluation of three volume rendering libraries for the visualization of sheared thermal convection
Jean M. Favre a,∗, Alexander Blass b
a Swiss National Supercomputing Center (CSCS), Via Trevano 131, CH-6900 Lugano, Switzerland
b Physics of Fluids Group, Max Planck Center for Complex Fluid Dynamics, J. M. Burgers Center for Fluid Dynamics and MESA+ Research Institute, Department of Science and Technology, University of Twente, P.O. Box 217, 7500 AE Enschede, The Netherlands
Abstract
Oceans play a big role in the nature of our planet: about 70% of our earth is covered by water [1]. Strong currents transport warm water around the world, making life possible and allowing us to harvest their power to produce energy. Yet, oceans also carry a much more deadly side. Floods and tsunamis can easily annihilate whole cities and destroy life in seconds. The earth's climate system is also closely linked to the currents in the ocean because of its large coverage of the earth's surface; thus, gaining scientific insights into the underlying mechanisms and effects through simulations is of high importance. Deep ocean currents can be simulated by means of wall-bounded turbulent flow simulations. To support these very large scale numerical simulations and enable the scientists to interpret their output, we deploy an interactive visualization framework to study sheared thermal convection. The visualizations are based on volume rendering of the temperature field. To address the needs of supercomputer users with different hardware and software resources, we evaluate different volume rendering implementations supported in the ParaView [2] environment: two GPU-based solutions with Kitware's native volume mapper or NVIDIA's IndeX library, and a CPU-only Intel OSPRay-based implementation.

Keywords: Scientific Visualization, High Performance Computing, Navier-Stokes Solver, Direct Numerical Simulation, Computational Fluid Dynamics
Figure 1: Snapshot of the three-dimensional temperature field of sheared thermal convection at Ra = … and Re_w = ….
1. Introduction
Thermohaline ocean circulation [4] is vital for the heat budget of our earth. Manabe and Stouffer [5] observed that it can contribute to an increase of up to ∼ … °C in the yearly averaged mean surface temperatures in the North Atlantic region. Marshall and Schott [6] investigated a vast variety of scales in ocean dynamics and stated that deep convection can be related to mixing layers everywhere in the ocean. Since many complex three-dimensional events happen in large-scale fluid bodies such as oceans, it is vital to visualize the three-dimensional and temporal features of such flow simulations.

We study these large-scale bodies of fluid, which are sheared by winds or currents and influenced by temperature differences in the flow. A fundamental setup of this natural mechanism is sheared thermal convection (Fig. 1). Many processes in nature are based on heat and momentum transfer and therefore on the interaction between buoyancy [7, 8] and shear [9, 10]. Rayleigh-Bénard convection, the flow in a box heated from below and cooled from above, is a paradigmatic system for thermal convection. We present the use of three different rendering libraries available in ParaView [2] to build a time-dependent volume rendering of thermal convection. The deployment and evaluation of the hardware and software requirements of these libraries was motivated by a showcase submission at the 2018 International Conference for High Performance Computing, Networking, Storage and Analysis. In the accompanying video [11] we are able to display the previously two-dimensionally presented flow structures in three-dimensional motion. The reader is led through a presentation of one specific flow case with sheared thermal convection and can experience the dynamics of the thermal structures while being informed about the different flow parameters.
2. Numerical simulations of sheared thermal convection
The direct numerical simulations (DNS) were performed with the second-order finite-difference code AFiD [12], in which the three-dimensional non-dimensional Navier-Stokes equations with the Boussinesq approximation are solved on a staggered grid. We use u = u(x, t) as the velocity vector with streamwise, spanwise and wall-normal components. θ is the non-dimensional temperature, ranging from 0 ≤ θ ≤ 1. The simulations are performed in a computational box with periodic boundary conditions in the streamwise and spanwise directions, confined by a heated plate below and a cooled plate on top. The shearing of the flow is implemented by a Couette flow setting in which both the top and bottom plates are moved in opposite directions with the speed u_w, keeping the average bulk velocity at zero and therefore minimizing dissipation errors. The domain size is (L_x × L_y × L_z) = (9πh × πh × h), using a grid of (n_x × n_y × n_z) = (6912 × … × 384) points.

The second-order finite-difference Navier-Stokes solver AFiD [12] was written in Fortran 90 to study large-scale wall-bounded turbulent flow simulations. In collaboration with NVIDIA, USA, the code was ported in its newest version to a GPU setting, using an MPI and CUDA Fortran hybrid implementation optimized to run and solve large flow fields [13].

We used data from Blass et al. [3] for our evaluation of volume rendering implementations, where a parameter study over different input parameters was conducted to study their influence on the flow field. The control parameters were the temperature difference between the top and bottom plates as the strength of the thermal forcing, non-dimensionalized as the Rayleigh number Ra, and the wall velocity as the strength of the shear forcing, non-dimensionalized as the wall shear Reynolds number Re_w. In Fig. 2 we present snapshots of temperature fields at mid-height in different flow regimes. It can be observed that the flow passes from a thermally dominated regime, with large thermal convection rolls driving the flow (Fig. 2a), into a regime where the mechanical forcing is dominant. Here, large-scale meandering structures can be observed, which are driven by the shearing of the top and bottom plates (Fig. 2d). To undergo a transition between the regimes, the flow has to pass through an intermediate stage, in which the thermal plumes get stretched into large streaks (Fig. 2b). If the shearing is further increased, these streaks become unstable and start meandering in the final flow state (Fig. 2c,d).

The reason for this streaky flow behavior is the addition of a third dimension to originally quasi-two-dimensional flow structures in pure thermal convection. Such thermal convection rolls are driven solely by the thermal difference between the plates. Once the wall shearing is added, the flow starts to strongly move in the streamwise direction, which causes the development of streaks.

Figure 2: Zoomed snapshots of temperature fields of a sheared and thermally forced flow transitioning through all flow regimes for Ra = … and (a) Re_w = 0, (b) Re_w = …, (c) Re_w = …, (d) Re_w = …. The color scale ranges from θ_min (blue) to θ_max (red).
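For orientation, these control parameters follow the standard non-dimensionalization of sheared Rayleigh-Bénard convection; the exact length-scale and prefactor conventions are those of Blass et al. [3], so the expressions below are a generic sketch rather than the precise definitions used in the simulations:

Ra = g β Δ L_z³ / (ν κ),    Re_w = u_w L_z / ν,

where g is the gravitational acceleration, β the thermal expansion coefficient, Δ the temperature difference between the bottom and top plates, L_z the plate separation, ν the kinematic viscosity, κ the thermal diffusivity, and u_w the speed of the moving walls.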
In turbulent flows it is very important to investigate how certain characteristic parameters are influenced by the flow. In thermal convection, the heat transfer, non-dimensionally defined through the Nusselt number Nu, is a good indicator of whether changing flow structures have a supporting effect or may disrupt a previously transport-favorable flow situation.

While two-dimensional visualizations are very helpful in understanding the behavior of the large-scale structures, they do not show the complete scientific picture. They give a good indication of the flow behavior, but to understand thermal turbulence it is vital to see the whole flow field and the dynamic interaction of turbulent structures with each other. The opportunity to observe the flow evolving and transitioning through different regimes is a great chance not only to statically observe different flow states at fixed locations in space, but also to actually follow the flow on its path to develop thermal plumes, streaks and meandering structures.

It has been previously shown in thermal convection that the large thermal plumes can be traced until very close to the heated and cooled plates [14]. So it is very important to also observe the emergence of structures close to the boundary layer. In the shear-dominated regime, which we visualize in the accompanying video [11], we can observe extremely large-scale structures which are caused by a combination of thermal and shear forcing. The detailed visualizations we present allow us to follow not only the large-scale structures, but also the interaction of small-scale structures much closer to the plates (Fig. 3).
3. Volume rendering libraries and setup
Figure 3: Zoom of a snapshot of the temperature field (top) and the vorticity structures (bottom) at Ra = … and Re_w = ….

We use ParaView v5.6.0, a world-class, open-source, multi-platform data analysis and visualization application installed on Piz Daint. Piz Daint, a hybrid Cray XC40/XC50 system, is the flagship supercomputer of the Swiss National HPC service. We have deployed and tested several solutions within ParaView in which parallelism is expressed to different degrees: data-parallel visualization pipelines with GPU-based rendering, or multi-threaded parallelism for CPU-based rendering.
The computational domain used for our simulations is made of 6912 × … × 384 grid points. The temperature scalar field, stored as float32, takes 36 GB of memory, an overwhelming size to handle on a normal desktop. Using different parallel programming paradigms has enabled us to provide an engaging environment to promote interactive tuning of visualization options and high productivity for movie generation.

Visualization of three-dimensional scalar fields is a very mature field. Many techniques are available to make sense of the three-dimensional nature of the data and its variations throughout the volume. Surface-based renderings with isosurface thresholds or slicing planes have a great appeal in that they are easy to use and provide unambiguous representations based on clearly defined numerical values. Volume renderings, early applied to medical applications, are also a great fit for scalar visualizations, especially in the realm of time-dependent outputs. They are, however, much more difficult to use. Volume rendering is based on the principle of converting a 3D scalar field into an RGB (color) volume and an opacity volume. Transfer functions, often defined in an ad-hoc manner, convert scalar values to colors and classify the data into regions of different opacities. A volume can then appear as clouds with varying density and color. Their interpretation remains subjective to the user's taste and practice. We refer readers to other sources [15] to dive more deeply into the principles of volume rendering.

Volume rendering can be implemented in different manners. ParaView was chosen because it offers a testbed for several state-of-the-art implementations which can be selected based on rendering parameters and available hardware.

The largest partition of the Piz Daint supercomputer has nodes equipped with one Intel Xeon E5-2690 (12 cores, 64 GB RAM) and one NVIDIA Tesla P100 GPU (16 GB RAM, OpenGL driver 396.44). Thus our priority is to evaluate the GPU-based implementations. ParaView's default installation also enables a software ray caster for rendering volumes, but we have found its performance far below the other options. The lack of advanced parameter settings in the Graphical User Interface (GUI) of ParaView also led us to abandon its evaluation. We tested ParaView's native GPU ray casting implementation against IndeX, an NVIDIA library, as well as OSPRay, a software-based library developed by Intel. Doing so provides a valid option to users of supercomputers not equipped with GPUs. Our performance evaluation is based on ParaView's benchmarking Python source code (found in ./Wrapping/Python/paraview/benchmark/).

We have in all cases ignored disk-based I/O costs. There is often quite a bit of variability when running on a large distributed file system shared by hundreds of users. Our motivations are rendering-centered and two-fold: evaluate the memory cost and resources (CPU, GPU) required to get a first image on the screen, and see if color/opacity transfer function editing, as well as other image tuning, can be done interactively using any of the three methods proposed. In the evaluation of performance costs, ParaView's benchmark code enables fully automated testing with a careful management of double buffering, turning off all rendering optimizations designed to accelerate interactive viewing, and forcing full-feature rendering before saving images to disk.

In the two GPU-based methods evaluated, we use an EGL-based rendering layer [16] to overcome the need to have a server-side X-Windows server running on the compute node. This enables headless, off-screen rendering with GPU acceleration.
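To make the setup concrete, the following is a minimal pvpython sketch of the kind of headless script used for these measurements; the file name, the array name 'temperature' and the timing granularity are illustrative assumptions, not the actual benchmark code shipped with ParaView:

# Minimal headless volume-rendering sketch (run with pvbatch or pvpython built
# with EGL/OSMesa support). File and array names are hypothetical.
import time
from paraview.simple import *

reader = OpenDataFile('temperature_t0000.pvti')   # partitioned VTK image data (assumed name)
view = GetActiveViewOrCreate('RenderView')
view.ViewSize = [1920, 1080]

display = Show(reader, view)
display.SetRepresentationType('Volume')           # enable volume rendering of the scalar field
ColorBy(display, ('POINTS', 'temperature'))       # map the scalar through the transfer functions

# Startup cost: time from enabling volume rendering until the first frame is delivered.
t0 = time.time()
Render(view)
print('first frame rendered in %.2f s' % (time.time() - t0))

SaveScreenshot('first_frame.png', view, ImageResolution=[1920, 1080])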
We note, however, that although the GPUs provide phenomenal rendering power, they are limited by the available memory (16 GB on our NVIDIA Pascal GPUs). For the full size of our simulation outputs, we are actually forced to use data-parallel pipelines on multiple nodes to use the aggregate memory of the different GPUs. Our third option uses Intel OSPRay and CPU rendering. HPC compute nodes usually have more memory than their GPU counterparts. We use Piz Daint's high-memory nodes with 128 GB of RAM, where our grid of over 9 billion voxels can fit easily on a single node.

When GPU hardware is present, ParaView's most efficient mapper is a volume mapper that performs ray casting on the GPU using vertex and fragment programs [17]. The core ray-tracing algorithms are coded in GLSL and require a graphics driver supporting at least OpenGL version 3.2 [18]. The data is stored into a vtkVolumeTexture, which manages the OpenGL volume texture, its type and internal format. Although this class supports streaming data into separate blocks to make it fit the GPU memory, we have not used this option, which imposes a performance trade-off when artificially going over the fixed GPU memory limit. Block streaming, sometimes called data bricking, may also suffer from artifacts at the block boundaries where gradient computations are done to support shading. ParaView's OpenGL VolumeRayCastMapper binds the 32-bit float scalar field array to a three-dimensional texture image with a call to glTexImage3D(). An explicit texture object is created, transferring data from host memory to GPU memory. The maximum achievable performance will be proportional to the total amount of GPU memory, and to the transfer bandwidth over our high-speed PCIe3 serial bus connecting the host to the GPU device.

Figure 4: Comparison between volume renderings of temperature with ParaView's OpenGL GPU RayCastMapper (left), and with NVIDIA IndeX (right).

NVIDIA IndeX [19] is a three-dimensional visualization SDK developed to enable volume rendering of massive datasets. NVIDIA has worked in tandem with Kitware to bring an implementation of IndeX to ParaView, and we have enjoyed the benefits of a close partnership between the Swiss National Supercomputing Center (CSCS) and NVIDIA, to be able to use IndeX in a multi-GPU setting. We use the ParaView plugin v2.2 with the core library NVIDIA IndeX 2.0.1. The NVIDIA IndeX Accelerated Compute (XAC) interface integrates the core surface and volume sampling programs written in CUDA [20]. For this case, we have used the generic programs provided by IndeX, without custom programming. In Fig. 4 we show side-by-side renderings done with the two GPU-based libraries, to demonstrate that they produce equivalent images. The ParaView Graphical User Interface ensures that both implementations use identical color and opacity transfer functions and sampling rates. ParaView's GPU ray casting image (left) is used as reference. Differences of illumination are barely noticeable to the human eye.

OSPRay [21] is a ray tracing framework for CPU-based rendering. It supports advanced shading effects, large models and non-polygonal primitives. OSPRay can distribute "bricks" of data as well as "tiles" of the framebuffer, although in our case we use brick subdivisions only.
The Texas Advanced Computing Center has developed a ParaView plugin that enables us to test the possibility of using a ray-tracing based rendering engine for volumetric rendering. This is the best solution for clusters where no GPU hardware is available.

OSPRay can use its own internal Message Passing Interface (MPI) layer to replicate data across MPI processes and composite the image. This would result in linear performance scaling and supports secondary rays used in ParaView's path-tracer mode, but would be prohibitive in terms of communication costs. In this study, we rely on a different parallel computing paradigm. The emphasis is no longer on data parallelism, but rather on multi-threaded execution. A complete software-only ParaView installation was deployed with an LLVM-based OpenGL Mesa layer. We used Mesa v17.2.8, compiled with LLVM v5.0.0, and the OSPRay v1.7.2 library to provide a very efficient multi-threaded execution path taking advantage of Piz Daint's second partition of compute nodes. These nodes are built with two Intel Broadwell CPUs (2×18 cores and 64/128 GB of RAM). We launch ParaView with the Slurm options "--cpus-per-task=…" and "--ntasks-per-core=2" to effectively take full advantage of the multi-threading exposed by the LLVM and OSPRay libraries.
2” to e ff ectively take full advantageof the multi-threading exposed by the LLVM and OSPRay li-braries. ParaView’s default mode of parallel computing is to usedata-parallel distribution, whereby sub-pieces of a data grid areprocessed through identical visualization pipelines. To combinethe individual framebu ff ers of each computing nodes, ParaViewuses Sandia National Laboratory’s IceT [22] compositing li-brary. We use it in its default mode of operation doing sort-last compositing for desktop image delivery. We note here thatNVIDIA’s IndeX uses a proprietary compositing library, so forthe IndeX tests only, we disable ParaView’s default image com-positor.
4. Volume rendering of the thermal convection
Figure 5: Example of color and opacity transfer functions to highlight hot and cold plumes.
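A minimal sketch of how such a transfer-function pair can be defined through ParaView's Python interface; the array name 'temperature' and the control-point values are illustrative assumptions, not the exact functions used for Fig. 5:

from paraview.simple import *

# Color transfer function: cold plumes in blue, the mixed bulk nearly neutral,
# hot plumes in red (entries are value, R, G, B).
colorTF = GetColorTransferFunction('temperature')
colorTF.RGBPoints = [0.0, 0.0, 0.0, 1.0,
                     0.5, 0.9, 0.9, 0.9,
                     1.0, 1.0, 0.0, 0.0]

# Opacity transfer function: keep the bulk nearly transparent so that only the
# hot and cold plumes contribute (entries are value, opacity, midpoint, sharpness).
opacityTF = GetOpacityTransferFunction('temperature')
opacityTF.Points = [0.0, 0.8, 0.5, 0.0,
                    0.5, 0.0, 0.5, 0.0,
                    1.0, 0.8, 0.5, 0.0]

# Attach both functions to the volume representation of the active source.
display = GetDisplayProperties()
display.LookupTable = colorTF
display.ScalarOpacityFunction = opacityTF
Render()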
In visualizing the temperature field, we seek to highlight the turbulence, which is best shown by clearly differentiating between cold and hot regions to see how they interact with each other, as seen in Fig. 5. Our movie animation shows an initial phase where a region of blue tint is superposed on top of the hotter region. Plumes emerging from the bottom and mixing into the cold regions highlight this phenomenon.

Figure 6: Volume rendering with shading based on gradient estimation (left), and with OSPRay-enabled shadows (right).

When presented with multiple visualizations including different illumination and shading, we preferred the renderings which emphasize the amorphous nature of the field data. As can be seen in Fig. 6, shading based on gradient estimation offers little improvement because our data does not have strong gradients, and the use of shadows, which at first might seem more appealing, produces images with a strong surface-like look, which we discarded upon further analysis.

Volumetric rendering of high-resolution grids has a non-negligible cost, which we briefly document here. Creating the first frame after data has been read into memory, i.e., the startup cost, has a great impact on whether users adopt a particular implementation. In a post-hoc visualization, data would be read from disk; in an in-situ scenario, data might have to be converted to VTK data structures. Thus, we measure performance after the time ParaView has collected all the data and created a bounding-box representation. This startup cost for the first image is also of paramount importance in a movie-making scenario, where data are read from disk, a single image is computed, and the whole visualization pipeline and hardware resources are flushed to visualize the next timestep.

Unlike ParaView's native GPU ray caster implementation, which does not enable block streaming, the NVIDIA IndeX library processes data by chunks. However, it does so by bringing volume sub-extents incrementally into the GPU memory. Early volume chunks are rendered properly as long as the GPU memory is not exhausted. When memory runs out, late chunks actually corrupt the final image. Our attempts to render a 4-billion-voxel dataset on a single node did not succeed with NVIDIA IndeX. We observe failures to allocate 64-voxel cubes, and the final images are corrupted.

We summarize in Table 1 the time from when volumetric rendering options are enabled, triggering the building of internal structures, until the first frame appears. In order to measure the memory cost of all three libraries under evaluation on a single node, we restricted our test sample to a quarter-size domain of the original grid, i.e., 2.28G voxels (1730 × … × …). The GPU memory usage, measured with the nvidia-smi diagnostic tool, settles at 9.1 GB for ParaView's native ray caster, and 12.3 GB for NVIDIA IndeX.

Table 1: Initialization and memory costs for a quarter-size domain on one node.
Rendering library        Startup    ParaView task memory
OSPRay                   1.34 s     18.4 GB
ParaView GPU Mapper      6.17 s     27.2 GB
NVIDIA IndeX             11.84 s    39.2 GB

We note both a much higher memory consumption on the application side of ParaView and on the GPU memory side for the NVIDIA IndeX implementation. The high initial setup cost incurred by the NVIDIA IndeX library is due to higher volume transfer between the CPU and the GPU, a cost that increases further when running in parallel, as the current implementation of IndeX triggers a re-execution of the data I/O due to larger-than-usual ghost layer requirements. Work is in progress to minimize this impact in a future version of the plugin (personal communication with the NVIDIA development team).

If memory costs are substantial, more nodes and/or more GPUs will be required, increasing the run-time cost of the visualization. Our data domain is quite large, and we are not able to load a half-size domain on a single GPU node. Indeed, both the 64 GB RAM on the node and the 16 GB RAM on the GPU are hard limitations. The OSPRay-based CPU rendering is one way to alleviate this problem. We can load the full-size domain on a single node of the multi-core partition of Piz Daint with dual-Xeon chips and 128 GB of RAM. We measured again the startup cost for the first image at full HD resolution (1920x1080 pixels), using 72 execution threads, and found it to increase linearly with the grid dimensions. We tested the quarter-size, half-size and full domains and report the delivery of the first image in 1.07, 1.50, and 2.33 s, respectively. The associated cost in RAM is also linear, at 18.4 GB, 36.5 GB and 73 GB, respectively. Of great interest is OSPRay's management of memory. OSPRay volumes can be stored in two different manners. The first variant, named shared structured volume, matches ParaView's data layout. Version 5.6 of ParaView is the first version where this zero-copy access pattern is used, and it provides both a faster startup time and a much lower memory footprint, as compared to previous work. Indeed, we reported earlier on the use of OSPRay's alternate implementation, called block bricked volume, whereby data locality in memory is increased by arranging voxel data in smaller chunks. This came, however, at a higher cost, doubling the memory footprint on the CPU [23].

… and we tested the rendering speed of that particular mode in a batch production test. We created an OSPRay-based benchmark test to mimic a navigation fly-through in a full resolution domain, starting from an overall view of the full grid, zooming in, rotating the view-point, and finally zooming in to immerse the viewer in the volume. Our initial view-point has some regions of screen-space empty, where rendering costs at each pixel are negligible. We then move quickly into the scene such that the viewport is completely covered by active pixels, that is, all pixel rays hit the volume. We rendered our benchmark test at three different pixel resolutions, WXGA (1280x800 pixels), Full HD (1920x1080 pixels) and 4K Ultra HD (3840x2160 pixels), to evaluate the impact of pixel resolution on rendering costs. We also evaluated the use of hyper-threading to further boost performance. Table 2 summarizes our average rendering time per frame for 300 frames of navigation.

Table 2: Average rendering time per frame for the full-size domain at different pixel resolutions.

Scaling the framebuffer resolution to very large sizes is not a showstopper.
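A simplified sketch of how such a timed navigation benchmark can be scripted with paraview.simple; the camera path, increments and frame count are illustrative and do not reproduce the exact fly-through of our benchmark:

import time
from paraview.simple import *

view = GetActiveViewOrCreate('RenderView')
view.ViewSize = [1920, 1080]                      # Full HD; use [3840, 2160] for 4K UHD
camera = GetActiveCamera()

Render(view)                                      # warm-up frame: build acceleration structures

n_frames = 300
t0 = time.time()
for i in range(n_frames):
    camera.Azimuth(0.5)                           # rotate the view point
    camera.Zoom(1.005)                            # move gradually into the volume
    Render(view)
print('average rendering time per frame: %.3f s' % ((time.time() - t0) / n_frames))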
In a post-processing scenario, we have seen that the two GPU-based rendering solutions are limited by the available GPU memory, since our 9-billion-voxel data set will not fit on a single GPU. Likewise, in an in-situ scenario, the visualization would most likely use a parallel set of nodes. Loading our full-size data, we rounded up our evaluation of all three rendering options by measuring the initial cost for the first image (after all I/O has been done), and also the average rendering time in a scripted animation loop. Fig. 7 summarizes our results, with the dataset distributed among 4, 8 and 12 compute nodes.

As expected, startup times decrease almost linearly with the number of compute nodes. For the GPU-based methods, less data is transferred from CPU memory to GPU memory. Our animation benchmark loads a single timestep of data; thus, once the data has migrated to the GPU, there is hardly any CPU-to-GPU communication apart from a single framebuffer image. For the CPU-based implementation, the build-up of the ray-tracing acceleration structures takes just over one second, so there is less difference across the few tests executed. We see rendering times reduced somewhat linearly since there is less workload per node.

Figure 7: Overview of initial cost and average rendering time per frame.

In a movie production setting where all timestep outputs are read once, rendered once and then discarded, the startup cost of any rendering library needs to be weighed against the I/O costs. Although our data I/O statistics show quite a bit of variation because of the high load of our multi-user system with over 5000 compute nodes, our simulation data are read, on average, in about 32 s (resp. 25 and 16 s) on 4 nodes (resp. 8 and 12 nodes). We see that the initialization of the rendering sub-system has a greater impact than expected, and that in an in-situ scenario it would be the single most important barrier to performance. The initialization of NVIDIA IndeX is the most significant bottleneck. Discussions with NVIDIA are ongoing and our hope is that this will be improved in future versions of the SDK, since the library is still in early development. We comment here that the parallel execution of the OSPRay-based volume rendering was made possible by using yet another ParaView mode, letting the OSPRay library take full control of the overall scene and of the parallel frame compositing. Finally, we highlight the fact that the OSPRay average rendering times per frame in our animation are all under one second, while it takes a minimum of 8 compute nodes to achieve this with the NVIDIA IndeX solution. This level of interactivity can be satisfactory during the prototyping phase of a visualization.
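Returning to the movie-production setting described above, the per-timestep workflow is conceptually as simple as the loop sketched below; the file naming scheme is hypothetical and the real scripts additionally distribute the data with pvbatch over several nodes:

import glob
from paraview.simple import *

view = GetActiveViewOrCreate('RenderView')
view.ViewSize = [1920, 1080]

for step, fname in enumerate(sorted(glob.glob('temperature_t*.pvti'))):
    reader = OpenDataFile(fname)                  # read one timestep
    display = Show(reader, view)
    display.SetRepresentationType('Volume')
    ColorBy(display, ('POINTS', 'temperature'))
    Render(view)                                  # startup cost is paid once per timestep
    SaveScreenshot('frame_%04d.png' % step, view, ImageResolution=[1920, 1080])
    Delete(reader)                                # flush the pipeline before the next timestep
    del reader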
5. Summary and conclusion
We have discussed three implementations of volume rendering for a thermal convection simulation output of substantial size. Our time-dependent output is stored as a float32 array of 36 GB per timestep. This is a non-trivial size for the most common GPUs. This leaves the scientist with two options: 1) use a data-parallel visualization application with GPU-assisted rendering, or 2) use a CPU-only visualization environment which can fit on compute nodes where large memory banks are usually found. Our choice was to deploy a single application, the open-source ParaView, due to its support for different parallel execution paradigms, and for its ability to work with different off-screen and on-screen rendering backends. Having a single application, driven by fully automated Python scripts and a benchmarking suite of tools available in ParaView itself, enabled us to compare all possible implementations with reduced variability.

We tested two GPU-based rendering options. We first used ParaView's native volume rendering, which has proved to offer the best compromise between startup time and interactive performance. We also tested an alternative solution based on a new library in development by NVIDIA. In our current setup, the IndeX library offers superior interactive rendering, however at non-negligible initialization costs. We evaluated an implementation of volume rendering provided by the Intel OSPRay library, a software-based framework which can take remarkable advantage of a multi-threaded execution layer. This also fits well on a subset of our available hardware, a dual-Xeon based compute node without GPU. Our experiences are of interest for several computer platforms around the world where graphics hardware is not available.

Our emphasis in creating the scientific visualization shown in the accompanying video [11] was two-fold. First, having an interactive environment enabling us to prototype the visualization with large-scale data. The editing of color and opacity transfer functions is the most demanding step in deriving the proper visualization, and we were able to provide an interactive setup using either GPU- or CPU-based volume rendering. Dealing with long time-dependent simulation outputs was the second requirement, and the path to achieve high productivity was to use parallel and scalable I/O routines. We used VTK's native XML partitioned file format convention for Cartesian image data. This was pivotal for a quick turn-around time. The OSPRay-based implementation had the best performance in both initialization and average rendering time, but suffered from some parallel image compositing artifacts at inter-process boundaries. Given the very high spatial resolution of our grid, these artifacts are only visible at extreme zoom levels in the vicinity of ghost cells between MPI-distributed data. To conclude, and to ensure the best visual quality, the compromise for movie production was to use small subsets of GPU nodes with ParaView's native volume renderer.

The volume rendering benchmarking platform deployed to analyze our large grid simulations provides a unique chance to observe sheared thermal convection in a very simple system with far-reaching consequences. Furthermore, the visualizations allow us to have a very good first insight into the interplay between thermal convection and flow shearing by different kinds of wind and flow currents. We are now able to better understand the emergence and behavior of flow structures transporting heat through the system and affecting the flow dynamics.

Acknowledgments
Alexander Blass was financially supported by the Dutch Organization for Scientific Research (NWO-I) and conducted his simulations at the Swiss National Supercomputing Center, under compute allocations s713, s802, and s874. We acknowledge the support from the Dutch national e-infrastructure of SURFsara, a subsidiary of the SURF cooperation, and the Priority Programme SPP 1881 Turbulent Superstructures of the Deutsche Forschungsgemeinschaft. We thank the ParaView development team at Kitware, USA, for fruitful discussions and motivational material. Dave DeMarle has been particularly helpful in discussions related to the OSPRay plugin. Mahendra Roopa at NVIDIA has also been extremely receptive to our feedback and instrumental in helping us get the best of the IndeX library in a multi-GPU setting. We are grateful to the reviewers of our manuscript, who provided a critical reading and motivated the clarifications we have added. We also would like to thank Paul Melis from SURFsara for valuable input to our video [11].
References

[1] Intergovernmental Panel on Climate Change, Ocean systems, in: Climate Change 2014: Impacts, Adaptation and Vulnerability. Part A: Global and Sectoral Aspects. Working Group II Contribution to the IPCC Fifth Assessment Report, Chapter 12, 2014, pp. 411–484.
[2] J. Ahrens, B. Geveci, C. Law, ParaView: An End-User Tool for Large Data Visualization, Butterworth-Heinemann, 2005.
[3] A. Blass, X. Zhu, R. Verzicco, D. Lohse, R. J. A. M. Stevens, Flow organization and heat transfer in turbulent wall sheared thermal convection, Preprint arXiv:1904.11400 (2019).
[4] S. Rahmstorf, The thermohaline ocean circulation: A system with dangerous thresholds?, Climatic Change 46 (2000) 247–256.
[5] S. Manabe, R. J. Stouffer, Two stable equilibria of a coupled ocean-atmosphere model, J. Climate 1 (1988) 841–866.
[6] J. Marshall, F. Schott, Open-ocean convection: Observations, theory, and models, Rev. Geophys. 37 (1) (1999) 1–64.
[7] G. Ahlers, S. Grossmann, D. Lohse, Heat transfer and large scale dynamics in turbulent Rayleigh-Bénard convection, Rev. Mod. Phys. 81 (2009) 503.
[8] D. Lohse, K.-Q. Xia, Small-scale properties of turbulent Rayleigh-Bénard convection, Annu. Rev. Fluid Mech. 42 (2010) 335–364.
[9] A. J. Smits, B. J. McKeon, I. Marusic, High-Reynolds number wall turbulence, Annu. Rev. Fluid Mech. 43 (2011) 353–375.
[10] D. Barkley, L. S. Tuckerman, Mean flow of turbulent-laminar patterns in plane Couette flow, J. Fluid Mech. 576 (2007) 109–137.
[11] J. M. Favre, A. Blass, Volume renderings of sheared thermal convection [video file] (2018). URL https://youtu.be/yEj83O3hVv4
[12] E. P. van der Poel, R. Ostilla-Mónico, J. Donners, R. Verzicco, A pencil distributed finite difference code for strongly turbulent wall-bounded flows, Computers & Fluids 116 (2015) 10–16.
[13] X. Zhu, E. Phillips, V. S. Arza, J. Donners, G. Ruetsch, J. Romero, R. Ostilla-Mónico, Y. Yang, D. Lohse, R. Verzicco, M. Fatica, R. J. A. M. Stevens, AFiD-GPU: a versatile Navier-Stokes solver for wall-bounded turbulent flows on GPU clusters, Comput. Phys. Commun. 229 (2018) 199–210.
[14] R. J. A. M. Stevens, A. Blass, X. Zhu, R. Verzicco, D. Lohse, Turbulent thermal superstructures in Rayleigh-Bénard convection, Phys. Rev. Fluids 3 (2018) 041501(R).
[15] W. Schroeder, K. Martin, B. Lorensen, The Visualization Toolkit, Kitware, 2006, pp. 213–244.
[16] EGL Eye: OpenGL visualization without an X server, http://tinyurl.com/ybmnzdtv
[17] Volume rendering improvements in VTK, https://blog.kitware.com/volume-rendering-improvements-in-vtk
[18] Shaders in VTK, …
[19] NVIDIA IndeX, https://developer.nvidia.com/index
[20] R. Haas, P. Mösta, M. Roopa, A. Kuhn, M. Nienhaus, Programmable interactive visualization of a core-collapse supernova simulation, in: Conference on High Performance Computing Networking, Storage and Analysis, SC 2018, Dallas, TX, USA, 2018.
[21] OSPRay: a ray tracing based rendering engine for high-fidelity visualization, …
[22] K. Moreland, W. Kendall, T. Peterka, J. Huang, An image compositing solution at scale, in: Conference on High Performance Computing Networking, Storage and Analysis, SC 2011, Seattle, WA, USA, 2011, pp. 25:1–25:10.
[23] J. M. Favre, A. Blass, Volume renderings of sheared thermal convection, in: Conference on High Performance Computing Networking, Storage and Analysis, SC 2018, Dallas, TX, USA, 2018.