Depth-dependent Parallel Visualization with 3D Stylized Dense Tubes
Haipeng Cai, Jian Chen, and Alexander P. Auchus

ABSTRACT
We present a parallel visualization algorithm for the illustrative rendering of depth-dependent stylized dense tube data at interactive frame rates. While this computation could be performed efficiently on a GPU device, we target a parallel framework that runs efficiently on an ordinary multi-core CPU platform, which is much more widely available to common users than GPUs. Our approach is to map the depth information in each tube onto each of the visual dimensions of shape, color, texture, value, and size on the basis of Bertin's semiology theory. The purpose is to enable more legible displays in dense tube environments. A major contribution of our work is an efficient and effective parallel depth-ordering algorithm that makes use of the message passing interface (MPI) with VTK. We evaluated our framework with visualizations of depth-stylized tubes derived from 3D diffusion tensor MRI data by comparing its efficiency with that of several alternative parallelization platforms running the same computations. As our results show, the proposed parallelization framework can efficiently render highly dense 3D data sets such as the tube data and is thus useful as a complement to parallel visualization environments that rely on GPUs.
Keywords: Parallel visualization, stylized rendering, MPI, dense data, MRI
Index Terms: I.3.1 [Computer Graphics]: Parallel processing; I.3.4 [Computer Graphics]: Graphics Utilities
1 INTRODUCTION
When visualizing large-scale geometrical data such as dense tubes, one of the critical issues is visual perception in the depth dimension, due to the inherent clutter and occlusion that result from overlapping graphical signs or structures. To improve the overall visual legibility of three-dimensional data from the perspective of depth perception, mapping depth information to various visual variables of the graphical representation can be an effective means of enhancing depth perception in dense data visualizations.

On the basis of Bertin's semiology theory [10], the primary visual variables to be mapped include size, color, value, and transparency. For instance, a linear mapping from the per-vertex depth value to the tube radius (i.e., size) in a visualization of dense 3D tubes gives viewers a visual cue for discerning depth positions when the radii gradually decrease along the viewing direction. Similarly, a consistent mapping from depth to color provides a constant correspondence between the distance of geometry and color value, and thus helps viewers orient themselves along the depth dimension. In either case, better depth perception is conducive to improving the overall legibility of the visualized data.

To make depth-dependent visualizations interactive, the computation involved in the depth mappings must run in real time. Two compute-intensive steps must be executed every time depth reordering is needed, for example when the data view is rotated. First, depth values are calculated according to the updated viewing direction and then sorted along that direction. Second, the mappings are computed and the data are rendered again to update the visualization. Concisely, depth sorting and re-rendering must be performed whenever the depth order is shuffled, as typically results from a data transformation that changes the order.
To obtain an interactive frame rate for such visualizations, these computations are therefore required to finish in real time, which our tests have shown to be difficult to achieve with either a sequential method or the direct use of currently available facilities such as VTK with its parallelism support.

Given this performance challenge, it is reasonable to parallelize the depth-dependent visualization described above to make it interactive. While GPUs are increasingly applied to many modern parallel computations, and visualizations of large-scale dense geometry data could indeed be a perfect fit for GPU computing platforms, we aim at a cheaper solution to the same challenges. In particular, we target a solution that can usefully complement the GPU computing paradigm when GPU devices and related high-end hardware configurations are not readily available. This is often the case, since GPUs are generally more expensive than ordinary computer users are willing to afford for tasks like the dense tube visualizations we discuss in this paper, which can be completed with inexpensive PC hardware using our parallelization framework.

This paper describes a parallel visualization method that supports real-time computation for interactive depth mappings by using the message passing interface (MPI) with VTK, while extending current VTK facilities for performance optimization. Through the optimized coordination of a parallel depth-ordering algorithm, a parallel rendering method, and customized data structures for real-time depth mappings, our approach has proven efficient in the visualization scenarios described, as demonstrated by its application to 3D dense tube data.

We have applied the proposed method to depth-dependent 3D dense tube visualizations with depth mappings to all the primary visual variables mentioned above, and have obtained interactive rendering speed with either a single variable mapping applied or multiple variable mappings freely combined. Notably, even when the mapping from depth to size is combined with mappings to other visual variables, where two passes of depth sorting and rendering plus tube generation from polylines are all required for each frame, our approach still renders the dense data sets at interactive frame rates.
2 RELATED WORK
In this section, we describe the previous work most relevant to our parallel visualization method. Related past work can be classified into two categories: depth enhancement and parallel visualization.
There has been much previous work dedicated to technical solutions for depth perception issues and visual occlusion in 3D data visualizations. To name a few, a rich set of landmarks and context cues [16], as well as shading and transparency [9], contribute to enhancing visual perception in the depth dimension while alleviating occlusion problems within overlapping structures. Focusing on strengthening depth perception, Bruckner et al. employ volumetric halos to improve the 3D legibility of visualized volume data [3]. They introduce different halos according to different ways of halo-volume combination and use halos to construct inconsistent lighting, which accentuates depth even further from another aspect.

Elmqvist et al. [7] give a comprehensive discussion of occlusion management in 3D visualization, focusing on reducing 3D occlusions. Occlusion management for visualization is a more general class of visibility problem in computer graphics, concerned with improving human perception for specialized visual tasks involving occlusion, size, and shape. Their methods have extensively helped improve the legibility of 3D data visualizations. In contrast, we investigate how to manipulate typical retinal variables in graphics perception to achieve better depth legibility.

Even direct volume rendering techniques often suffer from poor depth cues, because the data sets commonly have a large number of overlapping structures. With maximum intensity projection (MIP) rendering [11], however, only little effort is required to create a good understanding of the structures represented by high signal intensities. This algorithm adds two different visual cues: occlusion revealing and depth-based color.
For the first cue, the MIP color is modified in the presence of occluding objects with the same material as the one at the point of maximum intensity, while for the other, the actual position of the shaded fragment is used to change its color using a supporting spherical map. In this paper, we explore depth enhancement in dense geometry visualizations by encoding depth information with various visual variables.

Ritter et al. [22] employ hatching strokes to communicate shape while using distance-encoded shadows to further enhance depth perception in their vascular structure visualization. In addition, they achieve real-time performance using a GPU-based hatching algorithm, which is efficient for rendering complex tubular structures with depth emphasized. Similarly, we handle tubular shapes in our visualization scenario, but we intend to improve depth perception in much denser 3D tube geometries derived from human brain MRI data. Also, we aim to provide a cheaper interactive rendering solution on common multi-core CPUs than the GPU rendering they employed.

Parallelization has been extensively harnessed in visualization scenarios where performance becomes a challenge. In [1], the authors developed a scalable and portable parallel visualization system based on augmenting VTK for efficiently visualizing large-scale time-varying data. The system they proposed provides parallelism at both the task and pipeline level and is primarily addressed to visualization programmers. Also at a system scale but even earlier, SCIRun [12] offered task and data parallelism as a dataflow-based visualization system running on shared-memory multiprocessor machines. This system was later extended to support task parallelism on distributed-memory architectures [17].
We present a lightweight parallelization method for large geometry visualization by using existing facilities such as MPI and VTK, instead of providing a fully featured system or an extended programming library.

Compared to system-level solutions, many more parallelization efforts for visualization focus on parallel rendering, ranging from photo-realistic rendering [21] and volume rendering [25] to parallel iso-surfacing [19]. Among a large body of previous work specific to parallel polygon rendering, Crockett [5] harnessed message passing architectures for polygon rendering parallelism that reduces memory usage and network contention while overlapping computation and communication. He also later gave an overview of parallel rendering techniques from both hardware and software perspectives [4].

Other researchers have probed different indirect aspects, such as image composition schemes [14] and data decomposition strategies [24], to improve polygon rendering performance. More recently, various parallel rendering algorithms, including sort-first, sort-last, and hybrids of the two, were evaluated on shared-memory computers, although these algorithms originally targeted distributed-memory architectures [18]. In our work, we explore polygon rendering parallelization and employ image compositing as well, but specifically in the service of depth enhancement, providing a more legible 3D geometry visualization by overlapping parallel depth sorting and parallel polygonal data rendering.

Note that the parallel sorting problem [15, 6], which is at the core of our parallelization framework, could easily be solved highly efficiently on GPU computing platforms with the extensive existing algorithms available [20, 23]. In this paper, instead, we target a cheaper solution that does not rely on high-end computing resources such as GPUs.
Alternatively and complementarily, we use CPU-based parallel sorting algorithms leveraging a single processor with multiple cores, which has become virtually the baseline configuration for modern commodity computers.

3 OUR METHODS
According to Bertin's theory, the visual legibility of two-dimensional (2D) graphical representations can be characterized by graphical density, angular separation, and retinal separation [10]. Further, retinal separation is defined by six visual variables: size, color, shape, value, orientation, and texture. Informed by Bertin's legibility rules described in terms of these dimensions, it is promising to explore visual legibility issues in 3D data visualizations by examining 3D legibility dimensions. Although our exploration is still based on the legibility framework proposed by Bertin, some expansion is required to fully characterize legibility in 3D graphical representations.

While Bertin's legibility dimensions serve 2D graphical representations, such dimensions are lacking for legible 3D data visualizations. We expanded Bertin's framework from 2D to 3D by adding a depth separation dimension that is characteristic of 3D data, and we examine how typical retinal variables affect the legibility of 3D visualizations by investigating visual encodings that map depth information to each of the examined variables in turn. Through such encodings, users are given visual cues to better discern depth locations, and thus the overall legibility of the visualization can be enhanced. Currently we study three variables, size, color, and value, inherited directly from Bertin's framework, and one extended variable, transparency, which is an important factor influencing depth perception in 3D geometry. In our visual encodings, the depth information of geometries can be encoded either by a single visual variable alone or by multiple variables combined. By comparing different encodings, we can reveal how those visual variables affect the depth separation dimension and hence the overall legibility of 3D data.
Our parallel visualization pipeline is outlined in Figure 1. The parallelism is powered by MPI, and the visualization by VTK with its parallelization support. Among the four processes, the master process P0 is responsible for data I/O, visualization interactions, and the coordination required for parallel rendering with consistent depth mappings, besides rendering its local data partition as all the slave processes do. The collaboration between the master process and the slave processes involves all key steps in the pipeline, from data decomposition to parallel depth sorting and geometry rendering.

Figure 1: The overview of our parallel visualization pipeline.

Data decomposition is ordinarily an essential part of a parallelization mechanism. Although the concrete decomposition scheme can depend heavily on the interrelations between data components, and the data components can be defined at different levels of granularity, it makes sense to split the whole data set into independent partitions such that the data processing of each partition can be performed in parallel. In the case of 3D tubes, for instance, a single tube is regarded as the minimal component, and the vertices of a tube are never assigned to different partitions.

In addition, to maximally harness the available computing resources, we decompose the whole geometry model in a simple uniform manner and then evenly distribute the computational tasks of sorting, mapping, and rendering to all processes. This simple data partitioning is efficient in our tube case because, for one thing, there is no data or semantic dependency among the tubes, and for another, the task load of each process is nearly equal to the others', even though the master process is assigned certain managing roles.
However, general data decomposition is a separate topic in itself with no universally optimal solution, and it is out of the scope of this paper.

When using MPI as the underlying parallel run-time support, the data are decomposed according to the local process id (LocalProcId) and the total number of processes specified (ProcNum). Precisely, given all the data components C_0, C_1, C_2, ..., C_{n-1} in the equal-partitioning scheme, the local sub-range of data for process i is [C_sidx, C_eidx), where sidx = (n / ProcNum) * LocalProcId and eidx = (n / ProcNum) * (LocalProcId + 1). As a special case, the last process may take more or fewer data components than the others if n is not exactly divisible by ProcNum, in which case its eidx = n. Figure 2 illustrates this data decomposition scheme while showing the overall picture of how the depth-stylized visualization is rendered in parallel.

In application scenarios like our 3D stylized dense tube visualization, the mappings from the depth information of each vertex (or other geometry unit, such as a triangle or strip) to visual attributes such as size, color, and transparency, referred to as retinal variables in Bertin's semiology theory, should be consistent regardless of the current viewing direction, in order to serve depth perception and hence visual legibility along the depth dimension of the visualization. In other words, all the vertices (or other geometry units) need to be ordered along the current viewing direction so that they can be consistently mapped to visual attribute values. Consequently, once the viewing direction changes, typically when users rotate the data view, those vertices need to be reordered before the mappings are applied over again to refresh the visualization.

In our dense tube environment, vertex-wise depth ordering is required for depth-dependent tube size assignment, with which a better depth perception of even a single 3D tube can be obtained when, for instance, a tube tapers or grows in radius along the depth direction.
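The equal-partitioning index computation above can be sketched as follows; the helper name partitionRange is ours, and letting the last process absorb the remainder is one reasonable reading of the special case for the last process:

```cpp
#include <cstddef>
#include <utility>

// Compute the half-open component range [sidx, eidx) assigned to one process.
// chunk = n / procNum (integer division); the last process takes the tail
// when n is not exactly divisible by procNum.
std::pair<std::size_t, std::size_t> partitionRange(std::size_t n,
                                                   int procNum,
                                                   int localProcId) {
    std::size_t chunk = n / static_cast<std::size_t>(procNum);
    std::size_t sidx = chunk * static_cast<std::size_t>(localProcId);
    std::size_t eidx = (localProcId == procNum - 1)
                           ? n  // last process absorbs the remainder
                           : chunk * static_cast<std::size_t>(localProcId + 1);
    return {sidx, eidx};
}
```

For example, with n = 10 tubes and ProcNum = 4, processes 0 through 2 each receive 2 tubes and the last process receives 4.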
Given the necessity of depth ordering explained above, real-time depth mapping relies on real-time depth sorting. Per-vertex depth sorting is essentially the sorting of a sequence of floating-point numbers. Depth sorting of other geometry units is often eventually reduced to a per-vertex depth sorting problem as well. For instance, if we only want to discern depth locations at the level of the tube rather than the vertex (so that the visual variable value of all vertices on a tube is the same), a vertex can be selected to represent the tube, and the per-tube sorting reduces to the depth ordering of the representative vertices.

We therefore generalize the depth sorting problem to the sorting of a sequence of cells. Practically, a cell can be either a single value, such as a floating-point number, or packed data, such as a struct with multiple fields. We use the cell array for depth mapping in this paper.

While a rich set of parallel sorting algorithms is freely available [2], they generally serve the solitary purpose of sorting and are usually implemented as stand-alone parallel applications. Since our ultimate goal is to parallelize dense geometry visualization with integrated depth mappings, we need a holistic parallel framework in which the sorting algorithm works together with other steps, such as depth mapping and parallel rendering, in such a way that the overall visualization performance is maximized. Our parallel sorting algorithm has been adapted to parallel rendering (see section 3.5), for which data partitioning is involved, and optimized to mesh with efficient depth mappings (see section 3.6.1).

We adopt a mixed sorting algorithm for our parallel depth sorting, with the following key steps. First, each process updates the depth values (z-coordinates) of its local vertices through simple vector arithmetic with the current camera parameters (focal point, position, etc.).
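A cell that packs the depth value with the vertex's original index, together with a qsort-compatible comparator, might look like the following sketch (the struct and field names are illustrative, not taken from the actual implementation):

```cpp
#include <cstdlib>

// A "cell" packs the computed depth with the vertex's original global index,
// so the original identity survives the sort.
struct DepthCell {
    float vd;  // depth value along the current viewing direction
    int   id;  // original global vertex index
};

// Comparator for the C library qsort, ordering cells by ascending depth.
// Returns negative/zero/positive without risking float-to-int truncation.
int compareDepth(const void* a, const void* b) {
    float da = static_cast<const DepthCell*>(a)->vd;
    float db = static_cast<const DepthCell*>(b)->vd;
    return (da > db) - (da < db);
}
```

A local partition of cells is then sorted with qsort(cells, count, sizeof(DepthCell), compareDepth).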
Then, every process sorts the vertex depth values in its assigned partition using a typical quicksort algorithm and sends the sorted depth information to the master process once finished. Finally, the master process gathers the locally sorted partitions and performs either a multi-way merge sort or multiple two-way merge sorts. We employ the latter merge scheme on the master process, since it is more efficient: an iterative two-way merge can be performed as soon as a sorted partition is received from a slave process, without waiting for all processes to finish their local sorting.

Algorithm 1 shows how this parallel sorting algorithm works, illustrating how the real-time depth mappings fit into the parallel visualization framework as a whole.
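The iterative two-way merge on the master can be sketched with the C++ standard library's std::inplace_merge; this is a simplified in-memory sketch, whereas in the real pipeline each sorted partition would arrive via an MPI receive:

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Iterative two-way merge: append each sorted partition as it arrives and
// merge it in place, instead of waiting for all partitions to be gathered.
void mergePartition(std::vector<float>& global, const std::vector<float>& part) {
    std::size_t mid = global.size();        // boundary between merged and new data
    global.insert(global.end(), part.begin(), part.end());
    std::inplace_merge(global.begin(), global.begin() + mid, global.end());
}
```

Each call keeps the accumulated array globally sorted, so the final merge cost overlaps with the slaves' communication.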
In application scenarios like our stylized tube visualization, the primary performance challenges come from two sources, namely depth sorting and geometry rendering. For each updated frame, the whole geometry model must be rendered over again after depth sorting to reflect the depth mapping updates. Although both are critical for a real-time update, the depth sorting time (T_s), compared to the rendering time (T_r), takes only a minor proportion of the whole frame update time (T = T_s + T_r); according to our sample test with a geometry of 140,000 vertices, T_s/T remains a small fraction.

Our parallel rendering employs n processes. Here a process is a general computing unit that can be a single processor on a multi-processor platform, a single core of a multi-core processor, or a worker thread on a single-core processor. For each rendering frame, the separate renditions, each produced by a single process, are aggregated into a single complete rendition that is visible only on the process elected as the master. This aggregation is conducted by means of pixel-wise image compositing, detailed as follows.

When each process has finished the local rendering of its assigned partition (partial geometry), the rendition is essentially a set of pixels in the frame buffer. As such, pixel-wise compositing is actually a process of compositing frame buffers. In practice, to reduce computational costs, compositing only the color buffer and the depth buffer is sufficient for our visualization purpose. For simplicity, we describe compositing with these two types of frame buffer only.

Procedurally, this compositing is performed in three steps: (1) each process fetches the pixels from the frame buffers in its local memory space, and (2) all slave (slave, as opposed to master) processes send their buffers one after another to the master process, which does not send its own local buffers.
Then, (3) the master process performs a pair-wise buffer compositing every time it receives the buffers from a slave process, until all slave buffers are composited. Finally, the master process writes the composited depth and color values back to the corresponding local frame buffers as a full image. Figure 2 illustrates this pixel-wise compositing process, and an example is shown in Figure 3 using 4 processes to parallelize the rendering task.

In addition, when rendering geometries in parallel in the background, the parallelization should be transparent to users. Except when there is a special need to show slave renditions, no rendering partitions should be visible, and the composited visualization is displayed on the master process only.

Two points are worth making about optimizing the compositing. First, off-screen rendering is applied to avoid visible slave renditions. This not only meets the slave renderers' need for invisible renditions but, more importantly, improves overall rendering performance. Second, the creation of rendering windows on slave processes is avoided. Depending on the graphics platform in use, a less ideal solution is to hide the rendering windows if their creation is required for correct rendering, for example when a window must be created to establish a context for drawing. Finally, synchronizing camera parameters across all processes before any process starts to render simplifies the later image compositing. In our approach, a simple way of synchronization is to broadcast the key camera parameters (focal point and position) retrieved on the master process to all slave processes.

Depth mappings are applied to stylize each geometry unit according to its depth information, so that better perception in the 3D environment can be obtained.
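Step (3), the pair-wise compositing of the color and depth buffers, reduces to a per-pixel depth test. A minimal sketch, assuming one packed color word and one depth value per pixel (the buffer layout is an illustrative assumption, not the VTK internal format):

```cpp
#include <cstdint>
#include <vector>

// Pair-wise compositing of a slave's color/depth buffers into the master's:
// for each pixel, keep the fragment closer to the camera (smaller depth).
void compositeBuffers(std::vector<std::uint32_t>& dstColor,
                      std::vector<float>& dstDepth,
                      const std::vector<std::uint32_t>& srcColor,
                      const std::vector<float>& srcDepth) {
    for (std::size_t i = 0; i < dstDepth.size(); ++i) {
        if (srcDepth[i] < dstDepth[i]) {  // incoming fragment is nearer
            dstDepth[i] = srcDepth[i];
            dstColor[i] = srcColor[i];
        }
    }
}
```

Because the depth test is associative, the master can fold slave buffers in whatever order they arrive, which is exactly what makes the iterative pair-wise scheme work.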
Depending on how the depth value is mapped to the values of the different visual variables, a depth mapping is either a linear or a non-linear function f(v) = V(Rank(v_d)), where Rank(x) is the order of x in the sorted sequence, v_d is the depth value of a single geometry unit (the vertex is used as the running example in the following text), and the function V maps the ranking order sequence to the range of the designated visual variable, [V_min, V_max]. In the case of a linear mapping, for instance,

    V(x) = V_min + ((V_max - V_min) / (x_max - x_min)) * (x - x_min)    (1)

As we consider size (s), color (c), value (i), and transparency (t) as the mapped variables, V(x) is a scalar function. Further, V(x) is a unitary function for a single mapping, and multiple mappings are simply an aggregation of multiple single mappings. For example, when mapping depth to size, color, and transparency at the same time, V : x → (s, c, t) is essentially V : x → (S(x), C(x), T(x)), where S, C, and T are all unitary mappings.

In the context of geometry rendering, depth mappings are easily performed by the simple function evaluations described above. However, depth mappings need to be parallelized as well, in order to collaborate with parallel rendering towards optimized performance in the context of visualization parallelization. In our parallel visualization, depth mappings are required to be coherent across the geometry model as a whole. Therefore, simply mapping the local geometry on each process independently and then compositing the locally depth-stylized renditions does not work.

For each process, the input of the depth mapping is the ranking order of the depth values of its local geometries, and, as a result, each process only has the local rank of every vertex in its local geometry partition. However, the global rank of a vertex within the whole geometry must be retrieved for a coherent global depth mapping.
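The linear case of Equation (1) is a one-line function; the name mapLinear is illustrative:

```cpp
// Linear mapping V(x) from a rank x in [xMin, xMax] to a visual variable
// range [vMin, vMax], as in Equation (1).
float mapLinear(float x, float xMin, float xMax, float vMin, float vMax) {
    return vMin + (vMax - vMin) / (xMax - xMin) * (x - xMin);
}
```

Mapping, say, a vertex's global depth rank to a tube radius in [0.5, 2.0] is then a single evaluation per vertex, and a combined mapping (s, c, t) is three such evaluations with different target ranges.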
With the global ranks of its local vertices, every process can render its local geometry independently yet correctly, owing to the correct mappings from the local vertices to the corresponding portion of the range of V(Rank(v_d)). Algorithm 1 outlines the integrated parallel sorting and depth mapping algorithm adopted in our parallel visualization.

To figure out the global rank of the depth value of each local vertex, an easy solution is to reduce all locally sorted partitions on the master process and then broadcast the resulting globally sorted depth array to all other processes.

Figure 2: Illustration of data partitioning and pixel-wise compositing in our parallel visualization.

Figure 3: An example of pixel-wise compositing in our parallel visualization scheme. The dense streamtube visualization with depth mapped to size, color, and transparency is parallelized using 4 processes. Process 0 (master) gathers all parallel renditions from the slave processes and composites them together with its own local rendition to produce the complete rendering.

Algorithm 1 Integrated parallel depth sorting and mapping
  numProcs ← total number of processes
  myId ← local process rank
  numPts ← number of vertices in the local partition
  gather all numPts values into array allNumPts
  idOffset ← 0
  for i = 0 to myId − 1 do
    idOffset ← idOffset + allNumPts[i]
  end for
  for i = 0 to numPts − 1 do
    depth[i].vd ← depth value of the i-th local vertex, calculated from the camera parameters
    depth[i].id ← i + idOffset
  end for
  sort depth by the vd field using qsort
  sum up all numPts values into totalPts
  if myId == 0 then
    offset ← 0
    tdepth[0 .. numPts − 1] ← depth[0 .. numPts − 1]
    for i = 1 to numProcs − 1 do
      receive tdepth[numPts + offset .. numPts + offset + allNumPts[i] − 1] from process i
      in-place merge tdepth[0 .. numPts + offset + allNumPts[i] − 1]
      offset ← offset + allNumPts[i]
    end for
    for i = 0 to totalPts − 1 do
      hashIndex[tdepth[i].id] ← i
    end for
    broadcast hashIndex
  else
    send depth to master process 0
    receive hashIndex from master process 0
  end if
  for i = 0 to numPts − 1 do
    Rank_global[i] ← hashIndex[i + idOffset]
  end for

With that easy solution, the global rank would be retrieved from the received depth array when evaluating f(v) for each vertex. However, that retrieval would be an O(N) search for each of the N local vertices, which, according to our tests, is prohibitive enough by itself to make an interactive frame update impossible. For real-time global rank retrieval, we instead create a global hash index for the whole geometry immediately after the depth sort on the master process has finished.

In both the local and global depth arrays, an index is kept with each depth value: the depth array is actually a sequence of pairs (d, Id), where d is the depth value and Id is the index, initialized with the original global rank of the vertex in the unsorted holistic geometry. As such, wherever a depth array element is moved by sorting, its original rank, which also serves as a vertex identifier, can always be retrieved immediately. We use this id to associate the unsorted and sorted depth arrays through the hash index. Figure 4 illustrates the hashing process for depth mapping.

Figure 4: Hash index for real-time depth mapping in the context of the presented parallel visualization.

During interactive exploration of the depth-stylized visualization, the mappings need to be updated whenever the depth order of the geometry along the viewing direction changes, typically because of data rotation. The mapping update is then reflected through a refreshed rendering, which is exactly why we explored rendering parallelization as discussed before. It is reasonable, therefore, to actively trigger the frame update once a mapping update has taken place. There are at least two mechanisms by which mapping updates can invoke a timely frame update.
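The hash index construction at the heart of Algorithm 1 can be sketched as follows (the struct and function names are illustrative): after the global merge, the master inverts the sorted array so that a vertex's original id looks up its global rank in O(1).

```cpp
#include <vector>

// A globally sorted depth cell: depth value plus original vertex id.
struct Cell {
    float vd;  // depth value
    int   id;  // original global vertex index (assigned before sorting)
};

// Build the hash index on the master: hashIndex[originalId] = globalRank.
// A local vertex with original id (i + idOffset) then retrieves its global
// rank in O(1) instead of searching the sorted array in O(N).
std::vector<int> buildHashIndex(const std::vector<Cell>& sorted) {
    std::vector<int> hashIndex(sorted.size());
    for (int rank = 0; rank < static_cast<int>(sorted.size()); ++rank)
        hashIndex[sorted[rank].id] = rank;
    return hashIndex;
}
```

Broadcasting this array once per reorder replaces N searches per process with N constant-time lookups, which is what makes the per-frame mapping update feasible.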
First, a polygonal data filter used for depth sorting can be inserted into the demand-driven rendering pipeline, so that a rendering update is triggered whenever either the input or the output of the filter is modified (as is the case when using VTK). However, besides updating the depth mappings, this requires geometry copying between the data filters, which has proven to drag down the frame rate heavily; it is noteworthy that the geometry is not really updated at all when the depth mappings change. The other mechanism is to explicitly invoke the frame update (redrawing) through interaction handling. With this approach, only the mappings are recomputed, and no geometry copying is involved. We employ the latter for better performance.

In the interaction-driven mapping update mechanism, we directly handle only the user input, such as mouse interaction, that shuffles the depth order of the geometry on the master process. When responding to such input, the master process invokes the frame update after finishing the mapping calculations and then sends a remote method invocation (RMI) message to all slave processes. In the RMI handler on each process, the mapping update is triggered first, followed by an active call to the frame update. Naturally, a message processing loop runs on all processes to enable real-time RMI responses.
4 IMPLEMENTATIONS
Our parallel depth-stylized visualization is implemented in C/C++ using VTK with parallelism support from MPI. In the parallel sorting algorithm, we use the qsort routine from the standard C library for the local quicksort on each process, and the generic in-place merge algorithm from the C++ STL for the iterative two-way merge sort on the master process. We employ the image compositing functionality provided by VTK's parallel modules, but extend certain classes to tailor their functions to our customized pipeline components in order to implement the pixel-wise compositing. While off-screen rendering is directly supported in VTK, we use wrapping windows made of Qt widgets to hide the rendering windows of all slave processes.

In addition, our depth sorting filter extends VTK's polygonal data depth sort filter, and an interactor component extends VTK's track-ball camera interactor; together they meet our needs for the interaction-driven mapping update. To explicitly trigger the frame update, user-defined RMI messages for this purpose are added, and the callbacks are registered with VTK's multiple process controller component before parallel rendering starts. With these extended components, the interactor responds to data rotation by broadcasting a mapping-update RMI message to all slave processes, and the mapping calculations and frame update are then invoked in the callback of the RMI message. The visualization program simply runs as an MPI application, so the number of processes can be specified when launching the MPI runtime. As we discuss in detail in section 5, the optimal number of processes depends on the actual hardware architecture.

Figure 5 shows the appearance of our test application of the presented parallel visualization method. The GUI framework is created using Qt 4.0, with which all the interaction widgets are set up for customizing the depth stylization.
To achieve optimized performance, parallel processing is applied only to the rendering widget, and all other GUI elements are created on the master process only. As such, GUI interactions have to be explicitly relayed from the master process, where they are triggered, to all slave processes so that the slave renderings can reflect the resulting changes in the stylizing configuration. We register another type of RMI message and define a dedicated callback to realize the RMI for updating the slave renderings. RMI messages are easily transmitted via MPI communications.

Figure 5: The depth-stylized 3D tube visualization parallelized using the proposed method.
5 RESULTS
We have applied the parallel visualization pipeline presented in this paper to interactive depth-stylized visualization for the purpose of investigating legibility issues in 3D data visualizations. As our current application scenario, we create streamtube visualizations of diffusion tensor MRI (DTI) data with single and multiple depth mappings applied in order to enhance users' depth perception in the 3D visualizations, as shown in Figure 6.

On the basis of these implementations, we evaluate the efficiency of our parallel visualization approach by first measuring the overall rendering performance, including depth sorting and MPI communication costs, and then comparing our method to alternative parallel rendering implementations. Our evaluation is based on results collected from many runs of the test application shown above on an Intel(R) Core(TM)2 Quad 2.66GHz processor with 4GB of DDR2 memory.
We evaluate the proposed parallel visualization method by first comparing the performance of the parallel approach with that of the sequential one at different geometry scales. Precisely, for each of the test data sets, the time in milliseconds spent rendering a single frame in the parallel visualization is paired for comparison with the time spent on the same task in the sequential one. In our application scenarios, we visualize 3D depth-stylized streamtubes generated from diffusion tensor MRI data, with different depth mapping schemes applied, for the tests.

As shown in Figure 7, parallelization enables interactive rendering performance for our depth-stylized geometry visualization, which is hard to obtain with the sequential approach. Each rendering time measurement is an average of the total rendering cost over 100 continuous frames. For the parallel rendering, the measured time includes the cost of communications among the 4 processes used.

Figure 7: Rendering performance of depth-stylized tube visualization using our parallelizing method compared with sequential visualization performance. Both single and multiple depth mappings are tested and compared between the parallel and sequential visualizations.

We differentiate only two instances of depth mappings here, depth to color alone and depth to both size and color, because they represent two disparate amounts of computation for the depth mappings in our tests. For the single mapping from depth to color, only one round of depth sorting is involved besides the rendering task. For the multiple mappings from depth to size and color, there are two rounds of depth sorting plus the tube mesh generation besides the rendering task.
Of the two rounds of depth sorting, one sorts the depth of the line geometry for the size mapping before tubes of different radii are generated, and the other sorts the depth of the tube geometry for the color mapping after the tubes are produced from the polylines.

To examine how the number of processes used in the parallelization affects the visualization performance, we tested different process counts. As the results in Table 1 show, performance increases monotonically with the number of processes until it reaches 4, after which performance decreases monotonically as the number continues to grow. That the maximal speedup is achieved with 4 processes can be attributed to the fact that the number of physical CPU cores is 4.

We further verify the efficiency of our parallelization approach for depth-stylized geometry rendering by comparing the overall visualization performance gained by our method with that of other alternative approaches. We implemented the 3D tube visualization with depth stylizing using both a partially and a fully parallelized rendering. For both comparisons, we gauge the total rendering time with five different scales of 3D tube geometries stylized by depth-dependent size and color, similar to the methodology for measuring the performance gain of parallel over sequential visualization described in section 5.1. Consistently, 4 processes are used in all the following tests.

Figure 6: Our parallel visualization of DTI streamtubes with single mappings from depth to size (upper left), color (upper right), value (middle left) and transparency (middle right), respectively, and multiple mappings from depth to size and color combined (bottom left) and to value and transparency combined (bottom right).
We use these different mappings with typical visual variables to communicate depth information in the 3D visualizations.

Table 1: The effect of the number of processes employed on the parallel performance, gauged by time cost in milliseconds, parallel speedup, and efficiency, tested using our parallel approach with the depth-to-color mapping visualization of 9,635 tubes comprising 1,447,005 vertices.

                 Number of Processes
Metrics        2      3      4      5      8     12
Time (ms)    409    359    347    401    469    642
Speedup     1.72   1.95   2.02   1.75   1.50   1.09
Efficiency  0.86   0.65   0.51   0.35   0.19   0.09

By comparing against the partially parallel rendering, in which only the depth sorting is parallelized while the overall rendering pipeline remains sequential, we intend to show the advantage of our approach in combining the sorting parallelization with the rendering parallelization. We employed the Kernel for Adaptive, Asynchronous Parallel and Interactive programming (XKAAPI) library [8] to sort the depth information of the whole geometry in the sequential VTK visualization pipeline. As Figure 8 shows, our approach clearly outperforms the partially parallelized one, by more than a factor of two.

Figure 8: Rendering performance of depth-stylized tube visualization using our fully parallelized pipeline compared with that of the same visualization using XKAAPI for depth sorting in the partially parallel visualization. The performance of sequential rendering is also included for comparison.

As an alternative fully parallelized visualization solution to compare against, we implemented our depth-stylized tube visualization using the IceT module in ParaView [13] with the same measuring method as above. We partially ported the IceT module from the ParaView source package into VTK for the purpose of this test. As shown in Figure 9, our parallel visualization solution clearly outperforms the IceT-based parallelization scheme.
6 DISCUSSION
In our application scenario, we visualize depth-stylized 3D tubes, which are generated at run time by wrapping the polylines loaded into our parallel visualization pipeline. Alternatively, the tube meshes could be produced before being loaded into the pipeline, which would eliminate the computational cost of tube generation. There are two reasons for not doing the off-line tube generation.

First, we need to change the radii of the tubes to reflect depth changes for the depth-to-size mapping. Loading line geometries and then generating tubes at run time is a more efficient way to visualize tubes with the depth-to-size mapping than loading tube meshes directly and then transforming each tube to implement the mapping.

Second, as mentioned in section 5.1, by means of online tube generation we differentiate two types of mappings, single and multiple, in terms of computational cost, in order to demonstrate that a parallel visualization approach like ours achieves an even greater overall performance gain when more computational steps, such as geometry processing like wrapping polylines to produce tube meshes, are involved in the whole rendering job.

In fact, the performance measurements clearly show that the visualization acceleration is much greater with the multiple mappings from depth to size and color than with the single mapping from depth to color as the geometry scale increases. This is because more of the computation is parallelized as well, and thus the overall performance gain from parallelization increases relative to the sequential visualization.

Figure 9: Rendering performance of depth-stylized tube visualization using our parallel approach compared with that of the same visualization using ParaView IceT, both as fully parallelized pipelines; sequential performance is included for comparison.
According to this analysis, it is reasonable to scale our parallel visualization method to more complex visualization contexts where more compute-intensive steps are involved in the rendering task. Results from our tests provide initial evidence of this kind of scalability of the proposed approach.

In addition, although we currently use only a single four-core processor to test our parallelization scheme, it is reasonable to predict that the performance speedup shown in Table 1 would continue to grow if the number of CPU cores were further increased. Also, because the underlying MPI facilities scale with the processor architecture, applying our method on a multi-processor machine can yield even greater visualization accelerations.
7 CONCLUSIONS
We presented a parallel visualization method that enables the real-time floating-point computations involved in depth mappings for more legible 3D data visualizations via enhanced depth perception, and thereby helps achieve interactive frame rates in the depth-stylized visualization of large 3D geometries. The presented method is built upon the MPI paradigm within VTK, with the necessary extensions adopted for vertex depth reordering optimizations. Our approach has been tested with 3D dense tubes containing millions of vertices with multiple mappings of depth information applied, and the interactive frame rates achieved show that our method efficiently addresses the performance issues inherent in visualization scenarios such as the depth-stylized visualizations exemplified here. Moreover, our method can easily be extended to parallelize visualizations of other large-scale geometry data where intensive computations are required to obtain interactive rendering speeds.

We have demonstrated the superior efficiency of our approach as a CPU-based parallel visualization framework by comparing the real-time rendering performance of the presented method with that of both the sequential method and other parallelization approaches such as XKAAPI and ParaView IceT. As the results show, the proposed framework can provide an efficient alternative to parallel visualization solutions relying on high-end hardware such as GPUs for interactively visualizing large-scale 3D geometry models like stylized dense tubes when such hardware is not readily available.

REFERENCES

[1] J. Ahrens, C. Law, W. Schroeder, K. Martin, and M. Papka. A parallel approach for efficiently visualizing extremely large, time-varying datasets.
Technical Report LAUR-00-1620, Los Alamos National Laboratory, Los Alamos, New Mexico, 2000.
[2] G. E. Blelloch. A sampling of parallel sorting algorithms.
[3] S. Bruckner and E. Gröller. Enhancing depth-perception with flexible volumetric halos. IEEE Transactions on Visualization and Computer Graphics, pages 1344–1351, 2007.
[4] T. Crockett. An introduction to parallel rendering. Parallel Computing, 23(7):819–843, 1997.
[5] T. Crockett and T. Orloff. Parallel polygon rendering for message-passing architectures. Parallel & Distributed Technology: Systems & Applications, IEEE, 2(2):17–28, 1994.
[6] E. K. Donald. The art of computer programming. Sorting and searching, 3:426–458, 1999.
[7] N. Elmqvist and P. Tsigas. A taxonomy of 3D occlusion management for visualization. IEEE Transactions on Visualization and Computer Graphics, pages 1095–1109, 2008.
[8] T. Gautier. Kernel for adaptative, asynchronous parallel and interactive programming. http://kaapi.gforge.inria.fr/dokuwiki/doku.php?id=start.
[9] P. Irani and C. Iturriaga. Labeling nodes in 3D diagrams: Using transparency for text legibility and node visibility. Technical report, University of New Brunswick, 2002.
[10] B. Jacques. Semiology of graphics: diagrams, networks, maps. University of Wisconsin Press, 1983.
[11] J. Diaz and P. Vazquez. Depth-enhanced maximum intensity projection. International Symposium on Volume Graphics, 8:1–8, 2010.
[12] C. Johnson and S. Parker. The SCIRun parallel scientific computing problem solving environment. In Ninth SIAM Conference on Parallel Processing for Scientific Computing, 1999.
[13] Kitware. VTK/Multipass rendering with IceT.
[14] T. Lee, C. Raghavendra, and J. Nicholas. Image composition schemes for sort-last polygon rendering on 2D mesh multicomputers. IEEE Transactions on Visualization and Computer Graphics, 2(3):202–217, 1996.
[15] C. E. Leiserson, R. L. Rivest, C. Stein, and T. H. Cormen. Introduction to Algorithms. The MIT Press, 2001.
[16] Y. Li, C. Fu, and A. Hanson. Scalable WIM: Effective exploration in large-scale astrophysical environments. IEEE Transactions on Visualization and Computer Graphics, pages 1005–1012, 2006.
[17] M. Miller, C. Hansen, and C. Johnson. Simulation steering with SCIRun in a distributed environment. Applied Parallel Computing Large Scale Scientific and Industrial Problems, pages 366–376, 1998.
[18] B. Nouanesengsy, J. P. Ahrens, J. Woodring, and H.-W. Shen. Revisiting parallel rendering for shared memory machines. In EGPGV'11, pages 31–40, 2011.
[19] S. Parker, P. Shirley, Y. Livnat, C. Hansen, and P. Sloan. Interactive ray tracing for isosurface rendering. In Visualization '98 Proceedings, pages 233–238. IEEE, 1998.
[20] M. Pharr and R. Fernando. GPU Gems 2: Programming Techniques for High-Performance Graphics and General-Purpose Computation. Addison-Wesley Professional, 2005.
[21] E. Reinhard, A. Chalmers, and F. Jansen. Overview of parallel photo-realistic graphics. Eurographics 98 State of the Art Reports, pages 1–25, 1998.
[22] F. Ritter, C. Hansen, V. Dicken, O. Konrad, B. Preim, and H. Peitgen. Real-time illustration of vascular structures. IEEE Transactions on Visualization and Computer Graphics, 12(5):877–884, 2006.
[23] E. Sintorn and U. Assarsson. Fast parallel GPU-sorting using a hybrid algorithm. Journal of Parallel and Distributed Computing, 68(10):1381–1388, 2008.
[24] S. Whitman. Dynamic load balancing for parallel polygon rendering.