The Mixture Graph—A Data Structure for Compressing, Rendering, and Querying Segmentation Histograms

To appear in IEEE Transactions on Visualization and Computer Graphics (IEEE Vis 2020).

Khaled Al-Thelaya, Marco Agus, and Jens Schneider (IEEE Member)

Fig. 1: The Mixture Graph allows us to interactively render and query segmented volumes while maintaining pre-filtered anti-aliasing across multiple scales. Segmented volumes store nominal data and therefore pose challenges if traditional rendering is attempted. Left to right: Hippocampus (1178 × …).

Abstract—In this paper, we present a novel data structure, called the
Mixture Graph. This data structure allows us to compress, render, and query segmentation histograms. Such histograms arise when building a mipmap of a volume containing segmentation IDs. Each voxel in the histogram mipmap contains a convex combination (mixture) of segmentation IDs. Each mixture represents the distribution of IDs in the respective voxel's children. Our method factorizes these mixtures into a series of linear interpolations between exactly two segmentation IDs. The result is represented as a directed acyclic graph (DAG) whose nodes are topologically ordered. Pruning replicate nodes in the tree followed by compression allows us to store the resulting data structure efficiently. During rendering, transfer functions are propagated from sources (leaves) through the DAG to allow for efficient, pre-filtered rendering at interactive frame rates. Assembly of histogram contributions across the footprint of a given volume allows us to efficiently query partial histograms, achieving up to × speed-up over naïve parallelized range queries. Additionally, we apply the Mixture Graph to compute correctly pre-filtered volume lighting and to interactively explore segments based on shape, geometry, and orientation using multi-dimensional transfer functions.

Index Terms—Segmented Volumes, Data Structures, Sparse Data
1 Introduction
Segmentations of volume data have traditionally been used in a medical context to separate different semantic objects in the data. Typically, one or more segment IDs are computed per voxel that represent the probability of the voxel belonging to the specific segment. Such segmentations have then been used either to modulate the transfer function or to extract 3D representations for each semantic object. Since then, segmented volumes have also become popular in other disciplines, such as engineering, e.g., in the form of topological segmentation of flow fields. In all these examples, however, segmentations have been primarily treated as annotations to the original data.

This is changing due to the extensive use of segmentations in fields such as neuroscience, connectomics (see also Fig. 1, left & middle), computational neurology, and material science (see also Fig. 1, right). Here, raw data is typically imaged using an electron microscope (EM). This data, however, primarily serves as a means to compute the segmentation. Once the raw volume has been successfully partitioned into semantic objects, the raw data is used less and less. Reasons for this procedure include that the raw EM data is noisy and often lacks a direct optical interpretation. Some of the data in the OpenConnectome Project [32], for instance, is imaged using a high-resolution scanning (non-transmissive) EM [24]. Such data represents neither opacity nor color. Therefore, this traditional input for volume rendering has to be assigned using transfer functions [27] prior to rendering. Alternatively, 3D surface representations of semantic objects may be used.

• All authors are with the College of Science and Engineering (CSE), Hamad Bin Khalifa University (HBKU), Education City, Doha, Qatar.
• Jens Schneider is the corresponding author and can be reached at [email protected].
Still, volume rendering of the raw data [23] is highly regarded as a valuable tool to obtain, author, and annotate [4, 5], or validate [1] the resulting segmentation. However, as segmentation volumes are now a primary modality rather than a derived quantity, interactive, direct rendering of segmentations gains importance [6].

Such segmentation volumes store nominal data, posing challenges to traditional rendering attempts: the data cannot be interpolated or pre-filtered before assigning optical properties to voxels. This implies that, traditionally, mipmaps [56] need to be recomputed from scratch whenever a change in the transfer function alters these voxel properties. Adding to the challenges, volumes generated in connectomics are among the largest volumetric data sets currently available, making pre-filtered anti-aliasing highly desirable. Moreover, the nominal character of the data also makes reducing data size while maintaining direct renderability difficult, ruling out virtually all existing lossy compression methods.
Contributions.
In this paper, we address some of the most dire of the aforementioned needs. In particular, we present the Mixture Graph, a novel, graph-based data structure to represent hierarchical segmentation histograms. We show that these histograms can be understood as computational instructions for building a mipmap to support pre-filtered anti-aliasing before the optical properties at voxels are known. Compared to naïve use of such histograms, the Mixture Graph presents an efficient and compact alternative and allows us to propagate transfer function updates efficiently and in parallel through the graph. This is achieved by a symbolic factorization that breaks the aforementioned computational instructions into a sequence of linear interpolations. These interpolations can be compressed using well-established, lossy compression techniques. This leads to a prune-and-recycle operation on the underlying graph that exploits sparsity in the histogram's range. The resulting data structure naturally supports pre-filtered rendering. We also describe how the Mixture Graph can be used to store quantized normals to provide simple pre-filtered shading. Finally, we present a footprint assembly algorithm to efficiently compute partial segment histograms across a given sub-volume.

2 Related Work
In connectomics and material sciences, segmentation volumes have ceased to be mere annotations to the semantic objects in the data. Instead, the segmentation has become a first-class modality to be analyzed and visualized. Reasons include that connectomics data is typically imaged with an electron microscope, an imaging modality prone to noise which does not necessarily result in data lending itself to interpretation in terms of optical absorption and opacity. For instance, unlike transmission EMs, the more commonly used scanning EMs measure the backscattering of the electron beam [42]. Still, direct volume rendering is used [29, 23], often in combination with other techniques, to proofread [17, 1] and analyze or explore [2, 5, 4, 21] automatically [25, 50, 39] or semi-automatically [22] generated segmentations.

As more and more automated segmentation algorithms become available, large and densely segmented data sets emerge. Unlike the quantized real values stored by their EM input, these data sets store nominal integer IDs. Rendering such modalities was addressed, e.g., by two-level volume rendering [18, 16], which combines multiple volume rendering techniques to highlight the various semantic objects in combination with the original input data. Other approaches seek to reconstruct a smooth surface for purposes of rendering [26].

While all this demonstrates the interest in segmented data, compression of segmented volume data has gained relatively little attention. The reason is that, albeit being the most popular and successful choice in the last decades, lossy compression methods [20, 33, 46] cannot be readily used to compress nominal integer data. On the other hand, lossless compression methods face challenges with respect to implementation and performance on parallel architectures [35].
This is due to their inherently sequential view of the data, and, while such methods exist [13, 54, 53, 47], they generally offer neither the random access nor the bandwidth required to interactively render directly from the compressed representation. Aside from raw integer volumes [32], PNG-compressed RGB or RGBα slices storing the segment IDs in multiple 8-bit channels are still a de-facto format for exchanging columns of neuronal tissue [24, 7]. PNG's compression ratio for this type of data is generally significant.

Compression methods for scalar volume data focus on reducing the size of opacity values. Wavelets [10, 8], while resulting in good compression rates [31, 20, 33, 3], are usually not well suited for decoding on the GPU. The reason is that they derive their efficiency not only from the actual wavelet transform, but from the coding back-end of the transform coefficients. Arithmetic codes [38] or variations of embedded zero tree codes (e.g., SPIHT [41]) are traditionally used. These do not trivially support random parallel access. GPU-based wavelets are employed in the field of terrain rendering [49], and octrees [37] are used to compress volume data. All these methods, however, do not support lossless encoding of nominal data such as the segmentation volumes we are concerned with. Lossless coding may also be driven by a wavelet transform, such as the fully invertible LeGall integer basis [8]. To the best of our knowledge, however, such lossless transforms have not yet been applied to volume data.

Vector quantization [14] has been applied successfully to the lossy compression of volume data [34, 46, 12]. Vector quantization is similar to our method in that a palette or codebook is learned from the data. Each entry in the codebook stores a vector, whereas each vector in the input data is replaced by an index into the codebook. In this work, we utilize vector quantization to derive a nominal volume from a normal map to apply the Mixture Graph for pre-filtered shading.
The related field of sparse coding and sparse dictionary learning [40] can be seen as a generalization of vector quantization: instead of referencing the codebook with a single index, a sparse weight vector is stored. Decoding consists of computing a linear combination of codebook entries. More recently, Wang et al. [51, 52] propose to apply sparse 3D Gaussian Mixture Models to handle massive scalar simulation volumes.

In this work, we consider the efficient, hierarchical storage of segmented volume data. Our method considers a mipmap of attributes, such as segment color, that would arise under application of a transfer function to the input segmentation. However, unlike a traditional mipmap that is built after a transfer function is applied, our data structure stores the computations necessary to arrive at a mipmap. These computations are factorized into simple linear interpolations that can be compressed efficiently by lossy methods. Our data representation is essentially a paletted texture, in which the palette can be updated efficiently and in parallel. Unlike traditional paletted textures, however, our palette can store mixtures of multiple colors, somewhat similar to two-colored pixels [36]. Our method is most closely related to double sparsity and sparse coding [40], although the methodology differs substantially: we perform a greedy factorization of the computations necessary to compute a palette, whereas sparse coding is usually formulated as an (orthogonal) matching pursuit [30] optimization problem to represent data in a potentially overcomplete basis.

3 Algorithmic Overview
Our method first computes a normalized histogram mipmap (Sec. 3.1), in which each voxel stores a "mixture" of segment IDs. Mixtures are convex combinations of segment IDs reflecting the relative number of occurrences of each ID within each voxel of the mipmap. Clearly, such a histogram may have significant storage requirements and typically results in heavily unbalanced workloads during rendering.

In a second step, we factorize the histogram mipmap into a set of linear interpolations (Sec. 3.2). Made possible by embedding mixtures in $\mathbb{R}^\infty$, this step results in a directed acyclic graph (DAG) representing both the histogram and the computations necessary to reconstruct or render the original segmented volume at different scales (Sec. 3.4). We use a scalar quantization step to prune redundant nodes in the DAG. The result is then compressed at a fixed bitrate to facilitate fast, random access to the original histogram (Sec. 3.3).

Since the quantization step is lossy, we also store quantization errors for each quantization bin to estimate the reconstruction error in later stages (Sec. 3.5), such as running fast, approximate queries of partial histograms across any given sub-volume. This feature allows domain scientists to quickly count IDs in a volumetric range and assess their distribution (Sec. 3.6).

Notation.
In this paper, we make heavy use of convex combinations that we call "mixtures". A mixture is described by a vector $m \in \mathbb{R}^\infty$ with the following properties:
$$\|m\|_1 = 1, \tag{1}$$
$$[m]_n \geq 0 \quad \forall n, \tag{2}$$
$$\|m\|_0 < \infty, \tag{3}$$
where we used the $\ell_0$ pseudo-norm to count non-zero elements in $m$ and used $[\cdot]_n$ to denote the $n$-th element in a vector. The scalar product $\langle m, x \rangle$ between a mixture vector $m$ and a vector $x \in \mathbb{R}^\infty$ storing data thus computes a convex combination (Eqs. (1, 2)) of a finite number of elements (Eq. (3)) in $x$. We further define the set of all mixtures
$$\mathcal{M} := \{\, m \in \mathbb{R}^\infty : \|m\|_1 = 1 \wedge \|m\|_0 < \infty \wedge [m]_n \geq 0\ \forall n \,\}.$$
Finally, we generally assume that the greatest position of a non-zero element in $m$ is known and finite, denoted
$$\exists\, \hat{k}(m) \in \mathbb{N} : [m]_k = 0 \quad \forall k > \hat{k}(m).$$

Given a compact, discrete domain $D \subseteq \mathbb{N}^3$ and a volume of segment IDs, $S : D \to \mathbb{N}$, we construct a hierarchical segmentation histogram with $l_{\max}+1$ levels, $H : D \times \{0, \ldots, l_{\max}\} \to \mathcal{M}$, as follows. Let $i, j, k \in D$ denote a voxel position in 3D and $l \in \{0, \ldots, l_{\max}\}$ denote a hierarchy level. Let $\hat{e}_i$ denote the $i$-th unit vector over $\mathbb{R}^\infty$. We then compute
$$H(i,j,k,0) = \hat{e}_{S(i,j,k)} \quad \forall i,j,k \in D,$$
$$H(i,j,k,l) = \frac{1}{N} \sum_{u=2i}^{2i+1} \sum_{v=2j}^{2j+1} \sum_{w=2k}^{2k+1} H(u,v,w,l-1), \tag{4}$$
where $N$ is the number of voxels in the support of $H(i,j,k,l)$ (typically 8, but potentially less than 8 at the borders). If the input data has non-power-of-two resolution, we round up the resolution of each subsequent level, adjusting summation limits and $N$ in Eq. (4) accordingly. We continue computing additional levels in $H$ in this way until we reach $l = l_{\max}$ with a resolution of 1 voxel.

Our assumption that the input volume $S$ contains exactly one segment ID per voxel merely served the exposition of this section.
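To make the construction of Eq. (4) concrete, the following sketch (our illustration, not the authors' implementation) builds the normalized histogram mipmap for a small power-of-two volume; the function name `histogram_mipmap` and the sparse dictionary representation of mixtures are our assumptions:

```python
# Illustrative sketch: build the normalized histogram mipmap H of Eq. (4)
# for a power-of-two volume of segment IDs. Each voxel stores a mixture
# sparsely as a dict {segment ID: weight}; weights of a mixture sum to 1.
import numpy as np

def histogram_mipmap(seg):
    """seg: 3D integer array of segment IDs, power-of-two resolution."""
    level = np.empty(seg.shape, dtype=object)
    for idx in np.ndindex(seg.shape):
        level[idx] = {int(seg[idx]): 1.0}       # H(i,j,k,0) = e_{S(i,j,k)}
    mipmap = [level]
    while level.shape != (1, 1, 1):
        shape = tuple(s // 2 for s in level.shape)
        nxt = np.empty(shape, dtype=object)
        for i, j, k in np.ndindex(shape):
            mix = {}
            for u, v, w in np.ndindex((2, 2, 2)):  # average the 8 children
                for sid, wgt in level[2*i+u, 2*j+v, 2*k+w].items():
                    mix[sid] = mix.get(sid, 0.0) + wgt / 8.0
            nxt[i, j, k] = mix
        mipmap.append(nxt)
        level = nxt
    return mipmap
```

Border handling and non-power-of-two resolutions (where $N < 8$) are omitted for brevity; the top level then holds the dense, normalized histogram of all segment IDs.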
If voxels in $S$ already contain mixtures (i.e., $S : D \to \mathcal{M}$), we set $H(\cdot, 0) = S$ and proceed as described above. In this paper, we will only discuss the traditional average-of-eight mipmap filter [56], but other low-pass filters can be used in the construction of $H$, as long as the low-pass filter can be normalized to a mixture itself.

High-resolution levels of $H$ are very sparse for most real-world segmented volumes. This is particularly true for volumes in which only one segment ID is provided per voxel. In contrast, $H(\cdot, l_{\max})$ is the dense, normalized histogram of all segment IDs in the volume. This poses challenges for processing and rendering such histograms, since the workload per voxel is highly inhomogeneous: in order to render a segmented volume with 1,024 IDs using an RGBα transfer function, only one fetch is sufficient for each voxel in $l = 0$, whereas higher levels require up to 1,024 fetches to compute the color of a single voxel. To balance this workload, we propose to factorize each mixture into a set of "simpler" mixtures. In this paper, we consider mixtures of the form $\lambda \in \mathcal{M} : \|\lambda\|_0 \leq 2$, that is, we restrict the factorization to either linear interpolation between two elements or identity of one element. Such a factorization is always possible in the $\mathbb{R}^\infty$ embedding.

We start by populating a mixture list $\Lambda$ with $N$ trivial mixtures, one for each input segment: $\Lambda = [\hat{e}_1, \ldots, \hat{e}_N]$. We then examine one of the remaining mixtures $m$ "over $\Lambda$" (that is, $[m]_n \neq 0 \Rightarrow n \in \{1, \ldots, N\}$) with $\|m\|_0 > 2$. We pick two non-zero positions $i, j$ (i.e., $[m]_i \neq 0 \neq [m]_j$) and compute
$$\lambda_{N+1} := \frac{[m]_i \hat{e}_i + [m]_j \hat{e}_j}{[m]_i + [m]_j}, \tag{5}$$
which is appended to the mixture list,
$$\Lambda \leftarrow \Lambda \oplus \lambda_{N+1}. \tag{6}$$
Finally, we update $m$,
$$m \leftarrow m - [m]_i \hat{e}_i - [m]_j \hat{e}_j + \left([m]_i + [m]_j\right) \hat{e}_{N+1}. \tag{7}$$
This update removes one non-zero entry from $m$ in total and creates a linear interpolation $\lambda_{N+1}$. Here, we used the embedding of mixtures in $\mathbb{R}^\infty$ to add mixtures that, informally speaking, use positions "behind" those dimensions of $\mathbb{R}^\infty$ that previously carried information. We therefore use the embedding in $\mathbb{R}^\infty$ to the same effect described in Hilbert's Grand Hotel thought experiment [11], in which a fully booked hotel with infinitely many rooms can always accommodate a countable, potentially infinite number of new guests. Unlike the original thought experiment, in which guests move rooms to free up the room with the smallest index, we use the embedding to append mixtures with larger and larger $\hat{k}(m)$ to a countable set of mixtures with finite norms which, in the limit, may span $\mathbb{R}^\infty$.

Fig. 2: Factorization of a mixture $m$, $\|m\|_0 = 4$. In each step, two non-zero positions in $m$ are selected, a new linear interpolation $\lambda$ is added to the mixture list $\Lambda$, and $m$ is updated ($\ldots$ denote trailing zeros).

When repeated until $\|m\|_0 = 1$, we obtain a series of linear interpolations $\lambda_i$ with maximum non-zero positions
$$\hat{k}(\lambda_i) < i, \tag{8}$$
which is a direct consequence of "appending" linear interpolations to previously used dimensions in $\mathbb{R}^\infty$. Figure 2 shows an example for the factorization of mixtures.

As depicted in Fig. 3, such a series of linear interpolations can be represented as a binary tree with edge weights. Leaf nodes represent the input volume's segments and the root represents the final mixture $m$. Each internal node represents a linear interpolation $\lambda$ with exactly two children corresponding to the two non-zero entries $i, j$ in $\lambda$. Edge weights are given by $[\lambda]_i$ and $[\lambda]_j$.

Carrying out the factorization for a set of mixtures while re-using identical nodes results in a directed acyclic graph (DAG) $G = (N, E, W)$ as depicted in Figure 4. The node set $N$ contains nodes with an in-degree of 0 (sources) representing the original segment IDs, internal nodes with an in-degree of 2, and sinks with an out-degree of 0.
Fig. 3: Factorization tree of the example in Fig. 2. Leaf nodes (boxes) store unit vectors representing the input segment IDs, internal nodes (circles) perform linear interpolations between two children, and the root (rounded rectangle) corresponds to the original mixture $m$.

The final mixture for each voxel is represented by either sources or sinks. We define the edge direction of $(i, j) \in E$ as "from $i$ to $j$", with the notion of mixture $i$ "contributes to" mixture $j$:
$$\exists (i,j) \in E : w_{ij} \neq 0 \in W \Leftrightarrow \exists \lambda_j \in \Lambda, i \in N : [\lambda_j]_i = w_{ij}. \tag{9}$$
Since each node except for sources has an in-degree of exactly two, we store the connectivity information for each node as incoming edges. For each edge $(i, j)$, we call node $i$ a predecessor of $j$ and, conversely, $j$ a successor of $i$.

From the previous section, it is intrinsically clear that the factorization into linear interpolations is not unique: in each step, any two non-zero elements of $m$ can be picked. Our compression method exploits this degree of freedom to generate as many redundant nodes as possible. For instance, Fig. 4 shows that $\lambda$ can be re-used in the factorization of both $m$ and $m'$, since they both mix $\hat{e}_1$ and $\hat{e}_2$ in the same ratio 1:2, albeit with different weights. Initially, there are $\binom{N}{2}$ possible picks $i, j$, where $N$ is the number of segments in the input. Picking $i, j$ removes this combination from subsequent picks. However, each step also adds a new node to pick from, resulting in
$$P = \prod_{i=2}^{N} \binom{i}{2} = \prod_{i=2}^{N} \frac{i(i-1)}{2} \tag{10}$$
possible factorizations. The problem is further exacerbated by the sheer number of mixtures to be considered and the fact that only nodes with the same ratio between components $i$ and $j$ can be re-used. Our method therefore relies on a greedy algorithm to find a candidate pick $i, j$ that has a good re-use probability. We consider two greedy strategies.
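Both strategies drive the same inner update of Eqs. (5)-(7). The following sketch (our illustration, not the authors' code; the helper name `factorize` and the sparse-dict mixture representation are assumptions) simply picks the two smallest non-zero positions instead of a greedy pair:

```python
# Sketch of the factorization loop of Eqs. (5)-(7). Mixtures are sparse
# dicts {position: weight}; Lam initially holds the N trivial mixtures
# e_1..e_N (here at indices 0..N-1). Each step replaces two non-zero
# entries i, j of m by a reference to a new two-entry mixture lambda.
def factorize(m, Lam):
    """Factorize mixture m over the mixture list Lam; returns root node id."""
    m = dict(m)
    while len(m) > 1:
        i, j = sorted(m)[:2]               # any pick is valid; take smallest
        s = m[i] + m[j]
        lam = {i: m[i] / s, j: m[j] / s}   # Eq. (5): normalized 2-mixture
        Lam.append(lam)                    # Eq. (6): append to mixture list
        del m[i], m[j]
        m[len(Lam) - 1] = s                # Eq. (7): reference the new node
    return next(iter(m))                   # single remaining position
```

Each iteration removes one non-zero entry from `m`, so a mixture with $k$ non-zero entries yields $k-1$ interpolation nodes; re-use of identical nodes across mixtures is what turns the resulting trees into a DAG.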
The first strategy, which we call max-occurrence, picks the pair of indices $i, j$ that most frequently corresponds to non-zero components of mixtures in $H$. The rationale behind this strategy is that, even though we are ultimately interested in recycling nodes representing mixtures $\lambda = (1-w)\hat{e}_i + w\hat{e}_j$, frequent combinations of non-zero indices $i, j$ are good candidates. The reason is that $w \in (0,1)$ can be quantized to increase re-use. Formally, we define a counting function $\phi_1$,
$$C := \{\, i, j \in \{1, \ldots, N\} : i \neq j \,\},$$
$$\phi_1 : C \to \mathbb{N}, \qquad \phi_1(i,j) := \left| \left\{ h \in H : [h]_i \neq 0 \wedge [h]_j \neq 0 \right\} \right|, \tag{11}$$
and compute
$$i, j = \operatorname*{argmax}_{k,l \in C} \phi_1(k,l). \tag{12}$$

We call our second strategy max-reduction. It exploits that after factoring out a mixture $\lambda$ with $\|\lambda\|_0 = 2$ from a mixture $m$ with $\|m\|_0 = N$, the resulting mixture $m'$ with $\|m'\|_0 = N-1$ offers fewer picks: factoring out $i, j$ reduces our choices from $\binom{N}{2}$ to $\binom{N-1}{2}$ for $N > 2$. Since this decrease of choices is directly equivalent to shrinking the search space, our strategy is to find a pair $i, j$ minimizing the size of the remaining search space. Formally, we define a second counting function $\phi_2$,
$$\phi_2 : C \to \mathbb{N}, \qquad \phi_2(i,j) := \sum_{\substack{h \in H \\ [h]_i \neq 0 \wedge [h]_j \neq 0}} \begin{cases} \|h\|_0 - 1 & \text{if } \|h\|_0 > 2 \\ 2 & \text{else if } \|h\|_0 = 2, \end{cases} \tag{13}$$
and compute
Fig. 4: Example directed acyclic graph (DAG) corresponding to the factorization of two mixtures $m$ and $m'$. Mixture $\lambda$ can be re-used, since the ratio between the first two components is the same.

$$i, j = \operatorname*{argmax}_{k,l \in C} \phi_2(k,l). \tag{14}$$
These two strategies are compared in the Results section. Once we have found the best pair $i, j$ according to one of the strategies, we proceed by factoring all occurrences of $i, j$. We repeat this process until $H$ is fully factorized, i.e., $\forall h \in H : \|h\|_0 = 1$.

To improve re-use of the nodes thus generated, we use scalar quantization on the ratios of the linear interpolations. Note in this context that linear interpolation ratios cannot be 0 or 1, since this would exactly replicate a previous mixture. Consequently, our scalar quantizer partitions the open range $(0,1)$ into $2^b$ bins, where $b$ is the bitrate of the quantization. Each bin is represented by a single floating point codeword that best represents the weights in the bin. In addition, we store an empirical standard deviation $\sigma$ for each bin to facilitate estimating the error of the quantization throughout the entire reconstruction process. Since the diversity of interpolation weights is greatly reduced by the quantization step, more nodes in the Mixture Graph become identical and can be re-used.

The final output of the compression step is a bitstream storing the compressed Mixture Graph. Sources are not stored explicitly since they correspond to input segmentation IDs. The number of sources, however, is stored in the bitstream's header, along with the volume's dimensions, a flag indicating whether the input volume contained a fractional segmentation (i.e., input segments are already mixtures), the bitrate of the scalar quantizer, etc. Table 1 provides an overview of the output format.

Table 1: Overview of the output bitstream.
(1) number of sources
(2) fractional input? (1 bit): $\exists h \in H(\cdot, 0) : \|h\|_0 > 1$
(4) $\sigma_{\max}$: maximum $\sigma$ of the quantization
(5) 3D resolution of $H(\cdot, 0)$
(6) quantization bitrate (4 bits)
(7) nodes of $H$
(8) $i_{\min}(l)$ per level (variable bitrate)
(C) scalar quantizer codebook (20 bits for each weight and $\sigma$)

For each node, we store a triple of values consisting of two node IDs and one index into the scalar quantizer's codebook of interpolation weights. This is done at a fixed bitrate of $2\lceil \log_2(\cdot) \rceil + b$.
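The scalar quantization of interpolation weights can be sketched as follows (our illustration under stated assumptions: uniform bins over the open interval $(0,1)$ and the bin mean as codeword; the paper only requires a codeword that "best represents" each bin):

```python
# Minimal sketch of the weight quantizer: partition (0,1) into 2**b
# uniform bins; each non-empty bin stores the mean of the weights it
# receives (codeword) and their empirical standard deviation sigma,
# which is later used for error propagation.
import math

def quantize_weights(weights, b):
    nbins = 2 ** b
    bins = [[] for _ in range(nbins)]
    for w in weights:                        # each w lies in (0, 1)
        k = min(int(w * nbins), nbins - 1)
        bins[k].append(w)
    code, sigma = [], []
    for ws in bins:
        if ws:
            mu = sum(ws) / len(ws)
            var = sum((w - mu) ** 2 for w in ws) / len(ws)
            code.append(mu)
            sigma.append(math.sqrt(var))
        else:                                # unused bin
            code.append(None)
            sigma.append(None)
    return code, sigma
```

After quantization, every node's weight is replaced by a bin index, so nodes that mix the same pair of predecessors in nearly the same ratio collapse into one.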
For each level of $H$, we compute the minimum and maximum ID of all nodes referenced in this level, i.e.,
$$i_{\min}(l) := \min \{\, n : \exists h \in H(\cdot,l) : [h]_n \neq 0 \,\}, \qquad i_{\max}(l) := \max_{h \in H(\cdot,l)} \hat{k}(h). \tag{15}$$
We store $i_{\min}(l)$ in the bitstream's header for each $l$. For each voxel, we then store its node ID minus the minimum node ID of that level, again using a fixed bitrate of
$$r(l) = \lceil \log_2 (i_{\max}(l) - i_{\min}(l) + 1) \rceil \tag{16}$$
bits for each voxel in this level. Finally, we store the scalar quantizer's code values and standard deviations in 20 bits each.

In order to reconstruct the original normalized histogram mipmap, we first perform a topological sorting of the nodes $\Lambda$ in the Mixture Graph. Despite the factorization step resulting in a "soft" topological order (children before parents), this step is necessary in order to establish synchronization points for parallel reconstruction. Topological sorting resolves dependencies when reconstructing the original mixtures (also see Fig. 5). We begin by assigning a value of 0 to each source in the graph. Each remaining node with predecessor values $v_1, v_2$ is assigned a value of $\max(v_1, v_2) + 1$.
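Because nodes are emitted children-first, this level assignment reduces to a single linear scan; a minimal sketch (our illustration, with the helper name `topological_levels` and the predecessor-pair encoding as assumptions):

```python
# Sketch of the topological-level assignment: sources get level 0; each
# internal node, given as a pair of predecessor indices in emission
# (children-before-parents) order, gets max(level of predecessors) + 1.
# Nodes sharing a level can later be processed without synchronization.
def topological_levels(num_sources, internal_nodes):
    """internal_nodes: list of (pred_a, pred_b) index pairs, in emission order."""
    level = [0] * num_sources                 # sources
    for a, b in internal_nodes:
        level.append(max(level[a], level[b]) + 1)
    return level
```

The maximum value returned is the topological depth of the graph, i.e., the number of synchronization barriers required during parallel reconstruction.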
We call these values $v_i$ the topological level of node $i$ and $\max v_i$ the topological depth of the graph. Exploiting the children-before-parents relation described above, this amounts to a linear scan of all nodes.

On parallel architectures, such as GPUs, nodes with identical values can be processed without synchronization. Conversely, this also means that the topological depth is equivalent to the required number of synchronization barriers. Furthermore, since we only need to ascertain that all nodes with smaller values are processed before the first node with a larger value, we can defer evaluating certain nodes. Consider a node with topological level $v$ and a successor level of $v_s$. If $v_s - v > 1$, then the node can be evaluated in parallel together with node levels $v, v+1, \ldots, v_s-1$. Since, in our experience, many more nodes are generated with small values than with large ones, this can be used for load balancing. We start by processing all nodes with a value of 0 and a successor value of 1. We then continue processing nodes with a value of 0 but a successor value of 2, until we have processed at least $\lceil N/p \rceil$ nodes, where $N$ is the total number of nodes and $p$ is the topological depth of the DAG. We then repeat this procedure for the next level, until we reach the sinks in the DAG.

Fig. 5: Topological sorting of the DAG. Node numbers indicate the distance to the closest source. During parallel reconstruction, smaller numbers have to be processed before larger numbers. The largest number corresponds to the number of synchronization steps required. Here, the orange node has only successors with a value of 3. It can be processed together with nodes labelled '1' or with nodes labelled '2'.

The Mixture Graph not only stores a factorization of the mixtures in $H$, but also represents the computations necessary to reconstruct mixtures of segment attributes at all scales of a mipmap. Therefore, it is also possible to assign each segment an RGBα color and rebuild the resulting RGBα mipmap efficiently. Traditionally, this requires assigning colors first and then rebuilding the mipmap from scratch. Using the Mixture Graph, only the potentially unique RGBα colors in the resulting mipmap are computed, by propagating colors through the Mixture Graph. The mipmap itself is essentially a paletted texture. In the Results section we show that the 1,046 megavoxel Hippocampus data set with 353 segments and a mipmap comprising more than 1,195 million voxels can be factored into a little more than 1 million nodes. Consequently, instead of re-computing the color values of 1,195 million voxels, we only need to update 1 million colors by linear interpolations. A partial update of the transfer function would trigger a "push" scattering propagation through the graph.
Since scattering is much harder to parallelize than gathering, our system only performs full updates using parallel gathering. As we show in the Results section, the performance of such updates is still well within real-time requirements. Note that, apart from other benefits, assigning an RGBα value to each voxel of the aforementioned data set would amount to more than 4.6 GB, whereas our data representation requires only 1.36 GB. Finally, we would like to point out that, as long as the input data contains only one segment ID per voxel, we can always reconstruct the lowest level (input resolution) without loss from the Mixture Graph. Higher-level mixtures are encoded with a small loss; we refer the reader to the Results section for details.

To estimate the error in the reconstruction, the scalar quantizer stores the empiric standard deviation in addition to the code value per bin. The quantization error can be propagated during the reconstruction following the standard rules for error propagation:

$Q = a + b \;\rightarrow\; \delta Q = \sqrt{(\delta a)^2 + (\delta b)^2}$,
$Q = a \times b \;\rightarrow\; \delta Q = \sqrt{Q^2 \left( \left( \frac{\delta a}{a} \right)^2 + \left( \frac{\delta b}{b} \right)^2 \right)}$.   (17)

Sources in the Mixture Graph represent certain input segment IDs $\hat{e}_i$, i.e., $\delta \hat{e}_i = 0$. Successive linear interpolations in the reconstruction using uncertain weights introduce uncertainties into the mixtures of the input segments. In particular, we consider the linear interpolation between two potentially uncertain vectors $a \pm \delta a$, $b \pm \delta b$ using an uncertain weight $w \pm \delta w$. Since $a$, $b$ model mixtures of segments, we treat the error of each component $i$ of these vectors separately. We obtain

$[\mu]_i = (1 - w)[a]_i + w[b]_i$,
$[\delta \mu]_i = \sqrt{[\mu]_i^2 \left( \left( \frac{[\delta a]_i}{[a]_i} \right)^2 + \left( \frac{[\delta b]_i}{[b]_i} \right)^2 + \left( \frac{\delta w}{w} \right)^2 \right)}$.   (18)

Despite the fact that segments are typically spatially coherent, it is worth noting that the above independent treatment of the error propagation is correct, since the uncertain weights are independent of the mixture vectors. We also validated this experimentally by sampling weights from standard distributions during the reconstruction. The above error propagation equations provide us with the means to quantify the error of volume region queries.

To compute the histogram over a sub-volume [x, x + ∆x] × [y, y + ∆y] × [z, z + ∆z], we follow a footprint assembly [44, 43] approach. First, we decompose the 1D ranges along x, y, and z according to their alignment with the mip levels in the normalized histogram mipmap H. Then, the tensor product of these ranges is comprised of rectangular boxes as depicted in Fig. 6. Each of these boxes is tessellated by cubes of a side length l equal to the shortest of the three box extents. Each cube represents a single sample from H at level log₂(l). The contribution of each sample to the final result is denormalized (multiplication by l³) before being accumulated. In order to evaluate samples from H efficiently, a lookup table is pre-computed on the CPU that stores one mixture per node. We then use error propagation (Eq. 18) to track the uncertainty of the interpolation weights through the lookup table. This allows us to predict the error made in range queries. Typical errors using 512 bins for quantization are about 0.3%, with the maximum error for a segment not exceeding 0.5%. As a direct benefit of the Mixture Graph only considering convex combinations of nodes, the expectations always sum up exactly to the volume of the query region. Our error estimate is thus unbiased.

Fig. 6: Footprint assembly in 2D. Left: The 2D footprint formed by the tensor product of 1D footprints consists of rectangles. Alignments and extents of samples of the mipmap are shown next to each axis. Right: Rectangles in the 2D footprint are tessellated using squares matching the smaller side of each rectangle. Each square corresponds to one sample from the normalized histogram mipmap.

The task of querying ranges of the segmented volume is not well suited for the GPU, since the number of non-zero entries in the sparse mixtures is hard to predict. While it is generally possible to naïvely reconstruct a dense histogram, a dense approach is unlikely to scale well for volumes with a high number of segment IDs, since the sparsity in the range is not used. Furthermore, while we only discussed axis-aligned sub-volumes in this section, footprint assembly can also be performed for "oddly-shaped" regions, albeit at a substantially higher computational cost.
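As a concrete illustration, Eq. (18) can be evaluated per component as in the following sketch (Python for clarity; `lerp_with_error` is a hypothetical helper name, and the guards against zero denominators are our own assumption, since the paper does not discuss them):

```python
import math

def lerp_with_error(a, da, b, db, w, dw):
    """Interpolate two mixture vectors a, b (with absolute uncertainties
    da, db) using an uncertain weight w +/- dw, following Eq. (18):
    the relative errors of both operands and of the weight add in
    quadrature, component by component."""
    mu, dmu = [], []
    for ai, dai, bi, dbi in zip(a, da, b, db):
        m = (1.0 - w) * ai + w * bi
        rel = 0.0
        if ai != 0.0:                # skip zero components of the mixture
            rel += (dai / ai) ** 2
        if bi != 0.0:
            rel += (dbi / bi) ** 2
        if w != 0.0:
            rel += (dw / w) ** 2
        mu.append(m)
        dmu.append(math.sqrt(m * m * rel))
    return mu, dmu

# Two certain sources (unit vectors, delta = 0) blended with an
# uncertain weight w = 0.5 +/- 0.01:
mu, dmu = lerp_with_error([1.0, 0.0], [0.0, 0.0],
                          [0.0, 1.0], [0.0, 0.0], 0.5, 0.01)
```

Note that the expectations still sum to exactly 1, in line with the unbiasedness argument above.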
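The per-axis decomposition step of footprint assembly can be sketched as follows. This is an illustrative reconstruction under our own assumptions, not the paper's implementation (which relies on CPU bit-manipulation instructions for counting leading and trailing zeros); `decompose_1d` is a hypothetical name:

```python
def decompose_1d(lo, hi):
    """Split the integer range [lo, hi) into maximal blocks that are both
    power-of-two sized and power-of-two aligned, i.e. blocks that map to a
    single sample at some mip level. Returns (offset, level) pairs."""
    out = []
    while lo < hi:
        # Largest power of two dividing lo (lo == 0 is aligned to every
        # level), clamped so the block does not overshoot the range.
        size = lo & -lo if lo > 0 else 1 << (hi - lo).bit_length()
        while size > hi - lo:
            size //= 2
        out.append((lo, size.bit_length() - 1))
        lo += size
    return out

# The range [3, 17) splits into one voxel at level 0, a block of four at
# level 2, a block of eight at level 3, and one voxel at level 0:
blocks = decompose_1d(3, 17)   # [(3, 0), (4, 2), (8, 3), (16, 0)]
```

The tensor product of the three per-axis decompositions yields the rectangular boxes of Fig. 6, which are then tessellated by cubes of the shortest box extent.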
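The sparse mixtures accumulated during such queries can be stored as index-sorted (index, value) pairs, as described in the "Sparse vectors over ℝ∞" paragraph below. A minimal sketch of interpolating two such vectors (the `sparse_lerp` helper is our own, hypothetical name):

```python
def sparse_lerp(a, b, w):
    """(1 - w) * a + w * b for two sparse vectors, each stored as a list
    of (index, value) pairs sorted by index -- a linear-time merge."""
    out, i, j = [], 0, 0
    while i < len(a) or j < len(b):
        if j == len(b) or (i < len(a) and a[i][0] < b[j][0]):
            out.append((a[i][0], (1.0 - w) * a[i][1])); i += 1
        elif i == len(a) or b[j][0] < a[i][0]:
            out.append((b[j][0], w * b[j][1])); j += 1
        else:  # index present in both vectors
            out.append((a[i][0], (1.0 - w) * a[i][1] + w * b[j][1]))
            i += 1; j += 1
    return out

# Mixing a pure segment with a two-segment mixture keeps the result convex:
m = sparse_lerp([(1, 1.0)], [(3, 0.6), (7, 0.4)], 0.5)
```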
Maximum Search.
Our compression algorithm relies on quickly finding the best pair of indices i, j with respect to one of the greedy strategies defined in Eq. (12) and Eq. (14). We note that, at first, the best and second-best pairs tend to be well-separated in terms of their score. In later stages, progress slows down as choices become more ambiguous. We therefore introduce a threshold τ. In the first stage, we factorize all mixtures m with ‖m‖₀ > τ, skipping over many entries in lower levels of the mipmap. Once we have successfully exhausted options in this search space, we factorize the remaining mixtures with τ or fewer components in scanline order. Using a cutoff τ = 3, about half of all mixtures can be exempted in the first stage, resulting in a speed increase of more than 2× over a naïve exhaustive search when the max-reduction score (Eq. 14) is used, and in a smaller improvement for the max-occurrence score (Eq. 12). The reason is that the max-reduction score reduces the search space more quickly and results in better performance overall.

Interpolation Symmetries.
To find node redundancies, we consider triples (i, j, w) to denote linear interpolations between segments s_i and s_j: (1 − w)s_i + w s_j. Thus, each triple (i, j, w) is equivalent to (j, i, 1 − w). In our implementation, we use this symmetry to improve the accuracy of the quantized weight. To do so, we quantize both w and 1 − w using the codebook generated by the scalar quantizer and keep the one resulting in the lower error. We then store either (i, j, w) or (j, i, 1 − w) in the output stream.

Sparse vectors over ℝ∞. While it may seem appealing to implement the data structure for sparse vectors over ℝ∞ using a generic hash implementation (e.g., std::unordered_map in C++), we found that the storage requirements of such implementations quickly become prohibitive. We therefore implemented the underlying data structure in C++ without the STL, as a vector of (index, value) pairs that we keep sorted with respect to the index.

Rendering
To render the segmented volume directly from its Mixture Graph representation, we upload the entire binary representation to the GPU. Specifically, voxel data is stored in shader storage buffer objects (SSBOs); index structures such as node offsets, bit allocation counts, etc., are stored as shader uniform constants. In addition, we allocate an SSBO with one vec4 color entry per node to hold the decoded color lookup table. The lookup table is updated with each transfer function update using a compute shader, as outlined before. Since our binary representation is accessed based on bit addresses, we use the uint64 data type provided to shaders by the NV_gpu_shader5 extension whenever necessary. For each voxel, we ultimately retrieve some index information (level node offset, bit allocation per voxel, etc.) plus the node ID of the voxel. The node ID is a direct reference into the color lookup table. In essence, rendering from the Mixture Graph is the same as rendering from a paletted texture, with the minor complications that the palette has to be reconstructed prior to rendering and address computations have to be performed in the shader.

While the Mixture Graph allows the efficient computation of any kind of numeric segmentation attribute, we shall for now assume that each segment is assigned an RGBα color through a transfer function. On the GPU, we traverse the Mixture Graph's nodes in topological order. For each node, we gather two colors and linearly interpolate between them using the quantized weight stored at the node. Once the palette is computed, rendering proceeds by identifying the address of any given voxel's raw data in a buffer containing the voxel IDs. We provide the number of levels as a uniform shader constant. Note that while the Mixture Graph stores a full mipmap and thus provides pre-filtered anti-aliasing, we still have to perform the computation of the mip level as well as the interpolation in the shader "manually".
This is a drawback shared with virtually all modern paletted compression and rendering methods. To estimate the correct level of detail (LOD), we use the observation that the pixel coverage of a voxel is linear in view-space depth z [9]. The corresponding equations can also be found in Section 8.14 of the OpenGL 4.5 specification. We therefore upload the viewport, camera parameters, and volume dimensions as well as a uniform LOD bias to the shader. In addition, the step size of a volume raymarcher can be adjusted to the LOD, which requires opacity correction to be performed in case of semi-transparent structures.

Empty Space Skipping.
The hierarchical nature of the Mixture Graph makes it possible to implement various empty space skipping schemes. Although a full evaluation of different empty space skipping methods is beyond the scope of this paper, our current raymarcher implements a GPU-based scheme that (i) assigns opacity values of the current transfer function to the leaves, (ii) propagates these values through the Mixture Graph, and (iii) uses the hierarchy for skipping longer segments. As long as the opacity value in parent voxels falls below a certain threshold, step (iii) traverses the hierarchy upwards to establish larger and larger blocks that can be skipped. This simple yet flexible scheme is enabled by the Mixture Graph's ability to quickly recompute entire mipmaps. It results in an average 3× speedup when compared to the same implementation without mipmaps, using a very low opacity threshold.

Fig. 7: Using on-the-fly opacity gradients with local ambient occlusion (left) vs. pre-computed normals stored in a second Mixture Graph (right) for shading the Polymer data set.

Lighting.
Since our input consists of volumes of categorical labels, the gradient of the volume cannot be used as a normal for shading [26]. Instead, we estimate normals as follows. We start by tagging voxels at the boundary between two segments as features and compute an unsigned distance transform to these features using the method of Schneider et al. [45]. We erode our feature mask by two voxels and smooth the distance transform inside a thin band around the segment surfaces. This is necessary to compensate for artifacts due to the finite voxel resolution. We then compute gradients using centered differences, but the resulting gradients will be discontinuous across ridges of the medial axis. To alleviate this, we finally flip the feature mask and smooth the remaining gradients in the interior of the segments using isotropic diffusion. Once we have normal estimates, we quantize the normals [48] using vector quantization to 9 bits. To alleviate banding effects, we use 3D error diffusion [28]. This results in yet another volume of categorical values that we store in another Mixture Graph. Since the leaves of this Mixture Graph store normals, we can evaluate a shading model at each leaf and propagate the resulting light contributions for the diffuse and specular components through the graph to obtain a lookup table for each voxel, across each LOD. Note that this also results in properly pre-filtered lighting contributions, since we do not average normals (as is traditionally done) but light contributions. We would also like to note that estimating smooth normals from voxel-based segmentations is challenging; even more so since data sets from the neurosciences typically have strongly anisotropic voxels (up to 1:1:7 for the Neocortex). Fig. 7 shows the two lighting modes currently supported by our renderer: on-the-fly computed opacity gradients using discrete differences paired with local ambient occlusion [19] (3 rays, left), and quantized, pre-computed normals stored in a second Mixture Graph (right).
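A toy example of why light contributions, rather than normals, must be averaged for correct pre-filtering (assuming a simple Lambertian term with light and camera at infinity; all names below are our own):

```python
import math

def normalize(v):
    n = math.sqrt(sum(c * c for c in v))
    return tuple(c / n for c in v) if n > 0 else (0.0, 0.0, 0.0)

def diffuse(normal, light):
    """Lambertian term for a directional light."""
    return max(0.0, sum(a * b for a, b in zip(normal, light)))

light = (0.0, 0.0, 1.0)
# Two child voxels on opposite sides of a thin sheet:
child_normals = [(0.0, 0.0, 1.0), (0.0, 0.0, -1.0)]

# Pre-filtered (as in the scheme above): shade each leaf, then average
# the lit values.
prefiltered = sum(diffuse(n, light) for n in child_normals) / 2

# Traditional: average the normals first, then shade once -- here the
# normals cancel out entirely and the parent voxel goes black.
avg_normal = normalize(tuple(sum(c) / 2 for c in zip(*child_normals)))
averaged = diffuse(avg_normal, light)
```

The pre-filtered result (0.5) preserves the average brightness of the children, whereas shading the averaged normal yields 0.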
Results
Our benchmark configuration comprises an Intel Core i9-9820X CPU clocked at 3.30 GHz and 64 GB of RAM, running Windows 10. Although the CPU has 10 cores, our current factorization is mostly sequential; it also does not utilize the GPU, an NVIDIA Titan RTX with 24 GB of VRAM.
Table 2: Mixture Graph results for the Hippocampus [7], Neocortex [24], and Polymer [55] data sets using the max-reduction criterion. Voxel numbers reported include voxels in higher mip levels.
Input               Hippocampus      Neocortex        Polymer
resolution          1178 × … × 789   2048 × … × 300   614 × … × …

Histo Mipmap        Hippocampus      Neocortex        Polymer
non-zeros/voxel     1.006            1.025            1.021
raw size            9.00 GB          11.10 GB         3.10 GB
input node pairs    4,976,378        23,551,565       184,642,703

Mixture Graph       Hippocampus      Neocortex        Polymer
nodes               1,034,963        5,202,070        1,930,257
size                1.47 GB          2.14 GB          0.707 GB
bits/voxel          11.92            14.52            17.16
quant. bins         512              512              512
max error           0.00488          0.00523          0.00493
topol. depth        11               22               26
encoding time       3.51 min         2.05 h           82.3 h
Table 2 lists our results for three test data sets. The Hippocampus data set is provided by Calì [7]. It comprises about 1,045 million voxels stemming from electron microscopy imaging, partitioned into 353 segments. Building the normalized histogram mipmap of this data set results in 1.006 non-zero entries per voxel in the mixtures. Storing the histogram requires 9.00 GB. Subsequent factorization of the normalized histogram mipmap into the Mixture Graph results in slightly more than 1 million nodes and 1.47 GB of raw data. This corresponds to a bitrate of 11.92 bits per input voxel, not considering additional voxels in the mipmap hierarchy.

The second data set is a section of a Neocortex data set [24]. It was imaged using electron microscopy as well, but contains a much denser segmentation that uses significantly more IDs. Consequently, the total number of node pairs (i.e., the search space for the factorization) in the input is substantially larger. This results in a significantly longer processing time.

The third data set is a micro-CT scan of a fiber-reinforced Polymer [55]. Its high segmentation count poses a computational challenge for our current greedy implementation that we plan to address in the near future by relaxing the greed of the algorithm in favour of faster heuristics. Furthermore, spatially independent factorizations using data windows could lead to increased parallelism. On the other hand, the data set also shows the greatest potential for the Mixture Graph, since we are able to reduce the initial set of over 184.6 million node pairs to just under 2 million, while still maintaining virtually lossless reconstruction.

We quantify the error introduced by our method as the maximum ℓ∞ distance between the original H and a full reconstruction from the Mixture Graph. We obtain the reconstruction by assigning unit vectors ê_i to leaves i followed by propagation. Errors are thus relative to a unit of 1 and dimensionless.
The reason for that choice is that all mixtures are ℓ₁-normalized. Note that, unlike the error estimate discussed in Section 3.5, the error listed in Table 2 (max error) is an actual error based on an encoding-decoding round trip. In contrast, the error estimate quantifies the uncertainty in each ID as required for volumetric queries.

In addition to the number of nodes in the Mixture Graph (including sources and sinks), we also list the topological depth of the graph, since it is equivalent to the number of synchronization steps during parallel reconstruction. We provide timings for all data sets, including generation of the histogram mipmap and the factorization into the Mixture Graph.

For all data sets, we observe a significant reduction in the computational workload required to propagate transfer function updates through the entire mipmap. This is reflected by the ratio between voxels in the input mipmap (Hippocampus: 1,195M, Neocortex: 1,438M, Polymer: 404.7M) and the generated nodes (Hippocampus: 1M, Neocortex: 5.2M, Polymer: 1.9M). While the number of generated nodes depends on the number of possible pairs to choose from (Hippocampus: 5M, Neocortex: 23.6M, Polymer: 184.6M), it also depends on the number and location of segments. It is therefore no surprise that the Polymer data with its many fibrous segments shows the best reduction, since mixtures involving many segments are quite rare. In any case, the reduction is substantial (more than 200× in all cases).

Max-occurrence criterion.
Both the max-occurrence criterion (Eq. 12) and the max-reduction criterion (Eq. 14) yield similar results regarding output size and max error. However, the max-reduction criterion is significantly faster in all three cases (more than 2× for the Hippocampus as well as for the Neocortex and Polymer), which we attribute to the fact that it prunes the search space more quickly. However, to our surprise, the max-reduction criterion also results in a slightly better error, albeit negligibly so. The max-reduction criterion also generates a negligibly different number of nodes.

Quantization parameter.
We also conducted experiments to assess the impact of the scalar quantization stage on the overall performance of our method. Our findings are summarized in Table 3. The error reported there is obtained by taking the maximum of each quantization bin's individual RMSE. Since quantization bins store interpolation weights, the error is in [0, 1]. Lower bitrates in the scalar quantization stage lead to higher ambiguities between nodes and, thus, to substantially fewer nodes. To see this, consider that two interpolations λ(i, j, w), λ(i, j, w′) are merged if the quantization bin of w is identical to that of w′. Fewer nodes translate to faster encoding and reconstruction times. The cost to pay is that the reconstruction error roughly doubles for each reduction by one bit. The reason is that each such reduction halves the precision of the quantized interpolation weight. Perhaps surprisingly, the output size stays more or less constant. This is a consequence of the fact that the bulk of the output size is allotted to voxel data at the finest level. Interestingly, the encoding time decreases a little faster than the number of nodes. We attribute this to our observation that, irrespective of the bitrate, similar choices are made in the early stages of our greedy algorithm. That means that for any bitrate, the search space is reduced by roughly the same absolute amount early on, resulting in fewer factorization steps at lower bitrates.

Table 3: Impact of the bitrate of the scalar quantizer.

Hippocampus
rate/bits   nodes       size       time       error
4           460,210     1.447 GB   2.21 min   0.17475
5           655,225     1.467 GB   2.51 min   0.09384
6           809,324     1.469 GB   2.75 min   0.04213
7           898,406     1.469 GB   2.92 min   0.02525
8           968,962     1.469 GB   3.21 min   0.01228
9           1,029,188   1.470 GB   3.51 min   0.00488

Neocortex
rate/bits   nodes       size       time         error
4           3,286,086   2.102 GB   90.25 min    0.23748
5           3,980,659   2.107 GB   95.36 min    0.10286
6           4,441,354   2.132 GB   102.70 min   0.05272
7           4,758,860   2.134 GB   106.37 min   0.02639
8           5,038,103   2.136 GB   116.26 min   0.01288
9           5,300,258   2.139 GB   127.20 min   0.00523

Polymer
rate/bits   nodes       size       time     error
4           1,071,148   713.6 MB   71.6 h   0.18248
5           1,315,833   718.3 MB   73.2 h   0.07603
6           1,519,865   721.5 MB   76.6 h   0.04066
7           1,684,252   722.6 MB   79.5 h   0.01932
8           1,812,057   723.1 MB   80.7 h   0.01001
9           1,930,257   724.0 MB   82.3 h   0.00493

The CPU in our benchmark configuration supports bit-manipulation instructions for counting leading and trailing zeros that allow for an easier implementation of the footprint gathering algorithm.

Fig. 8: The Mixture Graph allows for real-time transfer function editing across all scales of the mipmap, and can be used for multidimensional color mapping of different segment attributes.
These schemes can be used for various real-time visual analysis applications in neuroscience (left) or material science (center and right).
Color lookup table.
We evaluate the time it takes to assemble a color lookup table on both CPU and GPU. To do so, we assign colors to the source (leaf) nodes in the DAG and traverse all remaining nodes in parallel. For each node, we fetch data from its children and perform one interpolation. We measure the execution time of 1,000 repetitions and report the average time of one repetition; GPU timings include the upload of one float4 color per source. On the CPU, we used OpenMP to parallelize the code, whereas on the GPU, a compute shader is used. This code is executed whenever the user changes the color transfer function, such as shown in Fig. 8. For the Hippocampus data set (353 sources, topological depth 11), our code takes 22.7 ms on the CPU and 3.5 ms on the GPU to compute color information for the more than 1M nodes. For the Neocortex (1,182 sources, topological depth 22), our code takes 129.7 ms (CPU) and 11.3 ms (GPU). Finally, for the Polymer (15,917 sources, topological depth 26), our code takes 57.0 ms (CPU) and 12.1 ms (GPU). These timings indicate that the limiting factor on the CPU is the number of nodes, whereas on the GPU the need to synchronize affects computation times. Enabled by this performance, we implement several interactive multidimensional transfer function editing schemes that can be easily integrated into a wide range of visual exploration workflows. For example, neuroscientists can classify and select neural structures according to geometric features such as shape, volume, length, and diameter (Fig. 8, left), and material scientists can cluster fibers into groups based on length or orientation (Fig. 8, middle and right).
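The propagation loop described above can be sketched as follows (sequential Python for clarity; on the GPU, all nodes of equal topological depth are processed in parallel, which is why the topological depth bounds the number of synchronization steps; `build_palette` is a hypothetical name):

```python
def build_palette(source_colors, nodes):
    """Propagate RGBA colors from the sources (leaves) through the DAG.
    `nodes` lists the inner nodes in topological order as
    (child_a, child_b, w) triples; node k is palette entry k, with the
    first len(source_colors) entries being sources. Each inner node
    blends its two children: (1 - w) * color_a + w * color_b."""
    palette = list(source_colors)
    for ca, cb, w in nodes:
        a, b = palette[ca], palette[cb]
        palette.append(tuple((1.0 - w) * x + w * y for x, y in zip(a, b)))
    return palette

# Two segments (red and blue) and one mixture node blending them equally:
pal = build_palette([(1.0, 0.0, 0.0, 1.0), (0.0, 0.0, 1.0, 1.0)],
                    [(0, 1, 0.5)])
```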
Mixture lookup table.
We also benchmark the time it takes to assemble a mixture lookup table on the CPU. We use this lookup table for range queries (both naïve and our footprint assembly algorithm), such as depicted in Fig. 9. We start by assigning a mixture ê_i with variance 0 to each leaf node i. In the same fashion as for the color lookup, we propagate this information through the DAG, including the error described in Eq. (18). Each node thus stores two sparse vectors over ℝ∞: the expected value µ and its variance δµ. For the Hippocampus, computing this lookup takes 68 ms; for the Neocortex, 389 ms; and for the Polymer, 171 ms. Building this lookup table does not only depend on the number of nodes and the topological depth of the DAG, but also on the initial number of segments as well as the number of non-zero entries in the sparse vectors. This operation will typically be performed only once when loading the data set. Due to the pruning performed during the factorization, the lookup tables easily fit into main memory, with the Neocortex lookup at around 251 MB being the largest of the three.

Range queries.
We then use this lookup table to measure randomly aligned range queries covering between 16³ and 256³ voxels. For each of the three data sets, we compare a parallel naïve approach that adds the contributions of each voxel at the finest level with our footprint assembly (FA) method that exploits the data hierarchy. Our findings are summarized in Table 4. As can be seen, FA significantly reduces the number of samples necessary to count the segment volumes in a given range. At ranges larger than 8³, we amortize the setup overhead of FA over the naïve method. Beyond this point, we observe substantial speed-ups that are enabled by our hierarchical data structure. It is worth noting that the number of fetches made by the FA method depends solely on the domain resolution, as well as on the footprint size and alignment. These numbers are therefore similar for the three

Fig. 9: The Mixture Graph allows for real-time computation of approximate histograms over axis-aligned bounding boxes. The above small box contains segments with an average volume of approximately 270 K voxels.

Fig. 10:
Top, left to right:
Direct volume rendering of levels 0, 2, 4, 5, and 6 of the 1.2-gigavoxel Neocortex data set. The data set's strong anisotropy can be seen in the middle images.
Bottom, left to right:
Direct volume rendering of levels 0, …,
Table 4: Hippocampus (top),
Neocortex (center), and
Polymer (bottom). Left to right: size of the query region, fetches made by the FA algorithm, percentage of fetches saved, time in ms for FA and the 10-core parallel naïve approach, and speed-up (bold times in seconds).

data sets. In contrast, the time is also affected by the number of segmentation IDs and their spatial distribution. For large, contiguously labeled regions, the resulting mixture remains very sparse throughout its computation, whereas for regions with less coherence, many more terms have to be accumulated.

A drawback of our method to derive pre-computed normals are artifacts in the form of caps around the medial axis where the segments leave the domain. A second limitation is that we are restricted to light and camera at infinity, since we compute lighting contributions without knowledge of position. Note that only the second is a limitation of the Mixture Graph, since our method is orthogonal to the normal estimation. Fig. 10 depicts several levels of the Hippocampus and Neocortex data sets, all rendered at the same resolution, to illustrate the downsampling capabilities of our data structure. In the accompanying video, we demonstrate real-time performance using a direct volume raymarcher, sampling at 0.5 voxels and rendering to a 1600 × … viewport.

Conclusion

In this paper, we have presented the Mixture Graph. This data structure allows us to factorize and compress normalized histogram mipmaps, essentially resulting in a paletted mipmap that provides fast updates of transfer functions for rendering. In our experiments, we observe a decrease from the order of gigavoxels in the mipmap down to the order of mega-nodes in the Mixture Graph. Both quantities reflect the amount of computational work required to re-compute the mipmap. We demonstrate the usefulness of our method for segmented volumes, whose integer nature otherwise prohibits direct filtering, interpolation, and lossy compression.
Additionally, we evaluate and present various trade-offs between encoding speed and reconstruction fidelity to guide future users of our method.

Our current implementation considers a single volume and builds the Mixture Graph using serial code. In the future, we would like to explore bricked volumes, since they offer potential for parallelization and out-of-core processing. However, building the Mixture Graph of a bricked volume in parallel is not straightforward. The reason is that each factorization step alters the remaining search space and, consequently, the order of execution may matter significantly. Another direction for future research is to consider larger mixtures (e.g., barycentric interpolation) as the building blocks of the graph. While each mixture would require more storage, we expect that fewer nodes would be generated. Also, the longest path in the graph is expected to become shorter, which would require fewer synchronization operations during parallel transfer function updates. Our current empty space skipping method is efficient, but we also plan to investigate how more sophisticated schemes [15] can be combined with our data structure in the future. Finally, we demonstrate that the choice of the greedy scoring function has a significant impact on the performance of the construction of the Mixture Graph. In this context, more research is needed to determine better scoring functions.

Acknowledgments
All authors are funded by the College of Science and Engineering (CSE) at Hamad Bin Khalifa University (HBKU). The data sets used in this work were generated by Calì et al. [7] (Hippocampus) and Kasthuri et al. [24] (Neocortex). The fiber-reinforced polymer data set is provided by Christoph Heinzl [55].
EFERENCES [1] A. K. Al-Awami, J. Beyer, D. Haehn, N. Kasthuri, J. W. Lichtmann,H. Pfister, and M. Hadwiger. NeuroBlocks - visual tracking of segmen-tation and proofreading for large connectomics projects.
IEEE Trans-actions on Visualization and Computer Graphics , 22(1):738–746, 2015. https://doi.org/10.1109/TVCG.2015.2467441 .[2] A. K. Al-Awami, J. Beyer, H. Strobelt, N. Kasthuri, J. W. Lichtmann,H. Pfister, and M. Hadwiger. NeuroLines: A subway map metaphor forvisualizing nanoscale neuronal connectivity.
IEEE Transactions on Vi-sualization and Computer Graphics , 20(12):2369–2378, 2014. https://doi.org/10.1109/TVCG.2014.2346312 .[3] C. Bajaj, I. Ihm, and S. Park. 3D RGB image compression for interac-tive applications.
ACM Transactions on Graphics , 20(1):10–38, 2001. https://doi.org/10.1109/PacificVis.2014.52 .[4] J. Beyer, A. K. Al-Awami, N. Kasthuri, J. W. Lichtmann, H. Pfister,and M. Hadwiger. ConnectomeExplorer: Query-guided visual analy-sis of large volumetric neuroscience data.
IEEE Transactions on Visu-alization and Computer Graphics , 19(12):2868–2877, 2013. https://doi.org/10.1109/TVCG.2013.142 .[5] J. Beyer, M. Hadwiger, A. K. Al-Awami, W.-K. Jeong, N. Kasthuri, J. W.Lichtmann, and H. Pfister. Exploring the connectome: Petascale vol-ume visualization of microscopy data streams.
IEEE Computer Graph-ics and Applications , 33(4):50–61, 2013. https://doi.org/10.1109/MCG.2013.55 .[6] J. Beyer, H. Mohammed, M. Agus, A. K. Awami, H. Pfister, and M. Had-wiger. Culling for extreme-scale segmentation volumes: A hybrid deter-ministic and probabilistic approach.
IEEE Transactions on Visualizationand Computer Graphics , 25:1132–1141, 2019. https://doi.org/10.1109/TVCG.2018.2864847 .[7] C. Cal`ı, J. Baghabra, D. Boges, G. Holst, A. Kreshuk, F. Hamprecht,M. Srinivasan, H. Lehv¨aslaiho, and P. Magistretti. Three-dimensionalimmersive virtual reality for studying cellular compartments in 3D mod-els from EM preparations of neural tissues.
Computational Neurology ,524(1):23–38, 2016. https://doi.org/10.1002/cne.23852 .[8] A. Cohen, I. Daubechies, and J.-C. Feauveau. Biorthogonal bases ofcompactly supported wavelets.
Communications on Pure and AppliedMathematics , 45(5):485–560, 1992. https://doi.org/10.1002/cpa.3160450502 .[9] J. Cohen, M. Olano, and D. Manocha. Appearance-preserving simpli-fication. In
Proceedings of ACM SIGGRAPH , pages 115–112, 1998. https://doi.org/10.1145/280814.280832 .[10] I. Daubechies and W. Sweldens. Factoring wavelet transforms into liftingsteps.
Journal on Fourier Analysis and Applications , 4(3):247–269, 1998. hhtps://doi.org/10.1007/BF02476026 .[11] W. Ewald and W. Sieg, editors.
David Hilbert’s Lectures on the Foun-dations of Arithmetics and Logic 1917–1933 . Springer Verlag, 2013. .[12] N. Fout and K.-L. Ma. Transform coding for hardware-acceleratedvolume rendering.
IEEE Transactions on Visualization and ComputerGraphics , 13(6):1600–1607, 2007. https://doi.org/110.1109/TVCG.2007.70516 .[13] S. Funasaka, K. Nakano, and Y. Ito. Adaptive loss-less data com-pression method optimized for GPU decompression.
Concurrency andComputation Practice and Experience , 24(24):1–15, 2010. https://doi.org/10.1002/cpe .[14] A. Gersho and R. M. Gray.
Vector Quantization and Signal Compression .Kluwer Academic Press, 1992. https://link.springer.com/book/10.1007/978-1-4615-3626-0 .[15] M. Hadwiger, A. K. Al-Awami, J. Beyer, M. Agus, and H. Pfister.SparseLeap: Efficient empty space skipping for large-scale volume ren-dering.
IEEE Transactions on Visualization and Computer Graph-ics , 24(1):974–983, 2018. https://doi.org/10.1109/TVCG.2017.2744238 .[16] M. Hadwiger, C. Berger, and H. Hauser. High-quality two-level volumerendering of segmented data sets on consumer graphics hardware. In
IEEE Visualization , pages 301–308, 2003. https://doi.org/10.1109/VISUAL.2003.1250386 .[17] D. Haehn, S. Knowles-Barley, M. Roberts, J. Beyer, N. Kasthuri, J. W.Lichtmann, and H. Pfister. Design and evaluation of interactive proof-reading tools for connectomics.
IEEE Transactions on Visualization andComputer Graphics , 20(12):2466–2475, 2014. https://doi.org/10.1109/TVCG.2014.2346371 . [18] H. Hauser, L. Mroz, G. Bischi, and E. Gr¨oller. Two-level volumerendering.
IEEE Transactions on Visualization and Computer Graph-ics , 7(3):242–252, 2001. https://doi.org/10.1109/2945.942692 .[19] F. Hernell, P. Ljung, and A. Ynnermann. Local ambient occlusionin direct volume rendering.
IEEE Transactions on Visualization andComputer Graphics , 16(4):548–559, 2010. https://doi.org/10.1109/TVCG.2009.45 .[20] I. Ihm and S. Park. Waveletbased 3D compression scheme for interac-tive visualization of very large volume data.
Computer Graphics Forum ,18(1):3–15, 1999. https://doi.org/10.1111/1467-8659.00298 .[21] W.-K. Jeong, J. Beyer, M. Hadwiger, R. Blue, C. Law, A. V´azquez-Reina,R. C. Reid, J. Lichtmann, and H. Pfister. Ssecrett and NeuroTrace: Inter-active visualization and analysis tools for large-scale neuroscience datasets.
IEEE Computer Graphics and Applications , 30(3):58–70, 2010. https://doi.org/10.1109/MCG.2010.56 .[22] W.-K. Jeong, J. Beyer, M. Hadwiger, A. V´azquez, H. Pfister, and R. T.Whitaker. Scalable and interactive segmentation and visualization of neu-ral processed in EM datasets.
IEEE Transactions on Visualization andComputer Graphics , 15(6):1505–1514, 2009. https://doi.org/10.1109/TVCG.2009.178 .[23] W.-K. Jeong, J. Schneider, S. Turney, B. E. Faulkner-Jones, D. Meyer,R. Westermann, R. C. Reid, J. Lichtmann, and H. Pfister. Interactivehistology of large-scale biomedical image stacks.
IEEE Transactions onVisualization and Computer Graphics , 16(6):1386–1395, 2010. https://doi.org/10.1109/TVCG.2010.168 .[24] N. Kasthuri, K. Hayworth, D. Berger, R. Schalek, J. Conchello,S. Knowles-Barley, D. Lee, A. V´azquez-Reina, V. Kaynig, T. Jones,M. Roberts, J. Morgan, J. Tapia, H. Seung, W. Roncal, J. Vogelstein,R. Burns, D. Sussman, C. Priebe, H. Pfister, and J. Lichtmann. Saturatedreconstruction of a volume of neocortex.
Cell , 162(3):648–661, 2015. https://doi.org/10.1016/j.cell.2015.06.054 .[25] V. Kaynig, A. V´azquez-Reina, S. Knowles-Barley, M. Roberts, T. R.Jones, N. Kasthuri, E. Miller, J. Lichtmann, and H. Pfister. Large-scale automatic reconstruction of neuronal processes from electron mi-croscopy images.
Medical Image Analysis , 22(1):77–88, 2015. https://doi.org/10.1016/j.media.2015.02.001 .[26] V. Lempitsky. Surface extraction from binary volumes with higher-ordersmoothness. In
IEEE Conference on Computer Vision and Pattern Recog-nition (CVPR) , pages 1197–1204, 2010. https://doi.org/10.1109/CVPR.2010.5539832 .[27] P. Ljung, J. Kr¨uger, E. Gr¨oller, M. Hadwiger, C. Hansen, and A. Yn-nermann. State of the art in transfer functions for direct volume ren-dering.
Computer Graphics Forum , 35(3):669–691, 2016. https://doi.org/10.1111/cgf.12934 .[28] Q. Lou and P. Stucki. Fundamentals of 3D halftoning.
SpringerLNCS , 1375:224–239, 1998. https://doi.org/10.1007/BFb0053273 .[29] H. M, J. Beyer, W.-K. Jeong, and H. Pfister. Interactive volume explo-ration of petascale microscopy data streams using a visualization drivenvirtual memory approach.
IEEE Transactions on Visualization and Com-puter Graphics , 18(12):2285–2294, 2012. https://doi.org/10.1109/TVCG.2012.240 .[30] S. Mallat and Z. Zhang. Matching pursuits with time-frequency dictio-naries.
IEEE Transactions on Signal Processing, 41(12):3397–3415, 1993. https://doi.org/10.1109/78.258082.
[31] S. Muraki. Volume data and wavelet transforms.
IEEE Computer Graphics and Applications, 13(4):50–56, 1993. https://doi.org/10.1109/38.219451.
[32] Neurodata.io. The OpenConnectome Project, 2020. Accessed 29 July 2020.
[33] K. G. Nguyen and D. Saupe. Rapid high quality compression of volume data for visualization.
Computer Graphics Forum, 20(3):49–57, 2001. https://doi.org/10.1111/1467-8659.00497.
[34] P. Ning and L. Hesselink. Vector quantization for volume rendering. In Proceedings of the Workshop on Volume Visualization, pages 69–74, 1992. https://doi.org/10.1145/147130.147152.
[35] R. Patel, Y. Zhang, J. Mak, A. Davidson, and J. Owens. Parallel lossless data compression on the GPU. In Innovative Parallel Computing (InPar), pages 1–9, 2012. https://doi.org/10.1109/InPar.2012.6339599.
[36] D. Pavić and L. Kobbelt. Two-colored pixels.
Computer Graphics Forum, 29(2):743–752, 2010. https://doi.org/10.1111/j.1467-8659.2009.01644.x.
[37] F. Reichl, M. Treib, and R. Westermann. Visualization of big SPH simulations via compressed octree grids. In
IEEE International Conference on Big Data, pages 71–78, 2013. https://doi.org/10.1109/BigData.2013.6691717.
[38] J. Rissanen and G. G. Langdon, Jr. Arithmetic coding. IBM Journal of Research and Development, 23(2):149–162, 1979. https://doi.org/10.1147/rd.232.0149.
[39] W. G. Roncal, M. Pekala, V. Kaynig-Fittkau, D. M. Kleissas, J. T. Vogelstein, H. Pfister, R. Burns, R. J. Vogelstein, M. A. Chevillet, and G. D. Hager. VESICLE: Volumetric evaluation of synaptic interfaces using computer vision at large scale. In
Proc. British Machine Vision Conference, pages 81.1–81.13, 2015. https://doi.org/10.5244/C.29.81.
[40] R. Rubinstein, M. Zibulevsky, and M. Elad. Double sparsity: Learning sparse dictionaries for sparse signal approximation. IEEE Transactions on Signal Processing, 58(3):1553–1564, 2010. https://doi.org/10.1109/TSP.2009.2036477.
[41] A. Said and W. A. Pearlman. A new fast and efficient image codec based on set partitioning in hierarchical trees. IEEE Transactions on Circuits and Systems for Video Technology, 6(3):243–250, 1996. https://doi.org/10.1109/76.499834.
[42] R. Schalek, D. Lee, N. Kasthuri, A. Peleg, T. Jones, V. Kaynig, D. Haehn, H. Pfister, D. Cox, and J. Lichtman. Imaging a 1 mm³ volume of rat cortex using multibeam SEM. In Microscopy and Microanalysis, pages 582–583, 2016. https://doi.org/10.1017/S1431927616003767.
[43] A. Schilling and G. Knittel. System and method for mapping textures onto surfaces of computer-generated objects, May 2001. US Patent 6,236,405: https://portal.unifiedpatents.com/patents/patent/US-6236405-B1.
[44] A. Schilling, G. Knittel, and W. Strasser. Texram: A smart memory for texturing.
IEEE Computer Graphics and Applications, 16(3):32–41, 1996. https://doi.org/10.1109/38.491183.
[45] J. Schneider, M. Kraus, and R. Westermann. GPU-based Euclidean distance transforms and their application to volume rendering.
Springer CCIS, 68:215–228, 2009. https://doi.org/10.1007/978-3-642-11840-1_16.
[46] J. Schneider and R. Westermann. Compression domain volume rendering. In
Proceedings of IEEE Visualization, pages 293–300, 2003. https://doi.org/10.1109/VISUAL.2003.1250385.
[47] E. Sitaridi, R. Mueller, T. Kaldewey, G. Lohman, and K. A. Ross. Massively-parallel lossless data decompression. In
International Conference on Parallel Processing (ICPP), pages 242–247, 2016. https://doi.org/10.1109/ICPP.2016.35.
[48] M. Tarini, P. Cignoni, C. Rocchini, and R. Scopigno. Real time, accurate, multi-featured rendering of bump mapped surfaces. Computer Graphics Forum, 19(3):119–130, 2000. https://doi.org/10.1111/1467-8659.00404.
[49] M. Treib, F. Reichl, S. Auer, and R. Westermann. Interactive editing of gigasample terrain fields.
Computer Graphics Forum, 31(2):383–392, 2012. https://doi.org/10.1111/j.1467-8659.2012.03017.x.
[50] A. Vázquez-Reina, M. Gelbart, D. Huang, J. Lichtman, E. Miller, and H. Pfister. Segmentation fusion for connectomics. In
IEEE International Conference on Computer Vision (ICCV), pages 177–184, 2011. https://doi.org/10.1109/ICCV.2011.6126240.
[51] K.-C. Wang, K. Lu, T.-H. Wei, N. Shareef, and H.-W. Shen. Statistical visualization and analysis of large data using a value-based spatial distribution. In IEEE Pacific Visualization Symposium, pages 161–170, 2017. https://doi.org/10.1109/PACIFICVIS.2017.8031590.
[52] K.-C. Wang, N. Shareef, and H.-W. Shen. Image and distribution based volume rendering for large data sets. In IEEE Pacific Visualization Symposium, pages 26–35, 2018. https://doi.org/10.1109/PacificVis.2018.00013.
[53] A. Weißenberger and B. Schmidt. Massively parallel Huffman decoding on GPUs. In International Conference on Parallel Processing (ICPP), pages 27.1–27.10, 2018. https://doi.org/10.1145/3225058.3225076.
[54] A. Weißenberger and B. Schmidt. Massively parallel ANS decoding on GPUs. In International Conference on Parallel Processing (ICPP), pages 100.1–100.10, 2019. https://doi.org/10.1145/3337821.3337888.
[55] J. Weißenböck, A. Amirkhanov, W. Li, A. Reh, A. Amirkhanov, E. Gröller, J. Kastner, and C. Heinzl. FiberScout: An interactive tool for exploring and analyzing fiber reinforced polymers. In IEEE Pacific Visualization Symposium, pages 154–160, 2014. https://doi.org/10.1109/PacificVis.2014.52.
[56] L. Williams. Pyramidal parametrics. In Proceedings of ACM SIGGRAPH, pages 1–11, 1983. https://doi.org/10.1145/964967.801126.