Burning sage: Reversing the curse of dimensionality in the visualization of high-dimensional data
Ursula Laa a,b , Dianne Cook b , Stuart Lee b,c
a School of Physics and Astronomy, Monash University; b Department of Econometrics and Business Statistics, Monash University; c Molecular Medicine Division, Walter and Eliza Hall Institute, Parkville, Australia
ARTICLE HISTORY
Compiled September 24, 2020
ABSTRACT
In high-dimensional data analysis, the curse of dimensionality implies that points tend to be far away from the center of the distribution, on the edge of high-dimensional space. Contrary to this, projected data tend to clump at the center. This gives a sense that any structure near the center of the projection is obscured, whether this is true or not. A transformation to reverse the curse is defined in this paper, which uses radial transformations on the projected data. It is integrated seamlessly into the grand tour algorithm, and we have called it a burning sage tour, to indicate that it reverses the curse. The work is implemented in the tourr package in R. Several case studies are included that show how the sage visualizations enhance exploratory clustering and classification problems.
KEYWORDS data visualisation; grand tour; statistical computing; statistical graphics; multivariate data; dynamic graphics; data science; machine learning
1. Introduction
The term “curse of dimensionality” was originally introduced by Bellman (1961), to express the difficulty of doing optimization in high dimensions because of the exponential growth in space as dimension increases. A way to think about it is that the volume of the space grows exponentially with dimension, which makes it infeasible to sample enough points: any sample will cover the space less densely as dimension increases. The effect is that most points will be far from the sample mean, on the edge of the sample space. Hall, Marron, and Neeman (2005) have shown that in the extreme case of high-dimension, low-sample-size data, observations are on the vertices of a simplex.

CONTACT Ursula Laa. Email: [email protected] , Dianne Cook. Email: [email protected] , Stuart Lee. Email: [email protected]

This affects many aspects of data analysis: minimizing the error during model fitting relies on effective optimization techniques, non-parametric modeling requires finding nearest neighbors which may be far away, and sampling from high-dimensional distributions is likely to have points far from the population mean. Donoho (2000) considers the curse of dimensionality a blessing, because the sparsity can be leveraged for computational efficiency. This is used in regularization methods, like the lasso, to penalize model complexity. The penalty term results in shrinking (some of) the parameter estimates towards zero.

Paradoxically, the curse of dimensionality inverts for dimension reduction, resulting in an excessive number of observations near the center of the distribution. This affects visualizations made on low-dimensional projections, like the tour (Asimov 1985; Buja et al. 2005). The effect is described by Diaconis and Freedman (1984): most low-dimensional linear projections are approximately Gaussian, with observations concentrating in the center. This has motivated the development of indexes for projection pursuit which search for departures from normality. It is also related to what is called “data piling” in high-dimension low-sample-size data (Marron, Todd, and Ahn 2007; Ahn and Marron 2010): all observations can collapse into a single point. These issues also persist with non-linear dimension reduction techniques, where they are often referred to as the “crowding problem”, which methods like t-Distributed Stochastic Neighbor Embedding (t-SNE) (van der Maaten and Hinton 2008) aim to alleviate. Figure 1 illustrates the crowding problem.
In this work we address data crowding in low-dimensional linear projections by providing a reversing transformation for tour methods. Tours show interpolated sequences of low-dimensional projections of the data. When exploring data with a tour we can discover features that are only visible in linear combinations of variables. However, the data crowding could hide these features. To reverse the effect, we introduce a radial transformation that magnifies the center of the distribution. This is called a burning sage tour, to reflect that the crowding caused by the curse of dimensionality is being removed.

Figure 1. Illustration of data crowding, using hexbin plots of two-dimensional projections of 10k points sampled uniformly within p-dimensional hyperspheres, for p = 3, 10, 100. As p increases the density concentrates near the center.

The paper is structured as follows. The radial transformation and its implementation are described in Section 2. Section 3 illustrates the use of the sage tour with examples in clustering, supervised classification and a classical needle-in-the-haystack problem. Section 4 describes possible extensions to the method.
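The crowding illustrated in Figure 1 is easy to reproduce numerically. The following sketch (our illustration, not code from the paper) samples points uniformly within a p-dimensional hypersphere, projects onto the first two coordinates, and measures the fraction of projected points within half of the radius; by rotation invariance this axis projection is representative of any linear projection.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_ball(n, p, R=1.0):
    """Sample n points uniformly from a p-dimensional ball of radius R."""
    x = rng.normal(size=(n, p))
    x /= np.linalg.norm(x, axis=1, keepdims=True)  # uniform direction on the sphere
    r = R * rng.uniform(size=(n, 1)) ** (1.0 / p)  # radius with density proportional to r^(p-1)
    return x * r

# Fraction of projected points within half the radius grows towards 1 with p.
fracs = []
for p in (3, 10, 100):
    proj = sample_ball(10_000, p)[:, :2]  # first two coordinates = one linear projection
    fracs.append(np.mean(np.linalg.norm(proj, axis=1) < 0.5))
print([round(f, 2) for f in fracs])
```

The fractions agree with the relative projected volume formula derived in Section 2, evaluated at r/R = 0.5.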
2. Burning sage algorithm
To understand why points tend to be away from the center in the high-dimensional space, but crowd the center in low-dimensional projections, it is helpful to consider the projected volume relative to the high-dimensional volume. To avoid edge effects and to impose rotation invariance, we will start from the data being uniformly distributed in a hypersphere, i.e. all data points are within a specified distance from the center. This makes calculations more tractable than assuming a uniform distribution in a hypercube (box).

Figure 2 illustrates the comparison to be made, using something we can easily picture, a 3D sphere. Projecting the data from within a 3D sphere to 2D (grey disk) will result in mass being condensed into the disk. Imagine comparing the volume of a cylinder at different locations in the disk. A centered cylinder has more volume. This is exaggerated as p increases: the centered cylinder has much more volume than any other cylinder.

Figure 2. Illustration (and notation) for describing the elements used in the burning sage transformation. The 3D sphere (left) shows the different volumes to be compared. The full sphere has volume V(R, p). Within a radius r the sphere contains the reduced volume V(r; p, R), shown in blue, but the projected volume within a radius r in a two-dimensional plane is much larger, given by the volume of the cylinder with rounded caps, V_D(r; p, R), shown in red. The intersection of the plane with the sphere is illustrated in grey, and the plane representing the projection with both radii is shown at right.

To reverse this effect, we introduce a radial transformation that redistributes the projected points, such that equal volume in the original (p-dimensional) space is projected onto equal area in a 2-dimensional projection. Note that this can be generalized to d-dimensional projections by mapping onto equal d-dimensional volume instead.

2.1. Definition of the relative projected volume
To understand how the p-dimensional volume is projected onto a 2-dimensional plane, we study what fraction of the total volume is projected onto the area of a disk, depending on its radius. This dependence was described in Laa et al. (2020). We start from a p-dimensional hypersphere, with radius R and volume V(R, p), and its projected volume onto a centered 2-dimensional disk of radius r, V_D(r; p, R), where r can be any radius within [0, R]. The relative projected volume is then given as the ratio of these two quantities,

v(r; p, R) = V_D(r; p, R) / V(R, p) = 1 − (1 − (r/R)^2)^(p/2). (1)

This ratio is of particular interest because it gives the 2-dimensional radial cumulative distribution function (CDF) of points when assuming a uniform distribution within the p-dimensional hypersphere.

We can compare v(r; p, R) to the relative volume within a radius r in the original p-dimensional hypersphere,

v_p(r; p, R) = V(r, p) / V(R, p) = (r/R)^p. (2)

Figure 3. Comparing the relative volume of a p-dimensional hypersphere captured within a radius r, in the 2-dimensional projection (left) and in the p-dimensional space (right), for p = 3, 10, 100. The relative volume in the p-dimensional sphere shrinks as p increases, while the projected volume (near the center) grows.

Figure 3 compares these two quantities (Eq. 1 and 2), for p = 3, 10, 100: on the left is v(r; p, R) and on the right is v_p(r; p, R). The function shapes change in opposite directions as p increases: v(r; p, R) peaks earlier and v_p(r; p, R) gets flatter. This is the paradox of the curse of dimensionality: the projected volume at the center increases with p.

2.2. Calculating the radial transformation
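The transformation derived in this section composes the two volume ratios just defined. As a preliminary, the following sketch (our illustration) implements Eq. 1 and Eq. 2 and confirms their opposite behavior at a fixed radius as p grows.

```python
import numpy as np

def v_proj(r, p, R=1.0):
    """Relative projected volume onto a centered 2D disk of radius r (Eq. 1)."""
    return 1.0 - (1.0 - (r / R) ** 2) ** (p / 2.0)

def v_full(r, p, R=1.0):
    """Relative volume within radius r of the p-dimensional hypersphere (Eq. 2)."""
    return (r / R) ** p

# At r/R = 0.5 the projected volume grows with p while the p-D volume shrinks.
for p in (3, 10, 100):
    print(p, round(v_proj(0.5, p), 3), round(v_full(0.5, p), 3))
```

Note that the two ratios coincide at p = 2, where the disk is the whole space and no correction is needed.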
The aim of the algorithm is to redistribute the projected volume such that equal relative areas on the disk, as given by v(r; p = 2, R) = v_p(r; p = 2, R) = (r/R)^2, contain equal relative projected volume, given by v(r; p, R). This is achieved through a transformation of the projected radius that can be defined for any r ∈ [0, R], and is applied to the projected data points in the plane, y = (y_1, y_2). We work with polar coordinates and represent the data points as y = (r_y, θ_y). The angular component θ_y is uniform for this distribution, by the rotation invariance of the sphere, and thus does not need to be transformed. The radial component r_y is transformed in two steps.

The first transformation is to replace r_y with v(r_y; p, R). Since this is the radial CDF of the assumed underlying distribution, we expect that v(r_y; p, R) is approximately uniformly distributed in radius. We then transform v(r_y; p, R) using the inverse of v(r_y; 2, R), to go from a uniform distribution in radius to a uniform distribution in the area of the disk. This inverse is defined via

v^{-1}(v(r_y; 2, R); 2, R) = v(v^{-1}(r_y; 2, R); 2, R) = r_y (3)

and thus

v^{-1}(r_y; 2, R) = R sqrt(r_y). (4)

The full radial transformation is therefore given by

r'_y = v^{-1}(v(r_y; p, R); 2, R) = R sqrt(v(r_y; p, R)) = R sqrt(1 − (1 − (r_y/R)^2)^(p/2)). (5)

Figure 4. Relation between r_y and r'_y for different values of p, assuming R = 1. The scaling is approximately linear near the center, but leads to distortion at large radii when p is large.

The relation between r'_y and r_y depends on the number of dimensions p, and is illustrated for selected values in Figure 4. We see that the transformation is approximately linear near the center. As p increases it becomes non-linear more quickly, and for p = 10, for example, points at moderate radius r_y are already mapped close to the maximum r'_y. Figure 5 demonstrates this for different values of p, by showing equidistant circles for which the radius has been transformed according to Eq. 5.
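Eq. 5 can be implemented in a few lines. The sketch below (our Python transcription; the package itself is written in R) checks the two properties visible in Figure 4: the p = 2 case is the identity, and larger p pushes moderate radii towards the maximum.

```python
import numpy as np

def sage_radial(r_y, p, R=1.0):
    """Burning sage radial transformation of Eq. 5."""
    return R * np.sqrt(1.0 - (1.0 - (r_y / R) ** 2) ** (p / 2.0))

r = np.linspace(0.0, 1.0, 6)
print(np.round(sage_radial(r, p=2), 2))    # identity: no transformation for p = 2
print(np.round(sage_radial(r, p=10), 2))   # larger p pushes radii towards R
```

Near the center the expansion 1 − (1 − (r/R)^2)^(p/2) ≈ (p/2)(r/R)^2 shows the transformation is approximately linear with slope sqrt(p/2), matching Figure 4.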
Figure 5. Equidistant concentric circles, for p = 2, 3, 10, 100 and R = 1. The circles remain equidistant for p = 2, where no transformation is performed, and get pushed out towards the edge as p increases.

2.3. Trimming and tuning
The transformation in Eq. 5 is fixed for a given input dataset, by evaluating the number of dimensions p and the maximum distance from the center R. However, in practice we may wish to trim the projected data or tune the transformation. A combination of both adjustments can be used to further zoom in on the center of the distribution, or alternatively, to soften the transformation.

2.3.1. Trimming

The overall scale of the transformation is determined by R. In the case of an approximately spherical and uniform distribution the maximum distance from the center works well and ensures the validity of the rescaling in Eq. 5. But this choice is not robust, and might result in a much larger scale than desired, especially when it is determined by outlying observations.

We therefore allow trimming of the projected observations, using R as a free parameter of the display function. When selecting a value of R that is smaller than the maximum distance from the center, we need to ensure that the projected radius of points is always smaller than R, by trimming r_y as

r_y^trim = min(r_y, R) (6)

for each observation.

2.3.2. Tuning

The dimension of the input might not reflect the intrinsic dimensionality of the dataset. This could be the case when dimension reduction was used prior to visualization, e.g. displaying only the first few principal components. In this case the effective dimensionality p_eff is likely between the original number of dimensions and the selected number of principal components. We can think of omitted components as being in the orthogonal space of all considered projections, with some directions being pure noise, while others may still carry relevant information.

We allow tuning p_eff = γp by selecting the scaling parameter γ. By default, γ = 1 and p_eff = p. When γ < 1 the transformation is softened, while γ > 1 magnifies the center more strongly than suggested by p alone. Note that when p_eff < 2 the effect of the transformation is reversed, increasing rather than reducing the crowding near the center.
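Trimming (Eq. 6) and tuning via p_eff = γp compose directly with Eq. 5. A small sketch of how the two adjustments interact (our illustration; parameter names follow the text, not the tourr source):

```python
import numpy as np

def sage_transform(r_y, p, R=1.0, gamma=1.0):
    """Trim radii at R (Eq. 6), then apply Eq. 5 with p_eff = gamma * p."""
    r_trim = np.minimum(r_y, R)
    p_eff = gamma * p
    return R * np.sqrt(1.0 - (1.0 - (r_trim / R) ** 2) ** (p_eff / 2.0))

r = np.array([0.2, 0.5, 5.0])          # the last value is an outlier beyond R
print(np.round(sage_transform(r, p=5, R=1.0), 2))             # outlier trimmed to R
print(np.round(sage_transform(r, p=5, R=1.0, gamma=2.0), 2))  # gamma > 1 strengthens the effect
```

Trimming guarantees the argument of the power in Eq. 5 stays in [0, 1], so the transformed radius never exceeds R.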
2.4. Implementation as a dynamic display

While the radial transformation can in general be used with any low-dimensional display that suffers from data crowding, it is most useful when combined with a dynamic display showing a sequence of interpolated low-dimensional projections obtained when running a tour. We have implemented it as a new display method called display_sage in the tourr package (Wickham et al. 2011) in R (R Core Team 2020).

We can think of the display functions as part of a data pipeline obtained when running a tour. The initial step is pre-processing the data, given by X, an n × p matrix containing n observations in p dimensions. Typically, this includes centering and scaling, using either the overall range or the variance. Ensuring a common scale of all variables, comparable to the selected scale parameter R, is especially important with the new display. The tour then iterates over the following steps:

1. Obtain the projection matrix A. For d-dimensional projections this is an orthonormal p × d matrix. To ensure the smooth rotation of projections, each new A is obtained as an interpolated step in the sequence, as explained in Buja et al. (2005).
2. Project the data by computing Y = X · A.
3. Map Y to the display to re-draw the projected data. For d = 2 this typically maps the projected points onto a scatter plot display. With the new display we first transform Y as follows:
   • Center the 2-dimensional matrix Y and compute its polar coordinate representation (r_y, θ_y).
   • For each observation, first use Eq. 6 to get the trimmed radius r_y^trim within the specified range, and then apply the radial transformation defined in Eq. 5 to obtain r'_y.
   • Use the transformed radial coordinate r'_y to re-compute the mapping onto Euclidean coordinates (y_1, y_2).
4. Fit the final projection onto the plotting canvas ranging between [−1, 1], by rescaling r'_y using a scaling parameter s.

The display can be added when calling the animate function in tourr, as

tourr::animate(data,
    tour_path = tourr::grand_tour(),
    display = display_sage(gam, R, half_range))

which uses gam to set the γ parameter for computing p_eff, and the overall range R used for trimming. Both of these parameters are described in Section 2.3 above. Finally, half_range sets the scale parameter s to adjust the scale to the drawing canvas. The ratio R/s sets the scale for fitting the displayed data on the plotting canvas; by default s = R and a fixed scaling factor is applied, so when changing R the user should take care to adjust s accordingly.
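Steps 2–4 above can be condensed into a single function for a fixed projection plane. The following Python sketch is our paraphrase of the pipeline, not the tourr implementation (the interpolation of projection matrices in step 1 is omitted):

```python
import numpy as np

def sage_display_step(X, A, R=1.0, gamma=1.0, s=None):
    """One sage display frame: project, transform radii, rescale to the canvas.

    X: centered and scaled n x p data matrix; A: orthonormal p x 2 basis.
    """
    n, p = X.shape
    s = R if s is None else s                # default scale parameter s = R
    Y = X @ A                                # step 2: project onto the plane
    Y = Y - Y.mean(axis=0)                   # center the 2D projection
    r_y = np.hypot(Y[:, 0], Y[:, 1])         # polar radius
    theta = np.arctan2(Y[:, 1], Y[:, 0])     # polar angle, left unchanged
    r_y = np.minimum(r_y, R)                 # trim (Eq. 6)
    r_y = R * np.sqrt(1.0 - (1.0 - (r_y / R) ** 2) ** (gamma * p / 2.0))  # Eq. 5
    Y = np.column_stack([r_y * np.cos(theta), r_y * np.sin(theta)])
    return Y / s                             # step 4: fit to the [-1, 1] canvas

rng = np.random.default_rng(1)
X = rng.normal(size=(500, 10))
X = (X - X.mean(axis=0)) / X.std(axis=0)       # pre-processing: center and scale
A = np.linalg.qr(rng.normal(size=(10, 2)))[0]  # one orthonormal projection basis
Y = sage_display_step(X, A, R=np.max(np.hypot(*((X @ A).T))))
print(Y.shape, float(np.max(np.hypot(Y[:, 0], Y[:, 1]))))
```

Because trimming caps radii at R before Eq. 5 is applied, the returned coordinates always lie within the unit disk when s = R.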
3. Applications
To illustrate the benefit of using the reversing transformation for examining data with a tour, four applications are shown: clustering of single-cell RNA-seq data, classifying hand-sketched images, comparing physics experiments, and the classical pollen data. The pollen data example is used to illustrate the effect of parameter choices in the sage tour.
3.1. Clustering single-cell RNA-seq data sets
In the analysis of single-cell RNA-seq data, cluster analysis is performed to detect cell types and characterize the expression of genes that define those cell types, and the relative orientation of the cell types to each other (trajectory analysis) (Amezquita et al. 2020). Generally, for cluster verification, analysts use embedding methods like t-SNE to verify the placement and meaning of clusters from a clustering algorithm. An alternative is to use a tour on a small number of principal components to examine the clusters relative to gene expression.

Here we compare the sage display and the regular tour display on mouse retinal single-cell RNA-seq data from Macosko et al. (2015). The raw data consist of 49,300 cells and were downloaded using the scRNAseq Bioconductor package (Risso and Cole 2019). We use a standard workflow for pre-processing and normalizing these data (described by Amezquita et al. (2020)): quality control was performed using the scater package (McCarthy et al. 2017), and scran (Lun, McCarthy, and Marioni 2016) was used to transform and normalize the expression values and select highly variable genes (HVGs). The top ten percent of the most highly variable genes were used as features to subset the normalized expression matrix and compute the principal components. Using the first 25 PCs we built a shared nearest neighbors graph (with k = 10) and clustered this graph using Louvain clustering, resulting in 11 clusters being formed (Blondel et al. 2008).

A tour is run on the first five PCs (approximately 20% of the variance in expression), on a weighted subsample of cells based on their cluster membership: 4,590 cells. For the sage display, we set γ = 3, fixing the effective dimensionality of the data to p_eff = 15. The PCs are scaled to have zero mean and unit variance. Here we focus on comparing three of the clusters. In the PC plots they look very similar, begging the question of whether they should be considered separate groups. Figure 6 shows selected frames from a default tour (top row) and the sage tour (bottom row). The columns show the same projection, with the difference being that the sage transformation is applied in the sage tour projections. The full animations are available in the supplementary material.

Figure 6. Selected frames from a tour of the mouse data with the default tour display (top), and the sage display with γ = 3 (bottom). Three selected clusters are highlighted in color; all other points are shown in grey. Using the sage display mitigates overplotting and provides a better understanding of cluster separation.

The static plots serve to illustrate the main points, but we encourage the reader to look at the tour animations to fully appreciate the advantage of the sage display. Using the default tour display (Figure 6, top), the three clusters (dark green, blue, and yellow) are obscured by points in other clusters as we move through the frames of the animation. The points in the dark green cluster overlap those found in the yellow and blue clusters, and it is difficult to see if there is any separation between the blue and yellow clusters. In contrast, the sage display (Figure 6, bottom) expands the center of the projection, and results in the differences between the three clusters being more visible. In particular, the relative positions of the yellow and blue clusters are easier to see. While these clusters are distinct from the dark green cluster, in most frames they are still overlapping and mixed together, providing evidence that it may be appropriate to consider them a single cluster. Conversely, it can be seen that the dark green cluster is distinctly separated from the other two in some projections. The sage tour makes these comparisons a little easier.

Figure 7. Selected frames of the tour run on the sketches data using the default tour display (top), and using the sage display with γ = 2 (bottom). Three types of sketches are indicated by color: banana (green), cactus (orange) and crab (purple). Overplotting of points is a problem for the grand tour display, while the sage display reveals low density near the center.

3.2. Classifying hand-sketches
We next use the new display to look at different distributions of images from the Google QuickDraw collection (Google, Inc 2020). These are 28 × 28 = 784-pixel greyscale images that are available publicly. In this example, we sample 1000 images from three types of sketches (banana, cactus, crab) and see if we can separate the classes in the high-dimensional parameter space.

We reduce the dimensionality from 784 variables to the first 5 PCs, which capture approximately 20 per cent of the variation in the data. Before applying the tour we rescale each component to have mean zero and unit variance. To account for the dimension reduction before visualization, we set γ = 2 for the sage display.

Figure 7 shows the grand tour on the PCs, where green points correspond to the banana class, orange points represent the cactus class and purple points the crab class. In the selected frames of both displays, points belonging to the cactus class are concentrated near the center; however, in the default display (Figure 7, top) there is overplotting: points from other classes overlap those in the cactus class. The sage display (Figure 7, bottom) helps reduce the overplotting; it is easier to see that the centers of the classes are separated and that there is substructure in the banana class, which further collapses into two subgroups.

Figure 8. Selected frames of the tour of the pdfsense data using the default tour display (top), and using the sage display with R = 10 (bottom). Different underlying physical processes are shown by color, and we can see orthogonality between the three groups. The sage display preserves the overall structure while revealing details that are hidden near the center in the default display.

The animated sage tour available in the supplementary material further reveals a low density of points near the center of the distribution: observing the movement of points when rotating the viewing angle shows that even the cactus class is clustering away from the mean.
3.3. Comparing physics experiments: PDFSense
Data were obtained from CT14HERA2 parton distribution function fits and describe the sensitivity of fit parameters to experimental measurements (Wang et al. 2018). There are 28 parameters, and varying each one at a time to move ±σ away from the ‘best fit point’ (maximum likelihood estimate) provides our input variables, labelled X1–X56. Each of the 2808 observations corresponds to a physical observable and measures how the fit prediction changes along the 56 directions in parameter space. Points are grouped based on the underlying process in the experiment, which is mapped to color in the following. With the analysis of the distribution along these variables X1–X56 we can understand to what extent each experimental measurement provides new information for the global fit. For example, orthogonality between groups marks complementary constraints, and outlying points are considered important for future fits; see the discussion in Cook, Laa, and Valencia (2018).

Following the processing described there, we tour the first 6 PCs, rescaled to have zero mean and unit variance. In Figure 8 we see that the sage display with R = 10 (Figure 8, bottom) maintains the overall shape of the data seen using the default tour display (Figure 8, top). The different physical processes, shown in different colors, are indeed orthogonal in the parameter space, as can be seen most clearly by looking at the animations available in the supplementary material.

The particular structure of this distribution, with some clusters extending linearly away from the center and a set of outlying points, results in poor use of the plotting space, and a high level of clustering near the center. For example, focusing on the blue cluster, we can see that it extends out along different directions, but it can be challenging to observe how the points move under the tour rotation, as overplotting becomes an issue when points move through the center. Here, the new display (bottom row) shows a clearer view.

3.4. Tuning the parameters: Pollen
The classical pollen data are useful to demonstrate the trimming and tuning parameters. The five-dimensional data set was simulated by David Coleman of RCA Labs for the Joint Statistical Meetings 1986 Data Expo (Coleman 1986), and is an example of a hidden structure near the center of a distribution. The data are standardized by centering and scaling such that the standard deviation of each variable is equal to one.

Neither the standard tour display nor the sage display with default settings (R given by the maximum distance from the center, γ = 1) reveals the structure (left plot in Figure 9). We can use either γ, R or a combination of the two to zoom in further near the center. For example, we can use trimming (R = 1, γ = 1) (middle plot) or tuning (default R, γ = 20) (right plot), as shown in Figure 9. There is an approximate equivalence between the results obtained using either tuning or trimming, and both views clearly reveal the word “EUREKA” hidden in the distribution.

While the static views look very similar, comparing the tour animations (available in the supplementary material) reveals some differences between the display with trimming or tuning. When trimming (by setting R = 1) the focus is clearly on the center of the distribution, and most points get pushed out towards a maximum-radius circle.

Figure 9. Selected views of the pollen data in the new sage display, with default settings (left), setting R = 1 (middle) and γ = 20 (right). We can tune either γ, R or a combination of the two to reveal the word “EUREKA” near the center of the distribution.

On the other hand, tuning the display by setting γ = 20 preserves the elliptical shape of the distribution, making it easier to see correlation patterns.
4. Discussion
This paper has introduced the sage tour, which reduces the data crowding effects that occur when taking low-dimensional projections of high-dimensional data. This new technique is easily incorporated into exploratory high-dimensional data analysis, and the applications shown in Section 3 provide examples of the following tasks:

• clustering: the sage display uncovered clusters that were originally obscured by data piling, while still giving the viewer an accurate assessment of the size of a cluster, and their relative orientation, as shown in the single-cell RNA-seq example (Section 3.1)
• classification: the sage display decreases the number of overlapping points between classes and provides better visual separation between classes compared to the regular tour, as shown in the sketches example (Section 3.2)
• shape analysis: the sage display helps us understand structures across multiple dimensions, for example orthogonality between multiple groups, as shown in the pdfsense example (Section 3.3)
• needle discovery: the sage display allows us to find hidden signal that is concealed by the density of points around the center of the projection, as shown in the pollen example (Section 3.4)

The approach provides interpretable visualization that captures high-dimensional information and preserves global structure, and it is complementary to non-linear dimension reduction techniques. For example, when visualizing clusters, the sage display enables an assessment of cluster shapes, and accurately captures relative position and orientation. The burning sage transformation is global and does not magnify local structure like t-SNE does.

An alternative is the slice tour (Laa, Cook, and Valencia 2020), which allows distributions of points around the center of the data to be explored using sections instead of projections.
The slice tour is useful when there are large numbers of observations or if there is concave structure in the data.

The tuning parameters can be used to more aggressively expand the center of the display. All of the examples shown had some tuning. The last example demonstrated how points away from the projected center get moved to the edge of the hypersphere as γ is increased or R is decreased. With more center magnification, the non-linear transformation can introduce distortions, but this is a well-known problem for any non-linear dimension reduction technique, including t-SNE. However, unlike t-SNE, any distortion introduced by the sage display is interpretable, because it is controlled by a simple function (Eq. 5).

The sage display is fast to compute, which lends itself to being embedded in an interactive interface. An ideal interface would allow real-time changes to the parameters of the transformation. This would be especially useful when coupled with linked brushing in complementary views.

Acknowledgements
The authors gratefully acknowledge the support of the Australian Research Council. The paper was written in rmarkdown (Xie, Allaire, and Grolemund 2018) using knitr (Xie 2015).

Supplementary material
The source material for this paper is available at https://github.com/uschiLaa/burning-sage. The animated gifs for all applications are also included in html files in the supplementary material.
References
Ahn, Jeongyoun, and J. S. Marron. 2010. “The maximal data piling direction for discrimination.” Biometrika 97 (1): 254–259. https://doi.org/10.1093/biomet/asp084.

Amezquita, Robert A, Aaron T L Lun, Etienne Becht, Vince J Carey, Lindsay N Carpp, Ludwig Geistlinger, Federico Marini, et al. 2020. “Orchestrating single-cell analysis with Bioconductor.” Nat. Methods 17 (2): 137–145. http://dx.doi.org/10.1038/s41592-019-0654-x.

Asimov, D. 1985. “The Grand Tour: A Tool for Viewing Multidimensional Data.” SIAM Journal of Scientific and Statistical Computing 6 (1): 128–143.

Bellman, Richard E. 1961. Adaptive Control Processes: A Guided Tour. Princeton Legacy Library. Princeton, NJ: Princeton University Press.

Blondel, Vincent D, Jean-Loup Guillaume, Renaud Lambiotte, and Etienne Lefebvre. 2008. “Fast unfolding of communities in large networks.” J. Stat. Mech. https://iopscience.iop.org/article/10.1088/1742-5468/2008/10/P10008/meta.

Buja, Andreas, Dianne Cook, Daniel Asimov, and Catherine Hurley. 2005. “Computational Methods for High-Dimensional Rotations in Data Visualization.” In Data Mining and Data Visualization, edited by C.R. Rao, E.J. Wegman, and J.L. Solka, Vol. 24 of Handbook of Statistics, 391–413. Elsevier.

Coleman, David. 1986. “Geometric Features of Pollen Grains.” http://lib.stat.cmu.edu/data-expo/.

Cook, Dianne, Ursula Laa, and German Valencia. 2018. “Dynamical projections for the visualization of PDFSense data.” Eur. Phys. J. C 78 (9): 742.

Diaconis, Persi, and David Freedman. 1984. “Asymptotics of Graphical Projection Pursuit.” Ann. Statist. 12 (3): 793–815. https://doi.org/10.1214/aos/1176346703.

Donoho, David L. 2000. “High-dimensional data analysis: The curses and blessings of dimensionality.” http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.329.3392. Accessed 2020-09-15.

Google, Inc. 2020. “Quick, Draw! The Data.” https://quickdraw.withgoogle.com/data. Accessed 2020-04-23.

Hall, Peter, J. S. Marron, and Amnon Neeman. 2005. “Geometric representation of high dimension, low sample size data.” Journal of the Royal Statistical Society: Series B (Statistical Methodology) 67 (3): 427–444. https://rss.onlinelibrary.wiley.com/doi/abs/10.1111/j.1467-9868.2005.00510.x.

Laa, Ursula, Dianne Cook, Andreas Buja, and German Valencia. 2020. “Hole or grain? A Section Pursuit Index for Finding Hidden Structure in Multiple Dimensions.”

Laa, Ursula, Dianne Cook, and German Valencia. 2020. “A slice tour for finding hollowness in high-dimensional data.” Journal of Computational and Graphical Statistics. https://doi.org/10.1080/10618600.2020.1777140.

Lun, Aaron T. L., Davis J. McCarthy, and John C. Marioni. 2016. “A step-by-step workflow for low-level analysis of single-cell RNA-seq data with Bioconductor.” F1000Res. 5: 2122.

Macosko, Evan Z, Anindita Basu, Rahul Satija, James Nemesh, Karthik Shekhar, Melissa Goldman, Itay Tirosh, et al. 2015. “Highly Parallel Genome-wide Expression Profiling of Individual Cells Using Nanoliter Droplets.” Cell 161 (5): 1202–1214. http://dx.doi.org/10.1016/j.cell.2015.05.002.

Marron, J. S., Michael J. Todd, and Jeongyoun Ahn. 2007. “Distance-Weighted Discrimination.” Journal of the American Statistical Association 102 (480): 1267–1271.

McCarthy, Davis J., Kieran R. Campbell, Aaron T. L. Lun, and Quin F. Wills. 2017. “Scater: pre-processing, quality control, normalisation and visualisation of single-cell RNA-seq data in R.” Bioinformatics 33: 1179–1186.

R Core Team. 2020. R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing.

Risso, Davide, and Michael Cole. 2019. scRNAseq: Collection of Public Single-Cell RNA-Seq Datasets. R package version 2.0.2.

van der Maaten, L., and G. Hinton. 2008. “Visualizing Data using t-SNE.” Journal of Machine Learning Research 9: 2579–2605.

Wang, Bo-Ting, T.J. Hobbs, Sean Doyle, Jun Gao, Tie-Jiun Hou, Pavel M. Nadolsky, and Fredrick I. Olness. 2018. “Mapping the sensitivity of hadronic experiments to nucleon structure.” Phys. Rev. D 98 (9): 094030.

Wickham, Hadley, Dianne Cook, Heike Hofmann, and Andreas Buja. 2011. “tourr: An R Package for Exploring Multivariate Data with Projections.” Journal of Statistical Software 40 (2): 1–18.

Xie, Yihui. 2015. Dynamic Documents with R and knitr. 2nd ed. Boca Raton, Florida: Chapman and Hall/CRC. https://yihui.name/knitr/.

Xie, Yihui, Joseph J. Allaire, and Garrett Grolemund. 2018. R Markdown: The Definitive Guide. Chapman and Hall/CRC. https://bookdown.org/yihui/rmarkdown.