[PDF] Local biplots for multi-dimensional scaling, with application to the microbiome

Abstract

We present local biplots, a an extension of the classic principal components biplot to multi-dimensional scaling. Noticing that principal components biplots have an interpretation as the Jacobian of a map from data space to the principal subspace, we define local biplots as the Jacobian of the analogous map for multi-dimensional scaling. In the process, we show a close relationship between our local biplot axes, generalized Euclidean distances, and generalized principal components. In simulations and real data we show how local biplots can shed light on what variables or combinations of variables are important for the low-dimensional embedding provided by multi-dimensional scaling. They give particular insight into a class of phylogenetically-informed distances commonly used in the analysis of microbiome data, showing that different variants of these distances can be interpreted as implicitly smoothing the data along the phylogenetic tree and that the extent of this smoothing is variable.

Full PDF

LLocal biplots for multi-dimensional scaling, withapplication to the microbiome

Julia FukuyamaDepartment of Statistics, Indiana University BloomingtonAugust , Abstract

We present local biplots, a an extension of the classic principal components biplot tomulti-dimensional scaling. Noticing that principal components biplots have an interpreta-tion as the Jacobian of a map from data space to the principal subspace, we deﬁne localbiplots as the Jacobian of the analogous map for multi-dimensional scaling. In the process,we show a close relationship between our local biplot axes, generalized Euclidean distances,and generalized principal components. In simulations and real data we show how localbiplots can shed light on what variables or combinations of variables are important for thelow-dimensional embedding provided by multi-dimensional scaling. They give particularinsight into a class of phylogenetically-informed distances commonly used in the analysis ofmicrobiome data, showing that different variants of these distances can be interpreted as im-plicitly smoothing the data along the phylogenetic tree and that the extent of this smoothingis variable.

Keywords: generalized eigendecomposition, dimension reduction, microbiome, phylogeny, dis-tance, high-dimensional data a r X i v : . [ s t a t . M E ] A ug . I ntroduction Exploratory analysis of a high-dimensional dataset is often performed by deﬁning a distancebetween the samples and using some form of multi-dimensional scaling (MDS) to obtain a low-dimensional representation of those samples. The resulting representation of the data can thenbe used for visualization and hypothesis generation, and the distances themselves can be usedfor hypothesis testing or other downstream analyses. However, all forms of multi-dimensionalscaling have the limitation that they only provide information about the relationships betweenthe samples. The analyst would often like to know about the relationships between samplesand variables as well: if we see gradients or clusters in the samples, what variables are theyassociated with? Do particular regions in the map correspond to particularly high or lowvalues of certain variables? If principal components analysis (PCA) or a related method isused, these questions are answered naturally with the PCA biplot (Gabriel ), but multi-dimensional scaling is silent on these questions.Related to the question of interpreting a single multi-dimensional scaling plot is the ques-tion of interpreting differences between multi-dimensional scaling plots that use different dis-tances to represent the relationships among the same set of samples. This issue can arise inany problem for which multiple distances are available for the same task, but the work here ismotivated by analysis of microbiome data. In the microbiome ﬁeld, many distances are avail-able, and multiple distances are often used on the same dataset. For instance, it is commonto pair a distance that uses only presence/absence information with one that uses abundanceinformation and to attribute differences in the representations to the inﬂuence of rare species(as in for example Lozupone et al. ( )). However, it is often unclear whether such conclu-sions are warranted, and an analog of the principal components biplot to multi-dimensionalscaling would give us much more insight into the differences between the representations ofthe data given by multi-dimensional scaling with different distances.Other authors have done work on this problem, and the local biplots deﬁned here aremost directly related to the non-linear biplots described in Gower & Harding ( ). Thesebiplots are based on the idea that for any multi-dimensional scaling plot, we can deﬁne afunction that takes a new data point and adds it to the multi-dimensional scaling embedding pace. Other suggestions for biplots for multi-dimensional scaling have tended to generalizethe SVD interpretation of PCA, including Satten et al. ( ) and Wang et al. ( ). Anothersimple and commonly-used method for visualizing the relationship between the variables andthe embedding space is simply to compute correlations between variables and the samplescores on the embedding axes (McCune et al. ; Daunis-i Estadella et al. ). All ofthese methods have their merits, but we believe that the local biplots deﬁned here represent anatural and as yet unexploited way of visualizing the relationship between the data space andthe embedding space.In this paper, we present an interpretation of the PCA biplot axes as the Jacobian of a mapfrom data space to embedding space. We then generalize this interpretation of the biplot axes,showing how we can obtain analogous axes in classical multi-dimensional scaling. The result-ing visualization tool has a natural interpretation in terms of sensitivities of the embeddings tovalues of the input variables, gives the analyst a picture of the relationships between the vari-ables and samples and to different parts of the plot, and suggests further diagnostic tools forinvestigating the importance and linearity of the relationship between the data space and theembedding space deﬁned by classical scaling. Along the way, we generalize the classic resultabout the relationship between MDS with the Euclidean distance and principal componentsto MDS with a generalized Euclidean distance and generalized principal components. . N otation We will use the following notation:– Upper-case bold symbols (e.g. A ) represent matrices.– Lower-case bold symbols (e.g. x ) represent vectors.– Lower-case italic symbols represent the elements of a matrix or a vector ( a ij is the scalarin the i th row, j th column of the matrix A , x i is the i th element of the vector x ).– e j is a vector with a in position j and ’s in all other positions.– n is an n -vector ﬁlled with ’s. n is an n -vector ﬁlled with ’s.– I n ∈ R n × n is the identity matrix.– A (cid:31) A is symmetric positive deﬁnite.– If A ∈ R n × p , A i · = (cid:16) a i · · · a ip (cid:17) , and A · j =  a j ... a nj  .– A · , k : l indicates columns k through l of the matrix A .– A k : l , m : n indicates the submatrix of A consisting of the k through l th rows and the m ththrough n th columns.– X will always denote a data matrix, with samples as rows and variables as columns. Wewill use x i to denote the column vector X Ti · , that is, the vector corresponding to sample i .– C n = I n − n Tn / n is the centering matrix.– d : R p × R p → R + will be used to denote an arbitrary distance function.– d y : R p → R + , where y ∈ R p will denote the restriction of d to { y } × R p .– d A : R p × R p → R + where 0 ≺ A ∈ R p × p will be a generalized Euclidean distance ,with d A : x , y (cid:55)→ (cid:112) ( x − y ) T A ( x − y ) . Note that this is more commonly refered to asthe Mahalanobis distance (Mahalanobis ). We use “generalized Euclidean distance”instead because we do not wish to emphasize the probabilistic interpretation.– If f : R m → R n and z ∈ R n , the Jacobian matrix J f ( z ) ∈ R n × m has elements ( J f ( z )) ij = ∂ f i ∂ z j ( z ) . . M ethods . Principal components biplots

Before introducing local biplots for multi-dimensional scaling, we review principal compo-nents biplots. There are several different types of PCA biplots, and we focus here on the form iplot. Suppose we have a data matrix X ∈ R n × p with centered columns. Let X = UDV T be the singular value decomposition of X , so that U ∈ R n × rank ( X ) , V ∈ R n × rank ( X ) , D ∈ R rank ( X ) × rank ( X ) , U T U = V T V = I rank(X) , and D is diagonal with d ii ≥ d jj for i < j .In the form biplot, we display the rows of U · ,1: k D k ,1: k and the rows of V · ,1: k , usually with k = i th row of U · ,1:2 D along the x - and y -axes gives the projection ofthe i th row of X onto the ﬁrst and second principal axes, respectively.– The position of the j th row of V · ,1:2 along the x - and y -axes gives the loading of the j thvariable onto the ﬁrst and second principal axes, respectively.– If ˆ X denotes the least-squares optimal rank- approximation of X , then ˆ x ij = U · ,1:2 D V T · ,1:2 ( ˆ x ij is the inner product between the i th row of U · ,1:2 D and the j th row of V · ,1:2 ).The last point in the list above provides the standard motivation behind biplots: if the matrix X is well approximated by a rank- matrix, then we can read off good approximations ofelements of X from the biplot. Even if X is not well approximated by a rank- matrix, thebiplot still allows us to read off exactly (to the extent that we can compute inner products bylooking at a plot) its rank- approximation.To this list we add one more property. Let g : R p → R k be the projection of a new datapoint onto the k -dimensional principal subspace, deﬁned as z (cid:55)→ ( V k ) T z . Let J g be theJacobian of that map. J g ( z ) describes the sensitivity of the projection of a new point z on theprincipal subspace to perturbations in the variables. Then:– For principal components, J g ( z ) = J g = ( V · ,1: k ) T (the map is linear and so the Jacobianis constant). The j th column of J g gives the j th biplot axis in the principal componentsbiplot.This property is close to simply being the sensitivity of the embedding position of a row of X to a perturbation in the j th variable. The difference is that we ﬁx the principal axes whenperturbing one of the variables. In a more standard sensitivity formulation, perturbing one f the variables associated with one of the samples would change the principal subspace. Bythinking of the principal subspace as being ﬁxed and deﬁning a map that takes new datapoints into that space, we can describe what sorts of perturbations of supplemental points inthe embedding space would result from perturbations of those points in the input data space.It is this last property that we propose to generalize to multi-dimensional scaling. Sincemulti-dimensional scaling is not a linear map from the data space to the embedding space,most of the biplot interpretations are not available, or they are only available in approxi-mate forms. The linearity of principal components corresponds to one set of sensitivities toperturbations of the input variables, no matter what the sample being perturbed is. Sincemulti-dimensional scaling is non-linear, we will have different sensitivities to perturbationsof the input variables in different parts of the data space, which will correspond to differentsets of biplot axes in different parts of the space. This complicates the representation, but it isunavoidable if we do not want to approximate a nonlinear technique by a linear one. . Multi-dimensional scaling deﬁnes a map from data space to embeddingspace

To generalize the Jacobian interpretation of the PCA biplot to classical multi-dimensional scal-ing, we need to deﬁne a function associated with a multi-dimensional scaling solution thatmaps from data space to embedding space. We will assume that we have a matrix X ∈ R n × p ,whose i th row is denoted x i and a distance function d : R p × R p → R + . Recall that in classi-cal multi-dimensional scaling, as described in Gower ( ) and Torgerson ( ), we start bydeﬁning ∆ ∈ R n × n as δ ij = d ( x i , x j ) ( )We then let − C n ∆ C Tn = B Λ B T ( ) e the eigendecomposition of − C n ∆ C Tn , i.e., B ∈ R n × n is an orthogonal matrix, Λ ∈ R n × n is diagonal such that λ ii ≥ λ jj if i > j . If the distances are Euclidean embeddable, all thediagonal elements of Λ will be non-negative, and the Euclidean distance between B i · Λ and B j · Λ will be equal to d ( x i , x j ) . In classical scaling, for a k -dimensional representation of thesamples we use the ﬁrst k columns of B (those corresponding to the largest eigenvalues) scaledby the appropriate eigenvalue: M = B · ,1: k Λ k ,1: k ( )Although classical multi-dimensional scaling is usually understood as simply giving arepresentation of the samples in the embedding space, we can deﬁne the necessary functionas the one that maps supplemental points (i.e., points that were not used to create the MDSsolution) to the embedding space. To create such a function, we consider the problem ofadding a point to the embedding space so as to have the distances in the embedding spacematch the distances deﬁned by d . The solution to this problem is derived in Gower ( ). Let f : R p → R k be the function that takes a new point and maps it to the embedding space. If z ∈ R p , then f : z (cid:55)→ Λ − k ,1: k M T a . ( )where a ∈ R n has elements a i = (cid:18) − C n ∆ C Tn (cid:19) ii − d ( x i , z ) . ( )Note that adding a supplemental point in this way is not the same as making a new em-bedding based on n + j th variable is the j th column of the Jacobian of a map taking a new pointto the principal subspace, assuming that the principal subspace is ﬁxed. . Local biplot axes for differentiable distances

Now that we have a function associated with a multi-dimensional scaling solution that mapsfrom data space to embedding space, we are ready to generalize PCA biplots to multi-dimensionalscaling. Recall that the PCA biplot axis for the j th variable was given by the j th column of J g , where g was the map from data space to embedding space in PCA. For multi-dimensionalscaling, if f is the map deﬁned in ( ), J f ( z ) is a function of z . Therefore, we cannot have justone set of biplot axes describing the entire plot. There will instead be one set of biplot axes foreach point in the space of the original variables.We make the following deﬁnition: Deﬁnition . . Local biplot axes.Let X ∈ R n × p be a data matrix, and recall that x i , i =

1, . . . , n are column vectors corre-sponding to the rows of X . Let d : R p × R p → R + be a distance function, and let f be the func-tion, deﬁned in , that maps supplemental points to the embedding space deﬁned by classicalscaling on ( X , d ) . Denote by d y the restriction of d to { y } × R p , so that d y : { y } × R p → R + is deﬁned by x (cid:55)→ d ( y , x ) , and suppose that d x i , i =

1, . . . , n has partial derivatives. The localbiplot axes for z ∈ R p are given by LB ( z ) = ( J f ( z )) T ( ) =  ∂ d x ( z ) ∂ z · · · ∂ d x n ( z ) ∂ z ... ... ∂ d x ( z ) ∂ z p · · · ∂ d x n ( z ) ∂ z p  M Λ − k ,1: k ( ) =  d x ( z ) ∂ d x ( z ) ∂ z · · · d x n ( z ) ∂ d x n ( z ) ∂ z ... ... d x ( z ) ∂ d x ( z ) ∂ z p · · · d x n ( z ) ∂ d x n ( z ) ∂ z p  M Λ − k ,1: k ( ) =  ∂ d x ( z ) ∂ z · · · ∂ d x n ( z ) ∂ z ... ... ∂ d x ( z ) ∂ z p · · · ∂ d x n ( z ) ∂ z p  diag (( d x ( z ) , . . . , d x n ( z ))) M Λ − k ,1: k ( ) here M and Λ are as deﬁned in equations - . The j th row of LB ( z ) is the local biplot axisfor the j th variable.We deﬁne LB ( z ) as the transpose of the Jacobian matrix so that LB ( z ) is analogous to thematrix V · ,1: k used in the deﬁnition of PCA biplots earlier in this section. We of course couldhave deﬁned it the other way and taken the local biplot axis for the j th variable to be the j thcolumn of the Jacobian matrix. . Local biplot axes for non-smooth distances

Although the deﬁnition above relies on the existence of the Jacobian and therefore requires thedistance to be differentiable, we can modify our deﬁnition of local biplot axes to accommodatenon-differentiable or discontinuous distances. If the issue with the distance is that the left andright limits in the deﬁnition of the derivative exist but do not match, we use either the left orthe right limit and call them negative or positive local biplot axes. If the issue is that the limitsdiverge, as they might with a discontinuous distance, we use a discrete approximation of thederivative with discretization ε that the user speciﬁes and call them the ε -negative or ε -positivelocal biplot axes. Deﬁnition . . Local biplot axes for discontinuous distances. Let X ∈ R n × p be a data matrixwhose rows are denoted x i , d : R p × R p → R + be a distance function, and let f be thefunction, deﬁned in , that maps supplemental points to the embedding space deﬁned byclassical scaling on ( X , d ) , and let ε ∈ R + . Denote by d y the restriction of d to { y } × R p , sothat d y ( x ) = d ( y , x ) . The ε -positive local biplot axes for z ∈ R p are given by LB ε , + ( z ) = F ε , + ( z ) diag (( d x ( z ) , . . . , d x n ( z ))) M Λ − k ,1: k ( )where F ε , + ( z ) ∈ R p × n , with ( F ε , + ( z )) ji = d x i ( z + ε e j ) − d x i ( z ) ε ( ) nalogously, still taking ε >

0, the ε -negative local biplot axes for z ∈ R p are given by LB ε , − ( z ) = F ε , − ( z ) diag (( d x ( z ) , . . . , d x n ( z ))) M Λ − k ,1: k ( )where F ε , + ( z ) ∈ R p × n , with ( F ε , − ( z )) ji = d x i ( z ) − d x i ( z − ε e j ) ε ( )Although at ﬁrst read, the ε might seem to be an unpleasant hack, it has a reasonableinterpretation. In many situations there is a minimum amount by which a variable can beperturbed. For instance, with count-valued data, a variable can be perturbed by no less than .The interpretation of the local biplot axes in that case is that they describe how a supplementalpoint would react to perturbation of a variable by the minimum unit.Note that by deﬁnition, if we use ( ) or ( ) on a differentiable distance, we will havelim ε ↓ LB ε , + ( z ) = lim ε ↓ LB ε , − ( z ) = LB ( z ) and so the two deﬁnitions are consistent.Finally, we can have distances for which the relevant derivatives do not exist because al-though the left and right limits in the deﬁnition of the derivative exist, they do not match. Oneexample is the Manhattan distance d M ( x , y ) = ∑ pj = | x j − y j | . In that case, we can make thedeﬁnition: Deﬁnition . . Local biplot axes for continuous non-differentiable distances. Let X ∈ R n × p bea data matrix whose rows are denoted x i , d : R p × R p → R + be a distance function, and let f be the function, deﬁned in , that maps supplemental points to the embedding space deﬁnedby classical scaling on ( X , d ) , and let δ ∈ R + . Denote by d y the restriction of d to { y } × R p , sothat d y ( x ) = d ( y , x ) . Suppose that lim ε ↓ d x i ( z + ε e j ) − d x i ( z ) ε and lim ε ↓ d x i ( z ) − d x i ( z − ε e j ) ε exist for all i =

1, . . . , n , j =

1, . . . , p , z ∈ R p .The positive local biplot axes for z ∈ R p are given by LB + ( z ) = lim ε ↓ F ε , + ( z ) diag (( d x ( z ) , . . . , d x n ( z ))) M Λ − k ,1: k ( ) here F ε , + ( z ) ∈ R p × n , with ( F ε , + ( z )) ji = d x i ( z + ε e j ) − d x i ( z ) ε ( )Analogously, the negative local biplot axes for z ∈ R p are given by LB − ( z ) = lim ε ↓ F ε , − ( z ) diag (( d x ( z ) , . . . , d x n ( z ))) M Λ − k ,1: k ( )where F ε , + ( z ) ∈ R p × n , with ( F ε , − ( z )) ji = d x i ( z ) − d x i ( z − ε e j ) ε ( )As with the deﬁnition for discontinuous distances, if we apply this deﬁnition to a distance thatis differentiable, we will have LB + ( z ) = LB − ( z ) = LB ( z ) . . Properties of local biplot axes

We next show some simple properties of local biplot axes that illustrate how they are relatedto other methods and how they allow us to interpret the MDS embedding space. Proofs areprovided in the Appendix. . . Equivalence of PCA biplot axes and local biplot axes for Euclidean distances

The ﬁrst property of local biplot axes has to do with the relationship between the principalcomponents of X and the local biplot axes that result when we perform MDS on X with theEuclidean distance. Theorem . . Equivalence of PCA axes and local biplot axes. Let X ∈ R n × p have centeredcolumns, d : R p × R p → R be the Euclidean distance function, d ( x , y ) = (cid:104) ∑ pi = ( x i − y i ) (cid:105) .Let the singular value decomposition of X be X = U Λ V T , where U ∈ R n × k , Λ ∈ R k × k , V ∈ R p × k , U T U = V T V = I k . Then for any z ∈ R p , the local biplot axes for classical multi-dimensional scaling of X with distance d at z are given by LB ( z ) = V · ,1: k .This theorem tells us that if we perform MDS with the Euclidean distance, the local biplot xes are constant and they are the same as the principal axes. It can be thought of as dualto the classic result about the relationship between PCA and multi-dimensional scaling withthe Euclidean distance in Gower ( ): the result above is about the variables, and the classicresult is about the samples. . . Relationship between gPCA axes and local biplot axes for generalized Euclideandistances

The relationship between local biplot axes and principal components is not restricted to thestandard Euclidean distance. For any generalized Euclidean distance d Q ( x , y ) = (cid:112) ( x − y ) T Q ( x − y ) ,with Q (cid:31)

0, the local biplot axes will be constant on the data space and will be related to thegeneralized eigendecomposition or generalized principal components analysis (gPCA) of X .The interested reader can see Holmes ( ) for a review. Brieﬂy, gPCA is deﬁned on a triple ( X , Q , D ) , where X ∈ R n × p , 0 ≺ Q ∈ R p × p , 0 ≺ D ∈ R n × n . The generalization in gPCA can beinterpreted either as generalizing the noise model in the probabilistic formulation of PCA toone in which the errors have a matrix normal distribution with covariance D − ⊗ Q − (Allenet al. ), or generalizing from standard inner product spaces on the rows and columns of X to inner product spaces deﬁned by Q and D (Holmes ).The local biplot axes for MDS with d Q are related to the generalized principal axes for thetriple ( X , Q , I n ) . The deﬁnition of generalized principal axes is as follows: Deﬁnition . . Generalized principal axes. Consider gPCA of the triple ( X , Q , D ) , with X ∈ R n × p , 0 ≺ Q ∈ R p × p , 0 ≺ D ∈ R n × n . Let V ∈ R p × k and Λ ∈ R k × k be matrices satisfying X T DXQV = V Λ , V T QV = I ( )with λ ii ≥ λ jj iff i < j . The generalized principal axes for gPCA on the triple ( X , Q , D ) are givenby V Λ , and the normalized generalized principal axes are simply V .The generalized principal axes/generalized eigenvectors exist and can be understood interms of the more familiar standard eigendecomposition of XQX T as follows: Theorem . . If the eigendecomposition of

XQX T is ˜V ˜ Λ ˜V T , then V = Q − ˜V and Λ = ˜ Λ satisfy the equations ( ). nce we have deﬁned generalized eigenvectors/generalized principal axes, we can writedown the relationship between the generalized principal axes and the local biplot axes for ageneralized Euclidean distance: Theorem . . Local biplot axes for generalized Euclidean distances. Let X ∈ R n × p , Q ∈ R p × p , Q (cid:31) d Q : R p × R p → R be a generalized Euclidean distance so that d Q ( x , y ) = [( x − y ) T Q ( x − y )] . Let V be the normalized principal axes for gPCA on the triple ( X , Q , I n ) .Then the local biplot axes for classical multi-dimensional scaling of X with the distance d Q at z are given by LB ( z ) = QV .Note that standard PCA is gPCA on ( X , I p , I n ) , and so Theorem . is a special case ofTheorem . . . . Conditions under which constant local biplot axes imply generalized Euclidean dis-tance

After seeing that generalized Euclidean distances imply constant local biplot axes, a naturalquestion is whether the converse holds: if the local biplot axes are constant, must the dis-tance have been a generalized Euclidean distance? The answer in general is no: given that LB ( z ) = L for every z ∈ R p , and assuming further that adding z does not require an ad-ditional embedding dimension, we still only have that d ( x i , z ) = d LL T ( x i , z ) for i =

1, . . . , n , z ∈ R p . Theorem . . Constant local biplot axes imply d is a generalized Euclidean distance on { x i } i = n × R p . Let X ∈ R n × p , with n > p . Suppose the local biplot axes associated with the multi-dimensional scaling solution of X with the distance d are LB ( z ) = L , with L ∈ R p × p withrank ( L ) = p for any z ∈ R p . Suppose that for multi-dimensional scaling of X with d , the fullembedding space is of dimension p , and further suppose that any supplemental points z ∈ R p can be added to the space without requiring an additional embedding dimension. Then forany z ∈ R p and any x i , i =

1, . . . , n we have d ( x i , z ) = d LL T ( x i , z ) .The limitation of the theorem above is that it does not apply to arbitrary pairs of pointsin the embedding space, only to pairs for which one member is one of the initial data points. o see why constant local biplot axes are not enough to ensure that for any arbitrary points z , z ∈ R p , that d ( z , z ) = d LL T ( z , z ) , consider the distance d ET ( x , y ) =  ( x − y ) min ( | x | , | y | ) ≤ ( x − y ) min ( | x | , | y | ) > )The mnemonic is that ET stands for “express train”: if we think of the distance as being timealong a rail route, trips from anywhere to any point in [ −

1, 1 ] take time equal to distance,while trips between two points outside of [ −

1, 1 ] take time equal to half the distance.If we imagine performing multi-dimensional scaling with d ET and X =  .5 − .5  , the MDSsolution would give us a one-dimensional space with x embedded at .5 and x embedded at − .5 (or vice versa). Additionally, the local biplot axis (axis singular because there is only onevariable in this example) would be constant and equal to . This is because the local biplot axesare deﬁned by d x i ( z ) , and since both x and x have absolute value less than , the distanceto be evaluated in the deﬁnition of LB axes is always the standard Euclidean distance. But ofcourse d ET is not the standard Euclidean distance and is indeed not a generalized Euclideandistance at all, and so constant local biplot axes cannot imply that the distance used for MDSwas a generalized Euclidean distance.However, if we add the assumption that d is homogeneous and translation invariant, con-stant local biplot axes do imply that d is a generalized Euclidean distance on R p × R p , notjust on { x i } i = n × R p . Note that the assumption is not the same as restricting to the setof generalized Euclidean distances, as there any many distances that satisfy those propertiesthat are not generalized Euclidean distances, e.g. the Minkowski distances. In addition, notethat the homogeneity and translation invariance assumptions do not rule out any generalizedEuclidean distance as all generalized Euclidean distances are homogeneous and translationinvariant. Theorem . . Constant local biplot axes and homogeneous, translation-invariant d imply d is a generalized Euclidean distance on R p × R p . Suppose that n ≥ p , X ∈ R n × p satisﬁesrank ( X ) = p , and that the local biplot axes associated with the multi-dimensional scalingsolution of X with the distance d are LB ( z ) = L for L ∈ R p × p with rank ( L ) = p for any ∈ R p . As in Theorem . , assume that the full-dimensional MDS embedding space hasdimension p , and that for any z ∈ R p , z can be added to the initial embedding space withoutrequiring the addition of any extra dimensions.Suppose further that d satisﬁes translation invariance and homogeneity, so that for any α ∈ R x , y , z ∈ R p , d ( α x , α y ) = | α | d ( x , y ) and d ( x + z , y + z ) = d ( x , y ) . Then for any z , z ∈ R p , d ( z , z ) = d LL T ( z , z ) .These two results are useful for interpretation of the MDS embedding space in the fol-lowing way: In gPCA, the lower-dimensional representation of the samples is obtained byprojecting the samples onto the gPCA axes, with the projection being with respect to the Q in-ner product. If we are performing MDS with a generalized Euclidean distance, we can obtainthe axes onto which the samples are being projected with Theorem . . In the more commonsituation where we are performing MDS with an arbitrary distance, if we see that the localbiplot axes are approximately constant, Theorems . and . suggest that the distance can beapproximated by a generalized Euclidean distance and that the relationship between the dataspace and the embedding space can be approximated as a projection onto the local biplot axes. . . Supplemental point located at centroid of local biplot axes for generalized Euclideandistance

Our next result relates the position of a supplemental point in the MDS embedding spaceto the centroid of the local biplot axes scaled up or down according to the values of the thevariables for the supplemental point. As mentioned in the introduction, our local biplots areinspired by the non-linear biplots in Gower & Harding ( ), and Gower gives a similar inter-polation result for his non-linear biplots in the special case of distances that are decomposableby coordinate. Before proceeding to our result for local biplots and generalized Euclidean dis-tances, we give a restatement of Gower’s result. Our statement of the theorem is in terms ofthe embeddings of supplemental points corresponding to scaled standard basis vectors insteadof non-linear biplot axes, but it amounts to the same thing.

Theorem . . Interpolation. Let X ∈ R n × p , and perform MDS on ( X , d ) with d such that d ( x , y ) = ∑ pj = h ( x j , y j ) . The rows of X are denoted x i , and the ij th elemont of X is x ij .Let z = ∑ pj = α j e j be a supplemental point, let p j be the embedding of α j e j in MDS space: j : = f ( α j e j ) , where f is the map from data space to embedding space as deﬁned in . Let c = p ∑ pj = p j . The coordinates of f ( z ) = f ( ∑ pj = α j e j ) will then be p c − ( p − ) f ( p ) .The points p j are points on Gower’s non-linear biplot axes, and the result shows that theembedding of a supplemental point can be found from the centroid c of the non-linear biplotaxes for each variable represented in the supplemental point.We have a similar interpolation result for local biplot axes. In our case, instead of restrictingthe class of distances to those decomposable by variable, we restrict to generalized Euclideandistances. Theorem . . Interpolation for generalized Euclidean distances. Let Q (cid:31) d : R p × R p → R + be deﬁned as d ( x , y ) = (cid:2) ( x − y ) T Q ( x − y ) (cid:3) . By Theorem . , we know that LB ( z ) = QV for some V ∈ R p × k and any z ∈ R p . Let c = p ∑ pj = LB ( z ) T · , j = p ∑ pj = α j ( QV ) T · , j be the signed and weighted centroid of the local biplot axes, with weights α j . The embeddingof the supplemental point z will be f ( z ) = p c .As before, we expect this result to hold approximately when the local biplot axes areapproximately constant and the distance can be interpreted as approximately a generalizedEuclidean distance. . R esults To illustrate the value of local biplots and to show what sorts of insights they can provide, wepresent local biplots on real and simulated datasets. . Simulated data for phylogenetic distances

To show how local biplots can help us interpret an MDS embedding of a set of samples, wecreate MDS embeddings and the associated local biplots for two different distances on a singledataset. To make the comparison as straightforward as possible, we set up the simulated dataso that the MDS embeddings of the samples with the different distances are approximatelythe same. We will see that despite the similarity of the sample embeddings, the features usedto create the embeddings are quite different for the two distances. he distances we will use for the comparison are the Manhattan distance and weightedUniFrac (Lozupone et al. ). Recall that the Manhattan distance between points x , y ∈ R p is d M ( x , y ) = ∑ pi = | x i − y i | . Because the Manhattan distance is continuous but is notdifferentiable everywhere, we use the positive local biplot axes, as deﬁned in ( ). The resultsare qualitatively similar for the negative local biplot axes. There is a slight difference in thatfor the negative local biplot axes, many were exactly the same, which has to do with the factthat the data are bounded below at and many data points lie exactly on that boundary.Weighted UniFrac is a phylogenetically-informed distance commonly used to quantify thedissimilarity between bacterial communities. If we have a phylogenetic tree T with B branches, p tips labeled 1, . . . , p and vectors x , y ∈ R p , then the weighted UniFrac distance between x and y is d w ( x , y ) = ∑ Bb = l b (cid:12)(cid:12)(cid:12) ∑ i ∈ desc ( b , T ) x i / (cid:107) x (cid:107) − ∑ i ∈ desc ( b , T ) y i / (cid:107) y (cid:107) (cid:12)(cid:12)(cid:12) where desc ( b , T ) = { i : tip i descends from branch b } and l b is the length of branch b .The idea behind the weighted UniFrac distance is that it takes into account similarity be-tween the variables as measured by proximity on T . It does not treat all of the variables asequally dissimilar the way more standard distances do. Weighted UniFrac is also the solutionto an optimal transport problem (Evans & Matsen ).So that we can perform MDS with weighted UniFrac, we create a phylogenetic tree T with p leaves and a matrix X ∈ R n × p . T is a balanced phylogenetic tree with p leaves, labeled1, . . . , p . Let shallow ∈ {

0, 1 } p be such that if i and j represent sister taxa in T , exactly oneof ( shallow ) i , ( shallow ) j is equal to . Call the two children of the root of T l and r . Let deep ∈ {

0, 1 } p be such that ( deep ) i = i descends from l and otherwise. Let a = ( Tn /2 , Tn /2 ) . We then deﬁne A = c ( a Tshallow + ( n − a )( p − shallow ) T ) (cid:12) ( )exp ( c ( a Tdeep + ( n − a )( p − deep ) T )) ( ) x ij ∼ Double Pois ( a ij , s ) ( )where c , c are constants, n indicates the n -vector containing all ’s, shallow and deep are asdeﬁned above, exp applied to a vector indicates the element-wise operation, and (cid:12) indicatesthe element-wise product. he idea behind this setup is that there are two groups of samples, those for which ( a ) i = ( a ) i =

0. There are are two differences between these groups: the ﬁrstis a mass difference from one half of the tree to the other, and the other an exclusion effect, inwhich for each pair of sister taxa, if one is present the other is absent. These two effects can beseen in Figure , which provides a visualization of the tree T and the data matrix X .Our ﬁrst step is to create MDS embeddings of the samples in the rows of X using either theManhattan distance or weighted UniFrac. The results of that embedding are shown in Figure , and as expected, the MDS representation of the samples separates the samples for which ( a ) i = ( a ) i = . The LB axes were computed at the samplepoints. We see that for the Manhattan distance, the variables with positive values for the ﬁrstlocal biplot axes are those for which ( shallow ) i =

1, and the variables with negative valuesfor the ﬁrst local biplot axes are those for which ( shallow ) i =

0. The local biplot axes areapproximately the same everywhere in the space, and so we can interpret the MDS/Manhattandistance embedding along the ﬁrst axis as being approximately a projection onto a vectordescribing the contrast between variables with values of vs. for shallow .We then turn to the local biplot axes for MDS with the weighted UniFrac distance tovisualize the relationship between the variables and the MDS embedding space for weightedUniFrac. The local biplot axes plotted were again computed at the sample points and areshown in Figure . In contrast to the local biplot axes for MDS with the Manhattan distance,the variables with positive values on the ﬁrst local biplot axes are those for which ( deep ) i = ( deep ) i = l l l l l l l l l l l l l l ll l l l l l l l l l l l l l l ll l l l l l l l l l l l l l l ll l l l l l l l l l l l l l l l Variable Type llll ( deep ) i = 0, ( shallow ) i = 0( deep ) i = 0, ( shallow ) i = 1( deep ) i = 1, ( shallow ) i = 0( deep ) i = 1, ( shallow ) i = 1255075 Abundance

Figure : Phylogenetic tree (top) and simulated data (bottom). White boxes indicate abun-dance in the data matrix X , and other colors from blue to red indicate low to high non-zeroabundances. Samples are encoded in rows, and variables in the columns. Each variable corre-sponds to one tip of the tree, and the colored dots at the tips of the tree represent the variables.The variable colors are carried through to Figure . Note that the ﬁrst group of ten samples inthe top rows has a higher abundance of variables on the right-hand side of the tree than theleft-hand side of the tree, while the reverse is true for the second group of the ten samples.In addition, the ﬁrst group of ten samples contains only representatives of the blue and greenvariables, while the second group of ten samples contains only representatives of the purpleand yellow variables. ll ll lll ll −2−1012 −3 0 3 Axis 1 A x i s Sample EmbeddingsMDS with Weighted Unifrac l ll lllll ll −1000100−500 −250 0 250

Axis 1 A x i s Sample EmbeddingsMDS with Manhattan Distance

Sample type l ( a ) i = 0( a ) i = 1 Figure : Classical multi-dimensional scaling representation of the samples in the simulateddataset using the weighted UniFrac distance (left) and Manhattan distance (right). Each pointrepresents a sample, and shape represents which group the sample came from. The represen-tations are not identical, but in each case MDS gives two clusters corresponding to the twogroups we simulated from. ll llll l llllll ll ll lll l ll ll l lllll lll lllll llll llll ll llll llll ll llllll l l ll lll llll ll llllll ll ll ll ll ll llll lll ll ll llllll l lll ll ll lllll ll ll l lllll lll lll lll llll llll lll ll ll lllll llll ll llll ll lllll llllll ll lll lll lll ll ll ll ll ll lll llll llll llll lll llll ll l lllll lllll l llll llll lll lllllll ll lll llll lllll lllll lll ll lll ll ll lll ll llllll lll l llll ll llll llll lll ll l llllll lllll lllllll lll l lll lllll l lllll ll llllll llll lll ll lllll l ll l lll l lllllll l lll llllll ll lllll lll ll ll lll llllll l ll llll ll lllll lll l ll ll ll llllll l ll llll ll llllllllllll lll ll l ll ll ll ll lll ll ll ll l lll ll llll ll lllll ll llll l lllllllllll llll ll llll l l lllll ll lll lll ll ll l ll lll llll ll lll lll lll llllll ll lll llll lll lll ll ll lllllllll l ll lll ll lll l l −202 −4 0 4 Axis 1 A x i s Local biplot axes for weighted UniFrac l l ll ll ll l llll l ll ll ll ll ll l ll l lll l ll ll ll ll l l llll ll ll l lll ll lll l ll l lllll ll ll lll ll ll l ll ll l l lll lll ll ll l lll ll lll llll l ll ll ll ll l lll ll llll ll ll lllll lll llll lll ll lllll lll l ll lll ll ll lll lll ll ll lll l llll lll ll ll ll ll llll ll l ll l ll ll l lll lllll lll l lll ll ll llll llll l ll lll lll llll lll l ll l l lll l lll lllll ll ll lll ll l lll l ll ll llll llll ll lllll l llll l ll ll ll lll l llll ll lllll ll l ll ll ll llllll ll ll ll lll ll ll ll l llll l lll ll ll l lll l lll l l ll llll ll llll lllllll l lll ll l lll lll llllll l l lll ll lll llll l l llll l ll lll lll llll l l ll l lll llll l llllll l l ll l ll ll llllll lll l llll lll lll l l lll ll lll llll ll ll l ll ll lll lll ll l lll llll ll ll lll lll l ll lll ll ll ll l l ll lll ll l ll lll lll ll l lll l ll ll ll l ll ll lll ll lll l lllll l llll l ll lll lll ll l ll ll llll l −300−200−1000100200 −600 −300 0 300 600

Axis 1 A x i s Local biplot axes for Manhattan distance

Figure : Local biplot axes for MDS/weighted UniFrac (top) and MDS/Manhattan distance(bottom). Black circles or triangles represent the sample embeddings. A segment connectedto a sample point represents a local biplot axis for one variable at that sample point. Colorrepresents variable type, and matches the colors of the points on the tips of the trees in Fig-ure . Blue/purple vs. yellow/green represent variables corresponding to ( deep ) j = ( deep ) j =

1, and blue/green vs. purple/yellow represent ( shallow ) j = ( shallow ) j = f the ﬁrst LB axes are consistent, and so we can interpret the ﬁrst axis as being approximatelya projection onto a vector describing the contrast between variables with value of vs. for deep .To see how different the local biplot axes are from other proposals for explaining the MDSembedding space, we create correlation biplots for MDS with weighted UniFrac and MDSwith the Manhattan distance. The correlation biplot axes for these two embeddings are shownin the left and right panels of Figure , respectively. In contrast to the local biplot axes forthese two embeddings, the correlation biplot axes for weighted UniFrac and the Manhattandistance are very similar to each other. In each case, the variables for which ( shallow ) i = ( shallow ) i = p > n , there is more than one way to sepa-rate two groups of samples. The local biplot representation in this simulation shows how twodifferent distances implicitly use different sets of features to separate the two groups: relativeabundance of the two halves of the tree in for weighted UniFrac, and relative abundance ofalternating sets of leaves on the tree for the Manhattan distance. This is information that wemight have expected based on the known properties of the distances (weighted UniFrac usesthe tree and Manhattan distance doesn’t), but that is not directly available otherwise. We seethat the local biplot axes give us much more insight into how the variables are used than thecorrelation biplot axes do.This sort of information about how the variables relate to the MDS embedding space isparticularly important if the distance was designed to incorporate the analyst’s intuition aboutimportant features of the data for the particular problem. In the case of weighted UniFrac, theintuition is that the tree provides important information, and an explanation of the difference Axis 1 A x i s Correlation biplot axes forWeighted Unifrac −0.250.000.250.50 −0.5 0.0 0.5

Axis 1 A x i s Correlation biplot axes forManhattan distance

Figure : Correlation biplots for weighted UniFrac (left) and Manhattan distance (right). Eachsegment is a biplot axis. Color represents variable type, and matches the colors of the pointson the tips of the trees in Figure . Blue/purple vs. yellow/green represent variables cor-responding to ( deep ) j = ( deep ) j =

1, and blue/green vs. purple/yellow represent ( shallow ) j = ( shallow ) j =

1. Note the similarity of the correlation biplot axes for thetwo MDS representations, in contrast to the LB axes in Figure , which are quite different forweighted UniFrac vs. the Manhattan distance. etween the two groups as being over- or under-represented in one half of the tree is moreuseful than an explanation of the difference between the two groups as being the presence orabsence of half of the species. Of course, the opposite could also be true: we could be moreinterested in the exclusion effect and try to design a distance that emphasizes that instead ofthe deep splits in the tree. The local biplot axes would again be more useful than correlationbiplot axes in that situation, because they describe the relationship between the data spaceand the embedding space instead of simply describing marginal relationships between thevariables and the embeddings. . Real data for phylogenetic distances

To illustrate the utility of local biplots on a real dataset, we analyze data from a study ofthe effect of antibiotics on the gut microbiome initially described in Dethlefsen & Relman( ). In this study, three individuals were given two courses of the antibiotic Ciproﬂoxacin,and stool samples were taken in the days and weeks before, during, and after each courseof the antibiotic. The composition of the bacterial communities in each of these sampleswas analyzed using S rRNA sequencing (Davidson & Epperson ), and the resultingdataset describes the abundances of bacterial taxa in each of samples (between and samples were taken per individual). In addition, the taxa were mapped to a referencephylogenetic tree (Quast et al. ), giving us the evolutionary relationships among all thetaxa in the study.In the initial analysis of this dataset, multi-dimensional scaling with the unweighted UniFracdistance (Lozupone & Knight ) was used to visualize the samples. Unweighted UniFracis a phylogenetically-aware distance that only takes into account the presence or absence of abacterial taxon. In our reanalysis, we add weighted UniFrac (a phylogenetically-aware distancethat takes into account abundance instead of presence/absence), two generalized Euclideandistances that use the phylogeny (the ﬁrst based on double principal coordinates analysis (DP-CoA) (Pavoine et al. ) and the second related to adaptive gPCA (Fukuyama a )), andthe standard Euclidean distance.In Figure , we see that each of the ﬁve distances gives a different representation of thesamples. Each panel in the ﬁgure shows the MDS embeddings of the samples with a different llllllllllllllll l ll lll llllllllllllllll llll l lll llll lll ll −0.2−0.10.00.10.2−0.2 −0.1 0.0 0.1 0.2 Axis 1: 20.1% A x i s : . % Euclidean distance llllllll llllllllll ll llll llllll lllllllll lllllllllll ll ll ll −0.10.00.10.2 −0.1 0.0 0.1 0.2

Axis 1: 23.3% A x i s : . % UniFrac llllll lll llllllll l lllll lllll lll llllllllllll l lll llll lll ll −0.2−0.10.00.1 −0.2 −0.1 0.0 0.1

Axis 1: 19.5% A x i s : . % Adaptive gPCA lllll l ll l lll ll l ll llll llllll l l ll ll llllll ll ll l llllllll ll ll −0.10.00.10.20.3 −0.1 0.0 0.1 0.2 0.3

Axis 1: 27.1% A x i s : . % Weighted UniFrac l llllll llllll ll lll lll lll l llll ll ll llllllll lll l lll l lll ll l −0.3−0.2−0.10.00.10.2 −0.3−0.2−0.10.0 0.1 0.2

Axis 1: 47.2% A x i s : . % DPCoA

Subject l DEF

Antibiotic Treatment ll abxno abx Figure : MDS plots of the microbiome data with ﬁve different distances. From top left:PCA or Euclidean distance, unweighted UniFrac, adaptive gPCA (a generalized Euclideandistance), weighted UniFrac, and DPCoA (a generalized Euclidean distance). The differentdistances give different representations of the samples, with different amounts of clusteringby subject and by abx/no abx condition. istance. Each point represents one sample, with shape representing the subject and colorrelated to whether the subject was on the antibiotic or not. “Abx” refers to samples takenwhen the subject was taking the antibiotic and the week immediately after, while “no abx”refers to samples taken before the ﬁrst course of the antibiotic, at least a week after the ﬁrstcourse of the antibiotic but before the second course, and at least a week after the second courseof the antibiotic. With the Euclidean distance, there is a strong clustering of the samples byindividual in the principal plane. The abx/no abx conditions are offset from each other withinthe cluster for each individual, but the direction of the offset is different for the differentsubjects. With unweighted UniFrac, there is still clustering of the samples by subjects, butthe offset within each subject associated with the abx/no abx condition is now in a consistentdirection for each subject, and is associated with the ﬁrst MDS axis. With the distance based onadaptive gPCA, we see a similar pattern as in unweighted UniFrac: clustering by subject, anoffset within each subject associated with the abx/no abx condition in a consistent directionfor each subject. However, in adaptive gPCA, the ﬁrst MDS axis is associated with subject,while the second is associated with the abx/no abx condition. With weighted UniFrac andDPCoA we lose the distinct clusters associated with subject. In each case there is an offsetassociated with the abx/no abx condition, but the magnitude of the offset is perhaps smallerthan that seen with the other distances.The differences between the representations provided by the ﬁve distances are intriguing:it seems that some distances favor an interpretation of the abx effect as being consistent fromsubject to subject while others favor an interpretation of a subject-speciﬁc effect; some dis-tances favor an interpretation of the subjects falling into distinct clusters and others do not.We can guess at the causes of these differences using the limited knowledge we have aboutthe distances: The Euclidean distance does not use the phylogenetic information, while theothers do. It makes sense that the makeup of a subject’s microbiota and the effect of an an-tibiotic would look more distinct when each taxon is viewed as equally distinct (implicit inthe Euclidean distance) than it would when phylogenetic relationships are taken into account.Looking at the local biplot axes for each MDS representation will allow us to sharpen thisintuition.The local biplot axes for MDS with the ﬁve distances, given in Figure , show a suggestive llllllllllll ll l llll lllllll llll ll l llllll ll ll ll ll lllllllllllllll ll lll l l llll llll ll llll ll ll l lll ll l llll llll ll llll lll lll l llllllllllll llllll ll ll lllll ll lllllllll ll ll l llllll lllllll lllll llllllllllllll lllllllllllll llllllllll ll ll lllll llllll lll llll llll llll llllllllllllllllllll llllllllllllll lllll lllll l lllll llllll llllllllll ll lllllllll ll l lll llll lllllllllllllllllllllll lllllllllll llllllll l ll lllllll lllll lll lll l llllllllll llllllll l l llll llllllllll llll l lll ll lll lll llllllll llll llll l llllllllllllllll lllllllllll lllll llllllllll lllllllllll l lll lllll llll lllllllllll llll lll llllll lllll lll llll llll lllll lll lllll lllllllll llllllllllllllllllllllllllllllllllllllllll l lll llllllllllllll llllllll ll ll lllllll lllll lll llllllll llllllll ll lllllllllllllll ll llll l ll llllllll lllll ll ll llll lllll l llllllllll llllll llllll lll lll l llllll llllllll l llll llllllllllll llllll llllllllllllll lllllllll lll ll l lll lll lll llllllllllllllllllll lllllllllllllllll llll llllllll llll ll lllll lll ll ll lllllllllllllllll lllllllllll llllllll lll llllllllllll llllllll l llllllllllll ll lll lll lllllllllllllllllllll llllllll llllll llllllllllllllllllllllllllllllllllllllllllll lllll l lll ll llllll llllll llll ll ll llll ll lllll l l llllll ll ll ll llll llllllll llllllllll llllll lllll l lll lllllllllll ll llllllll llllllllllllllllll l ll lllll llll l llllllll lllll lll lllll lllllll lll lll lll l lllll ll llllllll lllllllllll lllllllllll lllllllllll lll lllllllll llllllllllllllll lll ll lll lllllllllllllllll llllllllll llll lll llllll llllllllll lllllllll llllllllll llllllllllllllll llllllllllllllllll llllllll llllllllllllll llllll llllllll lll llllllllllllllllllllllllllllll llllll ll ll llllllllllllllllllllllllllllllllllllllll llllll ll lllll lll lll llllllllll lll ll llll lllllllllll lll lllllllllllllllllllllllllllllllll l −0.2−0.10.00.10.2 − . − . . . . Axis 1: 20.1% A x i s : . % Euclidean distance lllll l lllll ll llllllllll lll l llllllllllllllllllll ll lllllllll lllll lll lllll lllllllllllllllll lllllll llllllllllllllllllllllll ll lll llllll lll lllllllll l llll llllllllllllllll lllllllllll lllllllllllllllllllllllllll ll lllllll llll lllllllllll llllllllllllllllllllllllllllllllllll llllllllllllll lllllllllllllll lllllllllllllllllllllllllllllllllllllllll llllll lllllllllllllllllllll lllllllllll lllllll llllllllllllllllllllllllll lllllllllllllllll llllllllllllllllllllllll llllllll llllllllllllllllllllllllllll lllllllllllllllllllllllllllllllllllllllllllllllllllllllllllll llllllll ll llllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllll llllllllll llllllllllllllll llllllllllllllll llllllllllllllllllllllllllllllllllllllllllllllllllllllll lllllllllllllllllllllllllllllllllllllllllllllllllllllll llllllllllllllllllllllllllll lllllllllll lllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllll llllllllllllllll llllllllllllllllllllllll llllllllllllllllllllllll lllllllllll lllllllllllllllllllll lllllll llllllllllllll llllllllll llll lll ll lllllllllllllllllll lll lllll ll lllll l llll lllllllllllllllllll llllllllllllllllllllllllllllllllll llll ll lllllll llll l llll lllll lllll lllll llllll llll llllllllllllllll llll lllll llllllllllll l lllllllllll lllllllllllll lllll lllllllll llllllll lllll lllll lll l llllllllll llllllllllllll llllllllllll lllllllll ll lllllll l lllllll lllllll ll l llllllllllllllllllll llllllllll llllllllllllllllllllll l llllllllllll llllllllll lllllllllllllll lllllllllll llll ll lll llllllllllllll llllllllllllllllllllll llllllllllllllllllllll lllllllllllllllllllllll lllllllllllllllll llllll l lllllllllll ll llllll llllllllllllllllllllllllllllllllll lllllllllll llllll lll l lllllllll lllll lllllll l −0.0050.0000.005 − .

005 0 .

000 0 . Axis 1: 23.3% A x i s : . % UniFrac lllll llllllll llll llllllllll lllllllllllllllll ll l l l llllllllllllll ll lllll llll llllllllll llll llllll l llllllllllllllllllllllll lllll llll lllll l lllll llllll llllllllllllllllll lllllllll ll l lllllllllllllll l lllllllllllllllllllllll llllllllllllllllll llll lllllll lllllllllllllllllllllllllllllllllllllllllllll l lllllllllllllllllllll lllllllllll lll lllllllllllllllllllllllllllllllllllllllll llllllllllllllllllllllllll lll l llllllllllllllllll lllllllllllllllllll ll lllllllllll lllllllllllllllll llllllllllllllllllllllllllllllllllllllllllllllllllllllllllllll lllllllllllllllllllllllllllllllll lllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllll llllllllllllllllllllll lllllllllllllllllllllllllllllllllll llllllllllllllll lll llllllllllllllllllllllll llllllllllllllllllll llllll llllll llllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllll llllllllll lllllllllllllllllllllllllllllllllllllllll llllllllllllllllllllllllll llllllllllllllllllllllllllllllllllllllllllll llllllllllllllllllllllllllllll lllllllllllllllllllllllll llllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllll lllll lllllll lllllllllllll llllll ll lll llll llllllllllllllll lllllllllllllllllllllllll lllllllllllll llllll lllllllllllllllllllllllllllll ll llllllllllllllllll lllllllllllllll lllllll lllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllll llllllllllllllllllllllllllllllll lllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllll llllllllllllllllllllllllllllllllllllllllllll llllllllllllllllllllllllllllllllllllllllllll llll llllllllllllllll lllllllllllllllllllllll llllllllllllllllllllllllllllllllll l −1e−04−5e−050e+005e−05 − − − −

05 0e +

00 5e − Axis 1: 19.5% A x i s : . % Adaptive gPCA llllllll lllllllll lllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllll lllllllllllllllllllllllllllllllllllllllllllllll lllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllll lllllllllllllllllllllllllllllllllllllllllllllllllllll lllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllll llllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllll lllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllll lllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllll lllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllll lllllllllllllllllllllllllllllll llllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllll lll lllllll llllll lllll llllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllll lllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllll llllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllll ll lllll llllllllllllllllll llllllll lllllll llllllllllllllllllllllll lllllllllllllllllllllllllllllllllll llll −0.0010.0000.0010.002 − .

001 0 .

000 0 .

001 0 . Axis 1: 27.1% A x i s : . % Weighted UniFrac lllllllllllllllllllllllllllllll lllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllll llllllllllllllll lllllllllllllllllllllllllllllllllllllllll llllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllll −0.250.000.250.50 − . − .

25 0 .

00 0 . Axis 1: 47.2% A x i s : . % DPCoA

Family lllllllllllllll

UnknownAlcaligenaceae_SutterellaAlistipesButyricimonasClostridiaceaeEubacteriaceaeLachnospiraceaeParabacteroidesPeptococcaceaePeptostreptococcaceaePrevotellaRuminococcaceaeStreptococcaceaeVeillonellaceaeYersinia

Figure : Local biplot axes at one point for MDS of the microbiome data with each of ﬁvedifferent distances. From top left: LB axes for PCA/Euclidean distance (LB axes independentof location), UniFrac (LB axes for one of the data points embedding near the center of theplot), adaptive gPCA (a generalized Euclidean distance, so LB axes independent of location),weighted UniFrac (LB axes for one of the sample points embedding near the center of theplot), and DPCoA (a generalized Euclidean distance, so LB axes independent of location).Each point corresponds to the LB axis for one variable, which in this case are bacterial taxa.Points are colored according to the family the taxon is associated with. Note the difference inthe relationship between taxonomic family and LB axis values: a strong association for DPCoAand weighted UniFrac, none for the Euclidean distance, and an intermediate association forthe agPCA-based distance and unweighted UniFrac. attern. Each panel corresponds to one of the MDS representations, and for each represen-tation one set of local biplot axes is given. Each point represents the local biplot axis forone variable, which in this case corresponds to a bacterial taxon. The colors in the plot cor-respond to the family the bacterial taxon belongs to. This information was not used in theconstruction of either the MDS plot or the local biplot axes, except (for the phylogeneticallyaware distances) to the extent that taxonomic family and phylogeny align. The local biplotaxes for the three generalized Euclidean distances (the ﬁrst, third, and ﬁfth panels) do notdepend on the position in taxon space, and the axes shown are therefore representative ofthe whole space. The local biplot axes for unweighted UniFrac and weighted UniFrac dodepend on the position in taxon space. For reasons of space, only one representative setof axes is shown for each, and in both cases the local biplot axes shown correspond to oneof the samples that has an embedding near the origin in the embedding space. An inter-active shiny (Chang et al. ) app showing axes at arbitrary sample points is available at https://jfukuyama.shinyapps.io/local-biplot-antibiotic-vis/ .The ﬁrst thing we notice about the local biplot axes for the different MDS representationsare differences in the relationship between the local biplot axes and the taxonomy. We seethat for DPCoA and weighted UniFrac, the local biplot axes corresponding to taxa in the samefamily tend to have very similar values. At the other extreme, with the Euclidean distance,there is no relationship between local biplot axis and family. The local biplot axes for un-weighted UniFrac and adaptive gPCA fall somewhere in the middle, with local biplot axescorresponding to taxa in the same family tending to have similar values, but not to the extentseen in weighted UniFrac or DPCoA.Recalling our result in Theorem . , we can use the local biplot axes to interpret the MDSspace for the Euclidean distance, adaptive gPCA, and DPCoA. In each case, the location of asample in the embedding space is a linear combination of the variables, with the weights givenby the values of the local biplot axes. Thus, we see that for the Euclidean distance, the sampleembeddings are given by a linear combination of the variables where the weights are unrelatedto the taxonomy. On the other extreme, for DPCoA, the sample embeddings are given by alinear combination of the variables that can be approximately described as (on the ﬁrst axis) acontrast between Lachnospiraceae and Ruminococcaceae/Unknown and (on the second axis) contrast between Ruminococcaceae and unknown. (Since the interesting offset that we see inthe DPCoA sample embeddings between the abx/no abx conditions is a upper left vs. bottomright offset, perhaps that is actually the axis of interest and not the the ﬁrst or second MDSaxes. Along the y = − x line, the embedding of the samples can be described as a contrastbetween Lachnospiraceae and Ruminococcaceae. These two families were in fact described inthe initial paper as those showing a difference between the two conditions.) Adaptive gPCAis somewhere in between the Euclidean distance and DPCoA: the sample embeddings can bewell approximated as a linear combination that is approximately piecewise constant on smallgroups of closely related taxa.Neither weighted UniFrac nor unweighted UniFrac is a generalized Euclidean distance,and the local biplot axes are not constant for either MDS representation. However, the local bi-plot axes that we obtain suggest certain conclusions. The local biplot axes shown for weightedUniFrac are approximately constant and approximately the same as those for DPCoA reﬂectedover the y axis. This suggests that DPCoA and weighted UniFrac with multi-dimensionalscaling are using the same internal representation of the samples, one related to the relativeabundances of major divisions in the phylogenetic tree. This is in line with the similarity ofthe embeddings given by MDS with weighted UniFrac and DPCoA: in each case, we see littleseparation of the samples but an offset between the abx/no abx conditions along the y = x and y = − x lines for weighted UniFrac and DPCoA, respectively.Similarly, the qualitative results in the sample embeddings for MDS with unweightedUniFrac and adaptive gPCA (clustering of subjects and a consistent direction for the offsetof the abx vs. no abx condition) mirrors the qualitative relationship between the local biplotaxes and the taxonomic family for that pair (some relationship between taxonomic family andthe value of the local biplot axis, but not as much as DPCoA/weighted UniFrac).Overall, we see that the local biplot axes help us interpret the MDS embedding space. Inthis particular instance, they suggest that the phylogenetic distances do some implicit smooth-ing of the data along the phylogenetic tree, and that the amount of smoothing increases aswe go from Euclidean distance to unweighted UniFrac to adaptive gPCA to weighted UniFracto DPCoA. This is in line with previous work (Fukuyama b ) showing empiricially sucha gradient in a larger set of phylogenetically-informed distances. In addition to providing nsight into general properties of the distances, the local biplot axes give us insight into theparticular MDS representation at hand: which sets of bacterial taxa are likely to be over- orunder-represented in particular samples or particular regions of the embedding space. Thisis information that is not traditionally available in MDS but is of great importance to theconsumers of such diagrams. . D iscussion We have introduced local biplot axes as a new tool for unboxing the black box that is multi-dimensional scaling. We showed that these local biplot axes are a natural extension of PCAbiplot axes and that there is a close connection between these axes and generalized princi-pal components. Our extension of the classic result on the equivalence of classical multi-dimensional scaling with the Euclidean distance and principal components gives us greaterinsight into the relationship between data space and embedding space for multi-dimensionalscaling. It particular, it suggests that for an arbitrary distance, if the local biplot axes areapproximately constant, the distance can be well approximated by a generalized Euclideandistance and that the generalized Euclidean distance providing the approximation is a func-tion of the local biplot axes. Because generalized Euclidean distances can be understood asthe standard Euclidean distance on a linear transformation of the input variables, this helpsus interpret what variables or features in the data are important.Even in the case where the local biplot axes are far from being constant on the input variablespace, the local biplot axes allow us to investigate whether there are regions of local linearity(regions in which the local biplot axes are constant or approximately constant), and thereforewhere we might be able to approximate the distance as a generalized Euclidean distance orthe MDS map as a linear map from data space to embedding space. This gives us much moreinsight into the relationship between data and embedding space than is available either withMDS on its own (no information) or other proposals for biplots for MDS, which tend to beabout a global linear approximation (Satten et al. ; Wang et al. ; Greenacre ).Our simulated data example showed that local biplots can uncover substantive differencesin the relationship between data space and embedding space, even in cases where MDS with wo different distances gives the same embedding of the samples. Our real data exampleshowed that local biplots can suggest scientiﬁcally relevant reasons for the differences in therepresentations of the samples provided by MDS with different distances. They can alsosuggest more general properties of the distances used: in this case, we saw that differentphylogenetic distances appear to implicitly smooth the variables along the phylogenetic treeto different degrees, as has been suggested in other work (Fukuyama b ).There are several avenues for future work. Perhaps most importantly, although our re-sults suggest that approximately constant local biplot axes should imply that the distancesare approximately of the form of a generalized Euclidean distance, we do not quantify theapproximation or indeed how to choose the best generalized Euclidean distance for the ap-proximation. Similarly, to identify regions of local linearity, we would need a way to quantifythe similarity of the local biplot axes in different regions of the space, and it is not immediatelyobvious what the best measure would be for this task. Other potential areas for investigationinclude developing the relationship between distances and probabilistic models through therelationship between generalized Euclidean distances and the probabilistic interpretation ofgeneralized PCA, using the LB axes as a starting point for conﬁdence ellipses for MDS, andextending the ideas here to other types of low-dimensional embeddings. A ppendix : P roofs Proof of Theorem . . This follows from Theorem . , since the standard Euclidean distance d ( x , y ) is the same as the generalized Euclidean distance d I ( x , y ) , and the singular value de-composition of X is the same as the generalized SVD of ( X , I , I ) . Proof of Theorem . . This is a straightforward application of the spectral decomposition theo-rem. Let Q be a symmetric square root of Q , and let Q − be its inverse. The spectraldecomposition theorem tells us that we can ﬁnd ˜V and ˜ Λ such that Q X T DXQ = ˜V ˜ Λ ˜V T ( )with ˜V T ˜V = I p , λ ≥ λ ≥ · · · ≥ λ pp ≥ hen deﬁne V : = Q − ˜V and Λ : = ˜ Λ . We see that X T DXQV = X T DXQ ˜V ( ) = Q − ˜V ˜ Λ ˜V T ˜V ( ) = V Λ ( )and so V satisﬁes X T DXQV = V Λ .Substitution also tells us that V T QV = ˜V T Q − QQ − ˜V = I p ( )and so V and Λ have the required properties. Lemma . . Let X have centered columns, and let d Q ( x , y ) = (cid:112) ( x − y ) T Q ( x − y ) . Let ∆ , C n beas deﬁned in ( ). Then we have − C n ∆ C Tn = XQX T . Proof of Lemma . . Let x i = X Ti · be the column vector containing the i th row of X . Let δ ij denote d Q ( x i , x j ) , so that δ ij = ( x i − x j ) T Q ( x i − x j ) = x Ti Qx i + x Tj Qx j − x Ti Qx j .First note that n ∑ i = δ ij = n ∑ i = (cid:16) x Ti Qx i + x Tj Qx j − x Ti Qx j (cid:17) ( ) = n x Tj Qx j + n ∑ i = ( x Ti Qx i ) − x Tj Q n ∑ i = x i ( ) = n x Tj Qx j + n ∑ i = ( x Ti Qx i ) ( )where the last line follows because we have taken X to have centered columns.We also have n ∑ j = n ∑ i = δ ij = n ∑ j = ( n x Tj Qx j + n ∑ i = x Ti Qx i ) ( ) = n n ∑ j = x Tj Qx j + n n ∑ i = x Ti Qx i ( ) hen we have ( − C n ∆ C Tn ) ij = − ( δ ij − n n ∑ i = δ ij − n n ∑ j = δ ij + n ∑ i , j δ ij ) ( ) = − ( − x Ti Qx j + x Ti Qx i + x Tj Qx j − x Tj Qx j − n n ∑ i = ( x Ti Qx i ) − x Ti Qx i − n n ∑ j = ( x Tj Qx j )+ ( )1 n ( n n ∑ j = x Tj Qx j + n n ∑ j = x Ti Qx i )) ( ) = − ( − x Ti Qx j ) ( ) = x Ti Qx j ( )and so − C n ∆ C Tn = XQX T . Proof of Theorem . . By the deﬁnition, LB ( z ) =  ∂∂ z d x ( z ) · · · ∂∂ z d x n ( z ) ... ... ∂∂ z p d x ( z ) · · · ∂∂ z p d x n ( z )  M Λ − ( ) = (cid:16) ∇ d x ( z ) · · · ∇ d x n ( z ) (cid:17) M Λ − ( )For generalized Euclidean distances, d x ( z ) = d Q ( x , z ) = ( x − z ) T Q ( x − z ) . and so ∇ d x ( z ) = Q ( x − z ) ( )This gives us LB ( z ) = (cid:16) Q ( x − z ) · · · Q ( x n − z ) (cid:17) M Λ − ( ) = Q ( X T − x1 Tn ) M Λ − ( ) = QX T M Λ − ( ) = QX T B Λ − ( ) here the second-to-last line follows because the matrix M has centered columns and the lastline is a result of M = B Λ .At this point, we see that the biplot axes have no dependence on the point x . All thatremains is to show the relationship to the gPCA axes.Let V = X T B Λ − We have X T XQV = X T XQX T B Λ − ( ) = X T ( B Λ B T ) B Λ − ( ) = X T B Λ ( ) = V Λ ( )and V T QV = ( X T B Λ − ) T Q ( X T B Λ − ) ( ) = Λ − B T XQX T B Λ − ( ) = Λ − B T B ΛΛ − ( ) = I ( )Therefore, V = X T B Λ − satisﬁes the conditions to be the normalized principal axes in gPCAof the triple ( X , Q , I ) . If we substitute this equivalence in above, we see that LB ( z ) = QV , ( )as desired Proof of Theorem . . Let f : R p → R p be the map from data space to the full-dimensionalembedding space. By our ﬁrst assumption (that the MDS embedding space is of dimension p ), d I p ( f ( x i ) , f ( x j )) = d ( x i , x j ) . By the second assumption (no additional dimensions requiredto embed z ∈ R p ), we have d I p ( f ( x i ) , f ( z )) = d ( x i , z ) .By deﬁnition of local biplot axes, the Jacobian of f is J f ( z ) = L T , and if f is the map fromdata space to embedding space deﬁned in ( ), we must have f ( z ) = L T z + k for some k ∈ R p . hen for any x i , i =

1, . . . , n , z ∈ R p , we have d ( x i , z ) = d I p ( f ( x i ) , f ( z )) ( ) = d I p ( L T x i + k , L T z + k ) ( ) = (cid:113) ( x i − z ) T LL T ( x i − z ) ( ) = d LL T ( x i , z ) , ( )as desired. Proof of Theorem . . Proof is by induction. We will show that for any k =

1, . . . , n , if d ( z , z ) = d LL T ( z , z ) if z , z ∈ span ( x , . . . , x k ) ( )Base case: We show ( ) for k =

1. If z i ∈ span ( x ) , then z i = α i x , α i ∈ R . d ( z , z ) = d ( α x , α x ) ( ) = ( α − α ) d ( x , p ) ( ) = ( α − α ) d LL T ( x , p ) ( ) = d LL T (( α − α ) x , p ) ( ) = d LL T ( z , z ) , ( )where the second line follows by translation invariance and homogeneity of d , the third linefollows from Theorem . , and the remainder is translation invariance, homogeneity, and thedeﬁnition of z i .Induction step: Suppose that ( ) holds for some k ∈ {

1, . . . , n − } , and let z i , z j ∈ span ( x , . . . , x k + ) . We can write z i = z ki + α i x k + for α i ∈ R , z ki ∈ span ( x , . . . , x k ) , i =

1, 2.If α = α =

0, then z i , z j ∈ span ( x , . . . , x k ) , and the result holds by the assumption ( ). If ither α or α (cid:54) =

0, then d ( z , z ) = d ( z k + α x k + , z k + α x k + ) ( ) =  ( α − α ) d (( α − α ) − ( z k − z k ) , x k + ) α − α (cid:54) = d ( z k , z k ) α = α ( )In the case α = α , we have d ( z k , z k ) = d LL T ( z k , z k ) ( ) = d LL T ( z k + α x k + , z k + α x k + ) ( ) = d LL T ( z , z ) , ( )where the ﬁrst line follows by ( ), the second by translation invariance, and the third by thedeﬁnition of z i .Otherwise, we have ( α − α ) d (( α − α ) − ( z k − z k ) , x k + ) = ( α − α ) d LL T (( α − α ) − ( z k − z k ) , x k + ) ( ) = d LL T ( z , z ) ( )where the ﬁrst line follows from Theorem . and the second line is algebra. Thus ( ) holdsfor k , ( ) holds for k + ( x , . . . , x n ) = R p by assumption, the case k = n implies that for any z , z ∈ R p , d ( z , z ) = d LL T ( z , z ) as desired. Proof of Theorem . . Let z = ∑ pj = α j e j be the supplemental point. We can rewrite the distances etween x i and α j e j in terms of their components: d ( x i , α j e j ) = h ( x ij , α j ) − h ( x ij , 0 ) + p ∑ k = h ( x ik , 0 ) ( ) p ∑ j = d ( x i , α j e j ) = p ∑ j = h ( x ij , α j ) − p ∑ j = h ( x ij , 0 ) + p p ∑ k = h ( x ik , 0 ) ( ) = p ∑ j = h ( x ij , α j ) + ( p − ) p ∑ k = h ( x ik , 0 ) ( )This allows us to rewrite the distance between the samples and the supplemental points interms of the distances between x i and α j e j . d ( x i , z ) = p ∑ j = h ( x ij , α j ) ( ) = p ∑ j = d ( x i , α j e j ) − ( p − ) p ∑ k = h ( x ik , 0 ) ( ) = p ∑ j = d ( x i , α j e j ) − ( p − ) d ( x i , p ) ( )Finally, let a ∈ R n be deﬁned as in ( ): a i = ( − C n ∆ C Tn ) ii − d ( x i , z ) . Then we can rewrite a in terms of distances to α j e j and distances to p : a i = ( − C n ∆ C Tn ) ii − d ( x i , z ) ( ) = ( − C n ∆ C Tn ) ii − p ∑ j = d ( x i , α j e j ) − ( p − ) d ( x i , p ) ( ) = (cid:34) p ∑ j = (( − C n ∆ C Tn ) ii − d ( x i , α j e j ) ) (cid:35) − ( p − ) (cid:20) ( − C n ∆ C Tn ) ii − d ( x i , p ) (cid:21) ( )Which ﬁnally allows us to write the map of a supplemental point to the embedding space in erms of the embeddings of α j e j and the embedding p : f ( z ) = Λ − k ,1: k M T a ( ) = Λ − k ,1: k M T (cid:34) p ∑ j = (( − C n ∆ C Tn ) ii − d ( x i , α j e j ) ) (cid:35) − ( ) p − Λ − k ,1: k M T (cid:20) ( − C n ∆ C Tn ) ii − d ( x i , p ) (cid:21) ( ) = p ∑ j = Λ − k ,1: k M T (cid:20) ( − C n ∆ C Tn ) ii − d ( x i , α j e j ) ) (cid:21) − ( p − ) f ( p ) ( ) = p ∑ j = f ( α j e j ) − ( p − ) f ( p ) ( )Since c = p ∑ pj = f ( α j e j ) , this implies that f ( z ) = p c − ( p − ) f ( p ) , as desired. Proof of Theorem . . The result follows if we can show that f ( z ) = V T Qz , because in that casewe will have f ( z ) = V T Qz = V T Q p ∑ j = α j e j = p ∑ j = α j ( QV ) T · , j ( )Since J f = QV (by Theorem . and the deﬁnition of local biplot axes), we know that f ( z ) = V T Qz + k , and we just need to show that k = k .Note that n ∑ i = f ( x i ) = n ∑ i = V T Qx i + n k = V T QX T n + n k = n k ( )because X has centered columns. We also have n ∑ i = f ( x i ) = (cid:16) f ( x ) · · · f ( x n ) (cid:17) n ( ) = Λ k ,1: k ( B · ,1: k ) T n ( )because the embeddings f ( x ) , . . . , f ( x n ) in the ﬁrst k dimensions are given in the rows of k Λ k ,1: k . Then since (cid:107) Λ B T n (cid:107) = T B Λ B T n = T ( − C n ∆ C Tn ) = andthe deﬁnition of C n ), Λ B T n = , and so ∑ ni = f ( x i ) = k . Combined with ( ), we have k = k ,as desired. R eferences Allen, G. I., Grosenick, L. & Taylor, J. ( ), ‘A generalized least-square matrix decomposi-tion’,

Journal of the American Statistical Association ( ), – .Chang, W., Cheng, J., Allaire, J., Xie, Y. & McPherson, J. ( ), shiny: Web Application Frame-work for R . R package version . . . URL: https://CRAN.R-project.org/package=shiny

Daunis-i Estadella, J., Thió-Henestrosa, S. & Mateu-Figueras, G. ( ), ‘Including supplemen-tary elements in a compositional biplot’,

Computers & Geosciences ( ), – .Davidson, R. M. & Epperson, L. E. ( ), Microbiome sequencing methods for studying hu-man diseases, in ‘Disease Gene Identiﬁcation’, Springer, pp. – .Dethlefsen, L. & Relman, D. A. ( ), ‘Incomplete recovery and individualized responsesof the human distal gut microbiota to repeated antibiotic perturbation’, Proceedings of theNational Academy of Sciences (Supplement ), – .Evans, S. N. & Matsen, F. A. ( ), ‘The phylogenetic Kantorovich–Rubinstein metric forenvironmental sequence samples’, Journal of the Royal Statistical Society: Series B (StatisticalMethodology) ( ), – .Fukuyama, J. ( a ), ‘Adaptive gPCA: A method for structured dimensionality reductionwith applications to microbiome data’, Annals of Applied Statistics ( ), – .Fukuyama, J. ( b ), ‘Emphasis on the deep or shallow parts of the tree provides a newcharacterization of phylogenetic distances’, Genome Biology ( ), .Gabriel, K. R. ( ), ‘The biplot graphic display of matrices with application to principalcomponent analysis’, Biometrika ( ), – . ower, J. C. ( ), ‘Some distance properties of latent root and vector methods used in mul-tivariate analysis’, Biometrika ( - ), – .Gower, J. C. ( ), ‘Adding a point to vector diagrams in multivariate analysis’, Biometrika ( ), – .Gower, J. C. & Harding, S. A. ( ), ‘Nonlinear biplots’, Biometrika ( ), – .Greenacre, M. ( ), ‘Ordination with any dissimilarity measure: A weighted Euclidean so-lution’, Ecology ( ), – .Holmes, S. ( ), Multivariate data analysis: The French way, in ‘Probability and Statistics:Essays in Honor of David A. Freedman’, Institute of Mathematical Statistics, pp. – .Lozupone, C. A., Hamady, M., Kelley, S. T. & Knight, R. ( ), ‘Quantitative and qualitative β diversity measures lead to different insights into factors that structure microbial commu-nities’, Applied and Environmental Microbiology ( ), – .Lozupone, C. & Knight, R. ( ), ‘UniFrac: A new phylogenetic method for comparing mi-crobial communities’, Applied and Environmental Microbiology ( ), – .Mahalanobis, P. C. ( ), ‘On the generalized distance in statistics’, Proceedings of the NationalInstitute of Sciences of India ( ), â ˘A¸S– .McCune, B., Grace, J. B. & Urban, D. L. ( ), Analysis of Ecological Communities , Vol. , MjMSoftware Design, Gleneden Beach, OR.Pavoine, S., Dufour, A.-B. & Chessel, D. ( ), ‘From dissimilarities among species to dissim-ilarities among communities: A double principal coordinate analysis’, Journal of TheoreticalBiology ( ), – .Quast, C., Pruesse, E., Yilmaz, P., Gerken, J., Schweer, T., Yarza, P., Peplies, J. & Glöckner, F. O.( ), ‘The SILVA ribosomal RNA gene database project: Improved data processing andweb-based tools’, Nucleic Acids Research (D ), D –D . atten, G. A., Tyx, R. E., Rivera, A. J. & Stanﬁll, S. ( ), ‘Restoring the duality between princi-pal components of a distance matrix and linear combinations of predictors, with applicationto studies of the microbiome’, PLoS One ( ), e .Torgerson, W. S. ( ), Theory and Methods of Scaling , Vol. , Wiley, Oxford, England.Wang, Y., Randolph, T. W., Shojaie, A. & Ma, J. ( ), ‘The generalized matrix decompositionbiplot and its application to microbiome data’, mSystems ( ). URL: https://msystems.asm.org/content/ / /e -19