Multidimensional Scaling for Big Data
Pedro Delicado and Cristian Pachón-García
Departament d'Estadística i Investigació Operativa, Universitat Politècnica de Catalunya
July 23, 2020
Abstract
We present a set of algorithms for Multidimensional Scaling (MDS) to be used with large datasets. MDS is a statistical tool for dimensionality reduction that uses as input a distance matrix of dimensions n × n. When n is large, classical algorithms suffer from computational problems and the MDS configuration cannot be obtained. In this paper we address these problems by means of three algorithms: Divide and Conquer MDS, Fast MDS and MDS based on Gower interpolation (the first and the last being original proposals). The main idea of these methods is to partition the dataset into small pieces, where classical MDS methods can work. In order to check the performance of the algorithms, as well as to compare them, we carry out a simulation study. This study points out that Fast MDS and MDS based on Gower interpolation are appropriate when n is large. Although Divide and Conquer MDS is not as fast as the other two algorithms, it is the method that best captures the variance of the original data.
Keywords: Computational efficiency; Divide and conquer; Gower interpolation formula; Procrustes transformation.
Multidimensional Scaling (MDS) is a family of methods that represent measurements of dissimilarity (or similarity) among pairs of objects as Euclidean distances between points of a low-dimensional space. The data, for example, may be correlations among intelligence tests, and the MDS representation is a plane that shows the tests as points. The graphical display of the correlations provided by MDS enables the data analyst to literally "look" at the data and to explore their structure visually. This often shows regularities that remain hidden when studying arrays of numbers.

Given a square distance matrix $D$ of dimensions $n \times n$, the goal of MDS is to obtain a configuration matrix $X$ of dimensions $n \times p$ with orthogonal columns that can be interpreted as the matrix of $p$ variables for the $n$ observations, where the Euclidean distance between the rows of $X$ is approximately equal to $D$. The columns of $X$ are called principal coordinates.

Two questions arise from this approach: is it (always) possible to find this low-dimensional configuration $X$? How is it obtained? In general, it is not possible to find a set of $p$ variables that reproduces the initial distances exactly. However, it is possible to find a set of variables whose distance matrix is approximately equal to the initial distance matrix $D$.

Given a matrix $X_{n \times p}$ of $n$ individuals over $p$ variables, it is possible to obtain a new one with zero column means:

$$\tilde{X} = \left(I - \frac{1}{n}\mathbf{1}\mathbf{1}'\right) X = PX, \quad \text{where } P = I - \frac{1}{n}\mathbf{1}\mathbf{1}'.$$

This new matrix $\tilde{X}$ has the same dimensions as the original one, but its column means are 0. From this matrix, it is possible to build two square positive semi-definite matrices: the covariance matrix $S$, defined as $\tilde{X}'\tilde{X}/n$, and the cross-products matrix $Q = \tilde{X}\tilde{X}'$. The latter can be interpreted as a similarity matrix between the $n$ elements. Its term $q_{ij}$ is obtained as follows:

$$q_{ij} = \sum_{s=1}^{p} x_{is} x_{js} = x_i' x_j, \quad (1)$$

where $x_i'$ is the $i$-th row of $\tilde{X}$.

Given the scalar product formula, $x_i' x_j = |x_i||x_j| \cos \theta_{ij}$, if the elements $i$ and $j$ have similar coordinates, then $\cos \theta_{ij} \simeq 1$ and $q_{ij}$ will be large. On the contrary, if the elements are very different, then $\cos \theta_{ij} \simeq 0$ and $q_{ij}$ will be small. So $\tilde{X}\tilde{X}'$ can be interpreted as the similarity matrix between the elements.

The distances between elements can be deduced from the similarity matrix. The squared Euclidean distance between two elements is calculated in the following way:

$$d_{ij}^2 = \sum_{s=1}^{p}(x_{is} - x_{js})^2 = \sum_{s=1}^{p} x_{is}^2 + \sum_{s=1}^{p} x_{js}^2 - 2\sum_{s=1}^{p} x_{is}x_{js}. \quad (2)$$

This expression can be obtained directly from the matrix $Q$:

$$d_{ij}^2 = q_{ii} + q_{jj} - 2q_{ij}. \quad (3)$$

We have just seen that, given the matrix $\tilde{X}$, it is possible to get the similarity matrix $Q = \tilde{X}\tilde{X}'$ and, from it, the distance matrix $D$. Let $\mathrm{diag}(Q)$ be the vector that contains the diagonal terms of $Q$ and $\mathbf{1}$ the vector of ones; the matrix of squared distances $D^{(2)} = (d_{ij}^2)$ is given by

$$D^{(2)} = \mathrm{diag}(Q)\,\mathbf{1}' + \mathbf{1}\,\mathrm{diag}(Q)' - 2Q.$$

The problem we are dealing with goes in the opposite direction. We want to rebuild $\tilde{X}$ from a square distance matrix $D$ with elements $d_{ij}$. The first step is to obtain $Q$ and, afterwards, to get $\tilde{X}$. The theory needed to get the solution can be found, for instance, in Peña (2002). Here, we summarise it.
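As a quick numerical check of these direct relations, here is a minimal R illustration (the toy data, sizes and variable names are ours, not part of the original exposition):

```r
# Toy check of Q = X_tilde X_tilde' and D^(2) = diag(Q) 1' + 1 diag(Q)' - 2 Q.
set.seed(1)
n <- 10; p <- 3
X <- matrix(rnorm(n * p), nrow = n)
P <- diag(n) - matrix(1, n, n) / n        # centering matrix P = I - (1/n) 1 1'
X_tilde <- P %*% X                        # column-centered data
Q <- X_tilde %*% t(X_tilde)               # cross-products (similarity) matrix
D2 <- outer(diag(Q), rep(1, n)) + outer(rep(1, n), diag(Q)) - 2 * Q
max(abs(sqrt(pmax(D2, 0)) - as.matrix(dist(X))))   # ~ 1e-15: distances recovered
```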
The first step is to find a way to obtain the matrix $Q$ given $D$. We can assume without loss of generality that the means of the variables are equal to 0. This is a consequence of the fact that the distance between two points remains the same if the variables are expressed as deviations from their means, since

$$d_{ij}^2 = \sum_{s=1}^{p}(x_{is} - x_{js})^2 = \sum_{s=1}^{p}\left[(x_{is} - \bar{x}_s) - (x_{js} - \bar{x}_s)\right]^2. \quad (4)$$

The previous condition means that we are looking for a matrix $\tilde{X}$ such that $\tilde{X}'\mathbf{1} = 0$. It also means that $Q\mathbf{1} = 0$, i.e., the sum of all the elements of a column of $Q$ is 0. Since the matrix is symmetric, the same condition holds for the rows.

To exploit these constraints, we sum (3) over $i$:

$$\sum_{i=1}^{n} d_{ij}^2 = \sum_{i=1}^{n} q_{ii} + n q_{jj} = t + n q_{jj}, \quad (5)$$

where $t = \sum_{i=1}^{n} q_{ii} = \mathrm{Trace}(Q)$, and we have used that the condition $Q\mathbf{1} = 0$ implies $\sum_{i=1}^{n} q_{ij} = 0$. Summing (3) over $j$,

$$\sum_{j=1}^{n} d_{ij}^2 = t + n q_{ii}. \quad (6)$$

Summing (5) over $j$ we obtain

$$\sum_{i=1}^{n}\sum_{j=1}^{n} d_{ij}^2 = 2nt. \quad (7)$$

Replacing in (3) the value of $q_{jj}$ obtained from (5) and that of $q_{ii}$ obtained from (6), we have the following expression:

$$d_{ij}^2 = \frac{1}{n}\sum_{i=1}^{n} d_{ij}^2 - \frac{t}{n} + \frac{1}{n}\sum_{j=1}^{n} d_{ij}^2 - \frac{t}{n} - 2q_{ij}. \quad (8)$$

Let $d_{i.}^2 = \frac{1}{n}\sum_{j=1}^{n} d_{ij}^2$ and $d_{.j}^2 = \frac{1}{n}\sum_{i=1}^{n} d_{ij}^2$ be the row-mean and column-mean of the squared elements of $D$. Using (7), we have that

$$d_{ij}^2 = d_{i.}^2 + d_{.j}^2 - d_{..}^2 - 2q_{ij}, \quad (9)$$

where $d_{..}^2$ is the mean of all the squared elements of $D$, given by $d_{..}^2 = \frac{1}{n^2}\sum_i\sum_j d_{ij}^2$. Finally, from (9) we get the expression:
$$q_{ij} = -\frac{1}{2}\left(d_{ij}^2 - d_{i.}^2 - d_{.j}^2 + d_{..}^2\right). \quad (10)$$

The previous expression shows how to build the matrix of similarities $Q$ from the distance matrix $D$.

The next step is to obtain the matrix $X$ given the matrix $Q$. Let us assume that the similarity matrix is positive semi-definite of rank $p$. Therefore, it can be represented as $Q = V\Lambda V'$, where $V$ is an $n \times p$ matrix containing the eigenvectors associated with the non-null eigenvalues of $Q$, and $\Lambda$ is a $p \times p$ diagonal matrix containing those eigenvalues. Rewriting the previous expression, we obtain

$$Q = (V\Lambda^{1/2})(\Lambda^{1/2}V'), \quad (11)$$

which gives $Y = V\Lambda^{1/2}$.

We have obtained a matrix of dimensions $n \times p$ with $p$ uncorrelated variables that reproduce the initial metric. It is important to notice that if one starts from a known $X$, calculates from its variables the distance matrix in (2), and then applies the method just explained, the matrix obtained is not $X$ itself but its principal components. This happens because the distances between the rows of a matrix do not change if:

• the same row vector is added to all the rows of $X$ (a translation), or
• the rows are rotated, i.e., $X$ is postmultiplied by an orthogonal matrix.

By (3), the distance is a function of the terms of the similarity matrix $Q$, and this matrix is invariant under rotations and reflections of the variables:

$$Q = \tilde{X}\tilde{X}' = \tilde{X}AA'\tilde{X}'$$

for any orthogonal matrix $A$. The matrix $Q$ only contains information about the space generated by the variables of $X$: any rotation, reflection or translation preserves the distances. Therefore, the low-dimensional configuration is not unique.

The following algorithm is usually referred to as classical MDS. Let $D$ be a square distance matrix. The process to obtain the principal coordinates is as follows:

1. Build the matrix $Q = -\frac{1}{2} P D^{(2)} P$ of cross-products.
2. Obtain the eigenvalues of $Q$ and take the $r$ greatest ones. Since $P\mathbf{1} = 0$, where $\mathbf{1}$ is a vector of ones, $\mathrm{rank}(Q) \le n - 1$, the vector $\mathbf{1}$ being an eigenvector with eigenvalue 0.
3. Obtain the coordinates of the rows in the variables $v_i \sqrt{\lambda_i}$, where $\lambda_i$ is an eigenvalue of $Q$ and $v_i$ is the associated unit eigenvector. This implies that $Q$ is approximated by $Q \approx (V_r \Lambda_r^{1/2})(\Lambda_r^{1/2} V_r')$.
4. Take as coordinates of the points the variables $Y_r = V_r \Lambda_r^{1/2}$.
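These four steps translate directly into a few lines of R. The following is a minimal sketch of our own (the cmdscale function of the stats package, used later in this paper, implements the same procedure):

```r
# A from-scratch sketch of classical MDS (steps 1-4 above).
classical_mds <- function(D, r = 2) {
  n <- nrow(D)
  P <- diag(n) - matrix(1, n, n) / n      # centering matrix
  Q <- -0.5 * P %*% (D^2) %*% P           # step 1: elementwise squares of D
  e <- eigen(Q, symmetric = TRUE)         # step 2: eigendecomposition
  V <- e$vectors[, 1:r, drop = FALSE]     # r leading eigenvectors
  lambda <- pmax(e$values[1:r], 0)        # guard against tiny negative eigenvalues
  # steps 3-4: principal coordinates Y_r = V_r Lambda_r^{1/2}
  list(points = V %*% diag(sqrt(lambda), r), eig = e$values)
}

D <- as.matrix(dist(matrix(rnorm(100 * 3), ncol = 3)))
Y <- classical_mds(D, r = 3)$points       # agrees with cmdscale(D, k = 3) up to signs
```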
A similarity function is characterized by the following properties ($s_{ij}$ denotes the similarity between elements $i$ and $j$):

• $s_{ii} = 1$.
• $0 \le s_{ij} \le 1$.
• $s_{ij} = s_{ji}$.

If the initial information is $Q$, a similarity matrix, then $q_{ii} = 1$, $q_{ij} = q_{ji}$ and $0 \le q_{ij} \le 1$. The associated distance matrix is

$$d_{ij}^2 = q_{ii} + q_{jj} - 2q_{ij} = 2(1 - q_{ij}),$$

and it is easy to see that $\sqrt{2(1 - q_{ij})}$ is a distance and verifies the triangle inequality.

As we have mentioned before, the MDS solution is not unique. Since rotations, translations and reflections are distance-preserving functions, one can find two different MDS configurations for the same set of data. How is it possible to align both solutions?
Aligning both solutions (or multiple ones) means finding a common coordinate system for all the solutions. That is, let $MDS^1$ and $MDS^2$ be two MDS solutions of dimensions $n \times r$. We say they are aligned if the coordinates of row $i$ are the same in both solutions:

$$mds^1_{i1} = mds^2_{i1}, \;\ldots,\; mds^1_{ir} = mds^2_{ir},$$

where $mds^k_{ij}$ is coordinate $j$ of individual $i$ in solution $k$, with $j \in \{1, \ldots, r\}$, $i \in \{1, \ldots, n\}$ and $k \in \{1, 2\}$.

This problem is solved by means of Procrustes transformations. The Procrustes problem is concerned with fitting one configuration (the testee) to another (the target) as closely as possible. In the simplest case, both configurations have the same dimensionality and the same number of points, which can be brought into 1-1 correspondence. Under orthogonal transformations, the testee can be fitted to the target. In addition to such rigid motions, one may also allow for dilations and shifts. All the details are developed in Borg and Groenen (2005); a full account is out of the scope of this paper, but since Procrustes is a repeatedly used tool here, we briefly summarise it. Let $A$ and $B$ be two different MDS configurations of dimensions $n \times r$ for the same set of data. Without loss of generality, assume that $A$ is the target and $B$ is the testee. One wants to obtain $s \in \mathbb{R}$, an orthogonal matrix $T \in \mathcal{M}_{r \times r}(\mathbb{R})$ and $t \in \mathbb{R}^r$ such that

$$A \approx s\,BT + \mathbf{1}t'.$$

As mentioned before, Borg and Groenen (2005) contains all the details needed to estimate these parameters.
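As an illustration, the following toy R sketch (example data and names are ours) aligns a rotated and shifted copy of a configuration using the procrustes function of the MCMCpack package, described in the next section; we assume it returns the transformed configuration X.new together with the parameters R, tt and s:

```r
# Toy Procrustes alignment: B is a rotated and shifted copy of the target A.
library(MCMCpack)
set.seed(2)
A <- matrix(rnorm(200), ncol = 2)                      # target configuration
theta <- pi / 4
Rot <- matrix(c(cos(theta), sin(theta),
               -sin(theta), cos(theta)), 2, 2)
B <- A %*% Rot + matrix(c(3, -1), nrow(A), 2, byrow = TRUE)  # testee
fit <- procrustes(B, A, translation = TRUE, dilation = FALSE)
max(abs(fit$X.new - A))    # ~ 0: s B T + 1 t' recovers the target A
```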
Multidimensional Scaling with R

All the algorithms used in this paper are coded in R. We use two packages for developing our MDS approaches:

• Package stats. From this one we use the function cmdscale to do the classical MDS. The output of this function is a list of two elements:
– The first r principal coordinates of the individuals, i.e., the low-dimensional configuration for the data.
– All the eigenvalues found. If the dimensions of the initial dataset are n × k, then there are n eigenvalues.
• Package MCMCpack. From this one we use the function procrustes to do the Procrustes transformation. The output of this function is a list of three elements:
– The dilation coefficient s.
– The orthogonal matrix T.
– The translation vector t.

In this section we present the algorithms we have developed so that MDS can be applied when dealing with large datasets. The first question one might ask is why we need them if there are already implementations available to obtain an MDS configuration. To answer this question, let us take a look at Figures 1 and 2.

Figure 1: Elapsed time to compute MDS.

Figure 1 shows the time needed to compute MDS as a function of the sample size. As we can see, when using the cmdscale function the time grows considerably as the sample size increases. Apart from the time issue, there is another one related to the memory needed to compute the distance matrix. Figure 2 points out that at least 400MB of RAM are required to obtain the distance matrix when the dataset is close to 10^4 observations.

Figure 2: Memory consumed to compute the distance matrix.

In order to solve these problems, we have worked on three algorithms (the first and the last being original proposals):

• Divide and Conquer MDS:
We base this algorithm on the idea of dividing and conquering. Given a large dataset, it is divided into p partitions. After that, MDS is performed over all the partitions and, finally, the p solutions are stitched together so that all the points lie in the same coordinate system.
• Fast MDS:
Tynia, Jinze, Leonard, and Wei (2006) give a proposal for solving the problem of scalability. The authors use a divide and conquer idea together with recursive programming.
• MDS based on Gower interpolation:
In this algorithm we propose to use Gower interpolation formula (see, for instance, the Appendix of Gower and Hand 1995), which allows adding a new set of points to an existing MDS configuration.

In the next sections we provide a description of the algorithms. If further details about the implementation are needed, the code is provided in Pachón-García (2019).

• The first step is to divide the original dataset into p partitions: $X_1, \ldots, X_p$. The number of partitions, p, is also the number of steps needed to compute the algorithm. Each partition has the same number of rows.
• Calculate the MDS for the first partition: $MDS_1$. This solution will be used as a guide to align the MDS configurations of the remaining partitions. We use a new variable, cum-mds, that grows as new partitions are processed: before adding a new MDS configuration, it is aligned and, after that, added.
• Define cum-mds equal to $MDS_1$ and start iterating until the last partition is reached.
• Given a step k, 1 < k ≤ p, partitions k and k−1 are joined, i.e., $X_k \cup X_{k-1}$. MDS is calculated on this union, obtaining $MDS_{k,k-1}$. In order to add the rows of the k-th partition to cum-mds, the following steps are performed (a schematic implementation is sketched after this list):
– Take the rows of partition k−1 from $MDS_{k,k-1}$: $MDS_{k,k-1}|_{k-1}$.
– Take the rows of partition k−1 from cum-mds: $\text{cum-mds}|_{k-1}$.
– Apply Procrustes to align both solutions. This means that a scalar s, a vector t and an orthogonal matrix T are obtained so that $\text{cum-mds}|_{k-1} \approx s\, MDS_{k,k-1}|_{k-1}\, T + \mathbf{1}t'$.
– Take the rows of partition k from $MDS_{k,k-1}$: $MDS_{k,k-1}|_k$.
– Use the previous Procrustes parameters to bring the rows of $MDS_{k,k-1}|_k$ into the cum-mds coordinate system: $\text{cum-mds}_k := s\, MDS_{k,k-1}|_k\, T + \mathbf{1}t'$.
– Add the previous dataset to cum-mds, i.e., cum-mds = cum-mds ∪ cum-mds_k.

As we have seen, the algorithm depends on p, the number of partitions. How many of them are needed? To answer this question, let l × l be the size of the largest matrix that allows running MDS efficiently, i.e., in a reasonable amount of time. If n is the number of rows of X, then p is n/l.

Note that if the number of rows of the original dataset X allows running classical MDS over X without partitioning it, then p = 1 and Divide and Conquer MDS is just classical MDS.
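A schematic R version of these steps is given below. It is a sketch of our reading of the algorithm, not the published implementation (which is provided in Pachón-García 2019); it reuses the classical_mds helper and MCMCpack::procrustes from the sketches above, and partitions the rows into consecutive blocks:

```r
# Sketch of Divide and Conquer MDS; partitions are consecutive blocks of rows.
divide_conquer_mds <- function(X, l = 500, r = 2) {
  n <- nrow(X)
  p <- ceiling(n / l)                                   # number of partitions
  if (p == 1) return(classical_mds(as.matrix(dist(X)), r)$points)
  idx <- split(seq_len(n), cut(seq_len(n), p, labels = FALSE))
  cum_mds <- classical_mds(as.matrix(dist(X[idx[[1]], , drop = FALSE])), r)$points
  for (k in 2:p) {
    prev <- idx[[k - 1]]; curr <- idx[[k]]
    mds_k <- classical_mds(as.matrix(dist(X[c(prev, curr), , drop = FALSE])), r)$points
    # align partition k-1 as seen in mds_k to its coordinates already in cum_mds
    fit <- MCMCpack::procrustes(mds_k[seq_along(prev), ],
                                tail(cum_mds, length(prev)),
                                translation = TRUE, dilation = FALSE)
    new_pts <- mds_k[-seq_along(prev), ] %*% fit$R +
               matrix(fit$tt, length(curr), r, byrow = TRUE)
    cum_mds <- rbind(cum_mds, new_pts)                  # accumulate partition k
  }
  cum_mds
}
```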
The aim of this section is to show some indicators of the performance of the algorithm. A deeper analysis is done in Section 3, where more details are provided. We generate a matrix X with 3 independent Normal columns (µ = 0 and σ = 1) and 10^4 rows. Afterwards, we run the algorithm setting l equal to 500. We require the algorithm to return 3 columns, so a new matrix with 3 columns and 10^4 rows (MDS_Div) is obtained. Both matrices should be "equal" up to a rotation, translation or reflection, but not a dilation: we do not allow dilations in order to see that the distances are preserved.

To align the matrices we perform a Procrustes transformation, fixing the dilation parameter (s) equal to 1. After that, we compare the three columns (we refer to the columns as dimensions). Figure 3 shows dimension i of X against dimension i of MDS_Div, i ∈ {1, 2, 3}. As we can see, the algorithm is able to capture the dimensions of the original matrix. We do not show cross-dimensions (i.e., dimension i of X against dimension j of MDS_Div, i ≠ j), but Table 1 contains the cross-correlation matrix. The results show that dimension i of MDS_Div captures dimension i of X and just that dimension. So, it seems that the algorithm, for this particular case, has a good performance in terms of quality.

Figure 3: Dimensions 1, 2 and 3 of X against dimensions 1, 2 and 3 of MDS_Div. In red, the line x = y.

Table 1: Cross-correlation matrix of X and MDS_Div (diagonal entries equal to 1; off-diagonal entries close to 0, e.g., −0.04 and 0.02).

During the process of gathering information about work previously done on MDS with large datasets, we found that Tynia, Jinze, Leonard, and Wei (2006) already proposed an algorithm, which they named
Fast Multidimensional Scaling.

• Divide X into $X_1, \ldots, X_p$, each of them with the same dimensions.
• Obtain the MDS configurations in the following way:
– If $X_i$ has dimensions that allow running classical MDS in a reasonable amount of time, compute MDS for each $X_i$: $MDS_1, \ldots, MDS_p$.
– Otherwise, for each $X_i$ call Fast MDS recursively (at the end of this section we explain in detail how the stop condition for the recursion is found).
• These individual MDS solutions are stitched together by sampling s points (rows) from each submatrix $X_i$ and putting them into an alignment matrix $M_{align}$ of size sp × sp. In principle, s should be at least 1 plus the estimated dimensionality of the dataset. In practice, it is oversampled by a factor of 2 or more.
• MDS is run on $M_{align}$, obtaining mMDS. Given a sampled point, there are two MDS solutions: one from $X_i$ and another one from $M_{align}$.
• The next step is to compute a Procrustes transformation to line these two sets of solutions up in a common coordinate system: $mMDS_i = s_i\, dMDS_i\, T_i + \mathbf{1}t_i'$, where:
– $dMDS_i$ is $MDS_i$ restricted to the subset of sampled points that belong to partition i.
– $mMDS_i$ is mMDS restricted to the subset of sampled points that belong to partition i.
• After doing the previous part, we obtain a set of p Procrustes parameters $(s_i, T_i, t_i)$. The next step is to apply this set of parameters to each $MDS_i$, i.e., $MDS_i^a := s_i\, MDS_i\, T_i + \mathbf{1}t_i'$.
• The last step is to join $MDS_1^a, \ldots, MDS_p^a$ all together, i.e., $MDS_X := MDS_1^a \cup \cdots \cup MDS_p^a$.

The process of splitting the dataset is applied recursively until the size of $X_i$ is suitable for running MDS on. The stop condition is found as follows: let l × l be the size of the largest matrix that allows MDS to be executed efficiently, i.e., in a reasonable amount of time. There are two issues that impact the performance of the algorithm: the size of each submatrix after subdivision and the number of submatrices, p, that are stitched together at each step. Ideally, the size of each submatrix after division should be as large as possible without exceeding l. By the same token, the size of $M_{align}$ should be bounded by l. The number of submatrices to be stitched together, p, should be the largest number such that sp ≤ l.

As in the previous algorithm, if the number of rows of the original dataset X allows running classical MDS over X without partitioning it, then p = 1 and Fast MDS is just classical MDS.
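The following compact sketch reflects our reading of Fast MDS (again reusing classical_mds and MCMCpack::procrustes from the earlier sketches; the default value of s here is merely illustrative):

```r
# Sketch of Fast MDS: recursive splitting plus alignment through sampled points.
fast_mds <- function(X, l = 500, s = 10, r = 2) {
  n <- nrow(X)
  if (n <= l) return(classical_mds(as.matrix(dist(X)), r)$points)
  p <- max(2, floor(l / s))                    # largest p with s * p <= l
  idx <- split(seq_len(n), cut(seq_len(n), p, labels = FALSE))
  mds_parts <- lapply(idx, function(i) fast_mds(X[i, , drop = FALSE], l, s, r))
  samp <- lapply(idx, function(i) sort(sample(length(i), s)))   # s rows each
  M_align <- do.call(rbind, Map(function(i, sm) X[i[sm], , drop = FALSE], idx, samp))
  m_mds <- classical_mds(as.matrix(dist(M_align)), r)$points    # MDS on M_align
  aligned <- vector("list", p)
  for (i in seq_len(p)) {
    m_i <- m_mds[((i - 1) * s + 1):(i * s), ]  # sampled points, M_align solution
    d_i <- mds_parts[[i]][samp[[i]], ]         # same points, partition solution
    fit <- MCMCpack::procrustes(d_i, m_i, translation = TRUE, dilation = TRUE)
    aligned[[i]] <- fit$s * mds_parts[[i]] %*% fit$R +
                    matrix(fit$tt, nrow(mds_parts[[i]]), r, byrow = TRUE)
  }
  do.call(rbind, aligned)                      # MDS_X = union of aligned parts
}
```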
As we have done in Section 2.2.2, we present some visual results of this algorithm. The data used are the same as in Section 2.2.2. We call MDS_Fast the result provided by the previous algorithm. Figure 4 shows that, for this particular case, the algorithm captures quite well the dimensions of the original data, providing a good performance. In addition, dimension i of MDS_Fast fits dimension i of X, and just this one, as we can see in Table 2.

Figure 4: Dimensions 1, 2 and 3 of X against dimensions 1, 2 and 3 of MDS_Fast. In red, the line x = y.

Table 2: Cross-correlation matrix of X and MDS_Fast (diagonal entries equal to 1; off-diagonal entries close to 0).

Gower interpolation formula (see the Appendix of Gower and Hand 1995) allows adding a new set of points to a given MDS configuration. Given a matrix $X_{n \times p}$, an MDS configuration for this matrix of dimensions n × c, and a matrix $X_{new}$ of dimensions m × p, one wants to add these new m rows to the existing MDS configuration. So, after adding these new rows, the MDS configuration will have n + m rows and c columns. We briefly summarise how to do so:

• Obtain $J = I_n - \frac{1}{n}\mathbf{1}\mathbf{1}'$, where $I_n$ is the n × n identity matrix.
• Given the distance matrix D of the rows of X, calculate $\Delta = (d_{ij}^2)$, the matrix of squared distances. Note that $\Delta$ is of size n × n.
• Calculate $G = -\frac{1}{2} J \Delta J'$.
• Let g be the diagonal of G, i.e., $g = \mathrm{diag}(G)$. We treat g as a vector.
• Let A be the distance matrix between the rows of X and the rows of $X_{new}$. A has dimensions m × n. Let $A^{(2)}$ be the matrix of the squared elements of A, i.e., $A^{(2)} = (a_{ij}^2)$.
• Let M be the MDS configuration for X and S the variance-covariance matrix of the c columns of M.
• The interpolated coordinates for the new m observations are given by

$$X_{new}^{MDS} = \frac{1}{2n}\left(\mathbf{1}g' - A^{(2)}\right) M S^{-1}. \quad (12)$$

The resulting MDS for the m observations of $X_{new}$ is in the same coordinate system as M, so here there is no need for any Procrustes transformation.

The algorithm is as follows:

• Divide X into p partitions $X_1, \ldots, X_p$ of the same dimensions.
• If p = 1, run classical MDS over X.
• Otherwise, use the procedure explained above, with $X := X_1$ and $X_{new} := X_k$, k ∈ {2, . . . , p}.
• Calculate J, $\Delta$, G, g, $A^{(2)}$, M and S according to the above formulas.
• Obtain the MDS for the first partition, $X_1$.
• Given a partition 1 < k ≤ p, do the following steps to get the related MDS:
– Calculate the distance matrix between the rows of $X_1$ and $X_k$ and square each of its elements. Let $A^{(2)}$ be this matrix (same as above).
– Use Gower interpolation formula (12) to obtain the MDS for partition k.
– Accumulate this solution.

As in the previous two algorithms, there is a key parameter to choose: p, the number of partitions. For this algorithm, p is set in the following way. Let l × l be the size of the largest distance matrix that a computer can calculate efficiently, i.e., in a reasonable amount of time. The value of p is set as $n_{big}/l$, where $n_{big}$ is the total number of individuals in the dataset.

We repeat the same experiment as in Section 2.2.2. Figure 5 and Table 3 show that the algorithm, for this particular case, captures quite well the dimensions of the original data, providing a good performance.

Figure 5: Dimensions 1, 2 and 3 of X against dimensions 1, 2 and 3 of MDS_Gower. In red, the line x = y.

Table 3: Cross-correlation matrix of X and MDS_Gower (diagonal entries equal to 1; off-diagonal entries close to 0, e.g., −0.04 and −0.03).
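A compact R sketch of the whole procedure, under our own notation and reusing the classical_mds helper from the earlier sketches (the published implementation is in Pachón-García 2019):

```r
# Sketch of MDS based on Gower interpolation; partitions are blocks of rows.
gower_mds <- function(X, l = 500, r = 2) {
  n_big <- nrow(X)
  p <- ceiling(n_big / l)
  if (p == 1) return(classical_mds(as.matrix(dist(X)), r)$points)
  idx <- split(seq_len(n_big), cut(seq_len(n_big), p, labels = FALSE))
  X1 <- X[idx[[1]], , drop = FALSE]
  n <- nrow(X1)
  M <- classical_mds(as.matrix(dist(X1)), r)$points   # MDS of the first partition
  S <- cov(M)                                         # covariance of M's columns
  J <- diag(n) - matrix(1, n, n) / n
  Delta <- as.matrix(dist(X1))^2                      # squared distances within X1
  g <- diag(-0.5 * J %*% Delta %*% t(J))              # g = diag(G)
  out <- vector("list", p); out[[1]] <- M
  for (k in 2:p) {
    Xk <- X[idx[[k]], , drop = FALSE]
    # squared cross-distances between the rows of Xk and those of X1
    A2 <- outer(rowSums(Xk^2), rowSums(X1^2), "+") - 2 * Xk %*% t(X1)
    # Gower interpolation formula (12)
    out[[k]] <- (1 / (2 * n)) * (matrix(g, nrow(Xk), n, byrow = TRUE) - A2) %*%
                M %*% solve(S)
  }
  do.call(rbind, out)
}
```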
The three algorithms have the same type of output. It consists of a list with two elements.

The first element is the MDS configuration calculated by the algorithm. It is a matrix of n rows and c columns, where n is the number of rows of the input data and c is the number of dimensions the user has required.

The second element is a list of eigenvalues, built as follows:

• All the algorithms divide the initial data into a set of p partitions.
• Given a partition i, a distance matrix of dimensions $m_i \times m_i$ is calculated: $D_i$.
• Over $D_i$ a singular value decomposition is performed, providing a list of length $m_i$ that contains all the eigenvalues of the previous decomposition: $list_i$.
• Let $norm\_eigenvalues_i$ be $list_i/m_i$, i.e., each eigenvalue is divided by the number of rows of $D_i$.
• The algorithms return $norm\_eigenvalues_1 \cup \cdots \cup norm\_eigenvalues_p$. We refer to this union as the normalized eigenvalues.

The three previous algorithms share the same goal: obtaining an MDS configuration for a given large dataset. However, there are some differences between the approaches that impact the performance of the algorithms. The main differences between them are:

• Divide and Conquer MDS uses a guide (the first subset, $X_1$) to align the solutions, and it uses the whole partition $X_i$ to find the Procrustes parameters. Fast MDS, in contrast, does not use a guide and uses a set of subsamples to find the Procrustes parameters.
• Fast MDS is based on recursive programming: it keeps dividing until a manageable size is found. Divide and Conquer MDS, in contrast, finds the number of partitions without applying recursive programming.
• MDS based on Gower interpolation does not need any Procrustes transformation.

The fact that we propose three algorithms to compute MDS poses some questions that need to be answered:

• Are these algorithms able to capture the data dimensionality as well as classical MDS does?
• Which is the fastest method?
• Can they deal with large datasets in a reasonable amount of time?
• How do they perform when dealing with large datasets?

All these questions are answered in Section 3.
Given the three algorithms, we would like to explore their performance. There are two issues to study:

• Performance in terms of quality of results: are they able to capture the right data dimensionality?
• Performance in terms of time: are they "fast" enough? Which one is the fastest?

To test the algorithms under different conditions, a simulation study has been carried out. The scenarios are obtained as combinations of:

• Sample sizes: we use different sample sizes, combining small datasets and large ones. A total of six sample sizes are used:
– Small sample sizes: 10^3, 3 · 10^3, 5 · 10^3 and 10^4.
– Large sample sizes: 10^5 and 10^6.
• Data dimensions: we generate matrices with two different numbers of columns: 10 and 100.
• Main dimensions: given $X_{n \times k}$, where n ∈ {10^3, 3 · 10^3, 5 · 10^3, 10^4, 10^5, 10^6} and k ∈ {10, 100}, it is postmultiplied by a diagonal matrix that contains k values, $\lambda_1, \ldots, \lambda_k$. The first values are much higher than the rest. The idea is to see whether the algorithms are able to capture the main dimensions of the original dataset, i.e., the columns with the highest variance. We set 5 combinations for this variable:
– All the columns with the same value of λ: $\lambda_1 = \cdots = \lambda_k = 1$.
– One main dimension with $\lambda_1 = 15$ and $\lambda_2 = \cdots = \lambda_k = 1$.
– Two main dimensions of the same value: $\lambda_1 = \lambda_2 = 15$ and $\lambda_3 = \cdots = \lambda_k = 1$.
– Two main dimensions of different values: $\lambda_1 = 15$, $\lambda_2 = 10$ and $\lambda_3 = \cdots = \lambda_k = 1$.
– Four main dimensions of the same value: $\lambda_1 = \lambda_2 = \lambda_3 = \lambda_4 = 15$ and $\lambda_5 = \cdots = \lambda_k = 1$.
• As a probabilistic model, we use a Normal distribution with µ = 0 and σ = 1. With this distribution, we generate a matrix of n observations and k independent columns. After generating the dataset X, it is postmultiplied by the diagonal matrix that contains the values of the λ's.

There is a total of 60 scenarios to simulate. Given a scenario, it is replicated 100 times. For every replicate, a dataset is generated (according to the scenario) and all the algorithms are run on it. So, a total of 6000 simulations are carried out. Tables 4 and 5 show the configuration of each scenario: Table 4 contains the configurations for small sample sizes and Table 5 those for large sample sizes. Each scenario is identified by its scenario id, and we refer to a scenario by it. Note that scenarios 1, 2, 11, 12, 21, 22, 31, 32, 41, 42, 51 and 52 are pure noise; we refer to them as noisy scenarios.

scenario id   sample size   n dimensions   value primary dimensions
 1            10^3           10            -
 2            10^3          100            -
 3            10^3           10            15
 4            10^3          100            15
 5            10^3           10            15, 15
 6            10^3          100            15, 15
 7            10^3           10            15, 10
 8            10^3          100            15, 10
 9            10^3           10            15, 15, 15, 15
10            10^3          100            15, 15, 15, 15
11            3 · 10^3       10            -
12            3 · 10^3      100            -
13            3 · 10^3       10            15
14            3 · 10^3      100            15
15            3 · 10^3       10            15, 15
16            3 · 10^3      100            15, 15
17            3 · 10^3       10            15, 10
18            3 · 10^3      100            15, 10
19            3 · 10^3       10            15, 15, 15, 15
20            3 · 10^3      100            15, 15, 15, 15
21            5 · 10^3       10            -
22            5 · 10^3      100            -
23            5 · 10^3       10            15
24            5 · 10^3      100            15
25            5 · 10^3       10            15, 15
26            5 · 10^3      100            15, 15
27            5 · 10^3       10            15, 10
28            5 · 10^3      100            15, 10
29            5 · 10^3       10            15, 15, 15, 15
30            5 · 10^3      100            15, 15, 15, 15
31            10^4           10            -
32            10^4          100            -
33            10^4           10            15
34            10^4          100            15
35            10^4           10            15, 15
36            10^4          100            15, 15
37            10^4           10            15, 10
38            10^4          100            15, 10
39            10^4           10            15, 15, 15, 15
40            10^4          100            15, 15, 15, 15

Table 4: Scenarios simulated for small sample sizes.

scenario id   sample size   n dimensions   value primary dimensions
41            10^5           10            -
42            10^5          100            -
43            10^5           10            15
44            10^5          100            15
45            10^5           10            15, 15
46            10^5          100            15, 15
47            10^5           10            15, 10
48            10^5          100            15, 10
49            10^5           10            15, 15, 15, 15
50            10^5          100            15, 15, 15, 15
51            10^6           10            -
52            10^6          100            -
53            10^6           10            15
54            10^6          100            15
55            10^6           10            15, 15
56            10^6          100            15, 15
57            10^6           10            15, 10
58            10^6          100            15, 10
59            10^6           10            15, 15, 15, 15
60            10^6          100            15, 15, 15, 15

Table 5: Scenarios simulated for large sample sizes.

Given a scenario, the steps done to calculate and store all the data needed are:

1. Generate the dataset X according to the scenario.
2. For each algorithm:
(a) Run the algorithm and get the MDS configuration it produces (MDS_alg).
(b) Get the elapsed time to compute the MDS configuration and store it.
(c) Get the normalized eigenvalues and store them.
(d) Align MDS_alg and X using Procrustes.
(e) Get the correlation coefficients between the main dimensions of MDS_alg and X and store them.

There are some important details that affect the results of the simulations:

• When running the algorithms, we ask for as many columns as the original data has, i.e., k. Therefore, the low-dimensional space has the same dimension as the original dataset.
• For the normalized eigenvalues, we just store 6 eigenvalues instead of the full list (otherwise we would store n eigenvalues, which is memory consuming, and we do not need all of them, just the first ones).
• For Procrustes we do not allow dilations; otherwise the distances could not be preserved. In addition, we do not use all the columns to do the alignment: we select the main dimensions. If there is no main dimension, i.e., in the noisy scenarios, we just select 4 columns.
• To avoid memory problems with the alignment when n is greater than or equal to 10^5, Procrustes is done in the following way:
1. Create p partitions of X and of the result of a given MDS algorithm (MDS_alg). Both sets of partitions contain exactly the same observations.
2. For each partition, get the Procrustes parameters without dilations.
3. Accumulate the parameters iteration after iteration, so that at the end we obtain $R = \sum_{i=1}^{p} R_i/p$ and $t = \sum_{i=1}^{p} t_i/p$.
4. Apply these parameters to MDS_alg so that X and MDS_alg are in the same coordinate system and can be compared, i.e., $X_{Procrustes} = XR + \mathbf{1}t'$.

Note that the original dataset X is always available and it is already the MDS configuration, since we simulate independent columns with mean value equal to 0. Therefore, even when n is so large that MDS cannot be calculated using classical methods, we always have the solution that we would obtain if running classical MDS were possible. Therefore, we can always compare the MDS provided by the algorithms (MDS_alg) with the original dataset (X).

In order to test the quality of the results of the algorithms, as well as the time needed to compute the MDS configuration, the following metrics are calculated:

• Quality of the results: two metrics are calculated:
– Correlation between the main dimensions of the data and the main dimensions after applying the algorithms. We take the diagonal of the correlation matrix, i.e., the correlation between dimension i of the data and dimension i of the algorithm output.
– Normalized eigenvalues, as an approximation of the standard deviation of the variables of X.
• Elapsed time to get the MDS configuration: given an algorithm, we compute and store the elapsed time to get the corresponding MDS configuration.

We do it in this way because we want to check some hypotheses. We expect the three algorithms to behave "correctly". By "correctly" we mean that the behavior should be similar to the one obtained if classical MDS were run. Therefore, we expect the correlation between the main dimensions of the data and the main dimensions of the MDS of each algorithm to be close to 1. In addition, the variance of the original data should be captured: given the highest normalized eigenvalues, we expect their square roots to be approximately 15 or 10 in the scenarios that are not noisy scenarios. For the time of the algorithms, we have done some analysis and it seems that
MDS based on Gower interpolation is the fastest one, so proper tests will be done.

The algorithms have as input values a set of variables. The input matrix has already been explained, but there is another parameter that has been used in the description of the algorithms (see Section 2): l. The meaning of l is slightly different in each algorithm, but for simplicity we set this value equal to 500. Fast MDS has an extra parameter: the amplification parameter. Tynia, Jinze, Leonard, and Wei (2006) use a value of 3 to test the algorithm, and we use the same value. So, 30 points (when the original matrix has 10 columns) or 300 points (when the original matrix has 100 columns) are taken from every partition to build $M_{align}$.

Since a total of 6000 simulations are performed and some of them include large datasets, we use Amazon Web Services (AWS) to carry out the simulations. 10 servers of the same type are used: c5n.4xlarge, with 16 cores and 42 GB of RAM each.
In this section we provide the simulation results for the correlation coefficients. Given a scenario and its dataset X of size n × k, the correlation matrix between the main dimensions of X and the main dimensions of MDS_alg is computed. We are interested in the diagonal of the correlation matrix, expecting values close to 1. The length of this diagonal depends on the scenario. It can be:

• 1, when $\lambda_1 = 15$ and $\lambda_2 = \cdots = \lambda_k = 1$.
• 2, when $\lambda_1 = \lambda_2 = 15$ and $\lambda_3 = \cdots = \lambda_k = 1$, or $\lambda_1 = 15$, $\lambda_2 = 10$ and $\lambda_3 = \cdots = \lambda_k = 1$.
• 4, when $\lambda_1 = \lambda_2 = \lambda_3 = \lambda_4 = 15$ and $\lambda_5 = \cdots = \lambda_k = 1$.
• 0, when $\lambda_1 = \cdots = \lambda_k = 1$, i.e., in the noisy scenarios.

When the scenario is not a noisy one, the correlation obtained is greater than 0.9999, which indicates that the algorithms are able to capture the main dimensions of the data. To provide a visual result, Figure 6 shows a boxplot of the correlation coefficients between the four main dimensions of the original dataset X and the four main dimensions of the MDS obtained with Divide and Conquer MDS, MDS_Div, for scenario id = 60 (n = 10^6, 100 columns and 4 main dimensions with $\lambda_i = 15$, i ∈ {1, 2, 3, 4}). As we can see, all the values are close to 1. What Figure 6 shows happens for all the non-noisy scenarios and all the algorithms. Note that, before calculating the correlation matrix, a Procrustes transformation is performed (dilations are not allowed) so that both coordinate systems are the same.

For the noisy scenarios, Figure 7 shows the boxplots for scenario id = 51 (n = 10^6, 10 columns, no main dimensions) and scenario id = 52 (n = 10^6, 100 columns, no main dimensions), generated using Divide and Conquer MDS. Since these scenarios are pure noise, the correlation is low. We do not observe any negative values because the Procrustes alignment is done before calculating the correlation coefficients; therefore, X and MDS_Div have the same orientation. Again, what Figure 7 shows also happens for the remaining noisy scenarios and for the other two algorithms.

Figure 6: Boxplot of the correlation coefficients between the main dimensions of X and the main dimensions of MDS_Div for scenario id = 60 (n = 10^6, 100 columns and 4 main dimensions with $\lambda_i = 15$, i ∈ {1, 2, 3, 4}). The correlation is close to 1, indicating that the algorithm captures the main dimensions of the original dataset X.

Figure 7: Boxplots of the correlation coefficients between X and MDS_Div for two different scenarios: scenario id = 51 (n = 10^6, 10 columns, no main dimensions) and scenario id = 52 (n = 10^6, 100 columns, no main dimensions). Since all the dimensions are noise, the correlation is low. The scenario id is on top of each boxplot.

Eigenvalues

In this section we analyse how the eigenvalues approximate the standard deviation of the original variables. Since the original dataset X is postmultiplied by a k × k diagonal matrix that contains $\lambda_1, \ldots, \lambda_k$, we have $\mathrm{var}(X_i) = \lambda_i^2$ and $\mathrm{sd}(X_i) = \lambda_i$, where $X_i$ is column i of X. MDS should be able to capture the variance of the main dimensions through the eigenvalues. Let $\phi_1, \ldots, \phi_t$ be the normalized eigenvalues of the MDS configuration, ordered so that $\phi_1 > \phi_2 > \cdots > \phi_t$. The highest normalized eigenvalues have to verify $\sqrt{\phi_j} \approx \lambda_j$.

To check how the algorithms approximate the variance of the original data, we compute the bias and the Mean Square Error (MSE) for each scenario, excluding the noisy ones. Bias and MSE are estimated as follows:

$$\widehat{\mathrm{bias}} = \frac{1}{m}\sum_{i=1}^{m}\left(\sqrt{\phi_{ij}} - \lambda_j\right) = \overline{\sqrt{\phi_j}} - \lambda_j, \qquad \widehat{\mathrm{MSE}} = \frac{1}{m}\sum_{i=1}^{m}\left(\lambda_j - \sqrt{\phi_{ij}}\right)^2.$$

Since we perform 100 simulations, m = 100. Depending on the scenario, there can be 1, 2 or 4 estimators. Table 6 shows the estimated bias and MSE for
Divide and Conquer MDS and the scenarios with one main dimension $\lambda_1 = 15$. As we can see, the estimated bias and MSE are "low" for these scenarios and this algorithm. The remaining cases are in Pachón-García (2019). As the number of main dimensions increases, the estimated bias and MSE increase as well. However, they seem to stay in an acceptable range.

Table 7 shows the estimated bias and MSE for Fast MDS and the scenarios with one main dimension $\lambda_1 = 15$. For this algorithm, the MSE is higher than for Divide and Conquer; in some cases (for instance, scenario id = 33, with n = 10^4, 10 columns and one main dimension $\lambda_1 = 15$) the MSE is high compared with the previous algorithm. One possible reason is that the number of points used for the Procrustes alignment is not large enough. For this case, scenario id = 33, there are 30 sampled points per partition and 17 partitions in total, so the alignment is done with 510 points out of 10^4 (i.e., about 5% of the points). We have repeated the simulations for scenario id = 33 with 60 points per partition and the MSE is lower (0.10). The remaining cases for Fast MDS are in Pachón-García (2019). Again, some of the scenarios have a high MSE compared with Divide and Conquer. Although Fast MDS has higher MSE values than the previous algorithm, we consider them acceptable.

Table 8 shows the estimated bias and MSE for MDS based on Gower interpolation and the scenarios with one main dimension $\lambda_1 = 15$. The remaining ones are in Pachón-García (2019). The comments given for Divide and Conquer MDS also apply to these cases.

Table 6: Estimated bias and MSE for the scenarios with one main dimension $\lambda_1 = 15$, for Divide and Conquer MDS.

Table 7: Estimated bias and MSE for the scenarios with one main dimension $\lambda_1 = 15$, for Fast MDS.

Table 8: Estimated bias and MSE for the scenarios with one main dimension $\lambda_1 = 15$, for MDS based on Gower interpolation.

The algorithm with the lowest error is Divide and Conquer MDS. Even though the other two have higher errors, especially Fast MDS, we consider them good enough. Note that we do not consider the noisy scenarios, since all their directions have the same variance.
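For concreteness, given the 100 replicates of a scenario, the two estimates above reduce to a couple of lines of R; here phi is a hypothetical vector holding the relevant normalized eigenvalue of each replicate:

```r
# Sketch: bias and MSE estimates over m = 100 replicates; `phi` is hypothetical.
lambda_j <- 15
bias_hat <- mean(sqrt(phi)) - lambda_j          # mean(sqrt(phi_ij)) - lambda_j
mse_hat  <- mean((lambda_j - sqrt(phi))^2)      # mean squared error
```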
In this section we investigate whether one of the algorithms is faster than the others. Given the results in Table 9, it seems that MDS based on Gower interpolation has the lowest elapsed time. Table 9 also provides a ranking of the methods: MDS based on Gower interpolation seems to be the fastest one, followed by Fast MDS and, finally, Divide and Conquer.

We perform an ANOVA test using three factors: the sample size (which has 6 levels), the number of dimensions (which has 2 levels) and the algorithm (which has 3 levels). Instead of using the elapsed time itself, we use its logarithm. Given the results of the ANOVA test, shown in Table 10, we can reject the null hypothesis that the three algorithms have the same expected elapsed time.

We fit a linear regression with these variables and look at the magnitude of the coefficients; in addition, we plot the distribution of log(elapsed time) for all the algorithms. Table 11 contains the values of the coefficients. As either the sample size or the data dimensions increase, the coefficients increase as well (and so does the time needed). Looking at the values for the algorithm variable, it seems that MDS based on Gower interpolation is the fastest, while Divide and Conquer is the slowest one.

Figures 8 and 9 show the estimated density of the elapsed time for each algorithm and each scenario id for n = 10^6. As we can see, MDS based on Gower interpolation is the fastest algorithm, especially when 100 dimensions are required. The figures for the remaining sample sizes are in Pachón-García (2019).

One interesting thing that we can observe (see also the additional figures and tables in Pachón-García 2019) is that the elapsed time grows with the sample size, especially for Divide and Conquer MDS. However, MDS based on Gower interpolation and Fast MDS provide really good computation times even for large sample sizes, MDS based on Gower interpolation being the fastest one. So, we can consider both algorithms efficient, since they are able to compute MDS in a reasonable amount of time.

sample size   dimensions   Divide and Conquer   Fast      Gower
10^3          10           0.27                 0.14      0.10
10^3          100          0.78                 0.69      0.28
3 · 10^3      10           0.78                 0.32      0.16
3 · 10^3      100          2.50                 3.14      0.52
5 · 10^3      10           1.37                 0.54      0.20
5 · 10^3      100          4.25                 5.69      0.84
10^4          10           2.60                 1.81      0.31
10^4          100          8.85                 11.79     1.37
10^5          10           28.10                11.46     2.44
10^5          100          106.30               116.46    18.02
10^6          10           420.29               106.59    53.15
10^6          100          2365.46              1070.19   813.15

Table 9: Mean elapsed time (in seconds) to compute each algorithm.

              Df   Sum Sq     Mean Sq    F value    Pr(>F)
algorithm     2    9283.73    4641.86    32143.99   < 2e-16
n dimensions  1    12868.36   12868.36   89110.86   < 2e-16

Table 10: ANOVA test for differences in log(elapsed time) using algorithm, sample size and number of dimensions as factors.

                   Estimate   Std. Error   t value   Pr(>|t|)
(Intercept)        -1.4058    0.0085       -165.44   < 2e-16
n dimensions 100   1.6910     0.0057       298.51    < 2e-16

Table 11: Coefficients of the linear regression for log(elapsed time).

Figure 8: Estimated density of elapsed time (in sec.) for each algorithm and the first five scenario ids of n = 10^6. The scenario id is on top of each plot.

Figure 9: Estimated density of elapsed time (in sec.) for each algorithm and the last five scenario ids of n = 10^6. The scenario id is on top of each plot.
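The analysis behind Tables 10 and 11 can be reproduced along the following lines (a sketch: res is a hypothetical data frame with one row per simulation):

```r
# Sketch of the ANOVA and regression on log(elapsed time).
# `res` (hypothetical): columns elapsed, algorithm (3 levels), n (6 sample
# sizes) and dims (10 or 100), one row per simulation.
res$algorithm <- factor(res$algorithm)
res$n         <- factor(res$n)
res$dims      <- factor(res$dims)
summary(aov(log(elapsed) ~ algorithm + n + dims, data = res))   # Table 10
summary(lm(log(elapsed) ~ algorithm + n + dims, data = res))    # Table 11
```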
The goal of this paper was to find an algorithm able to compute an MDS configuration when dealing with large datasets in an "efficient" way, i.e., fast enough to get the configuration. Even though our first method, Divide and Conquer MDS, is the slowest one, it is able to obtain a low-dimensional configuration, and it has a really good property that the other two algorithms do not have: it captures the variance of the original data quite well.

Fast MDS provides an improvement in timing, being faster than Divide and Conquer MDS. However, it is based on recursive programming, and this kind of algorithm can consume a lot of memory. In addition, a careful selection of the number of points used for the Procrustes alignment has to be made, since the MSE can be high, as happened in our simulation study.

A really good algorithm is MDS based on Gower interpolation, since it provides an MDS configuration in a short amount of time with low errors, and its implementation is easy. Apart from this, it does not need any Procrustes transformation, which saves time and memory. Therefore, this is the algorithm of our choice to obtain MDS with large datasets. Observe that this algorithm does essentially what classical Statistics advises: when your population is too large, take a sample of it.
Two kinds of problems have appeared during the development of this paper:

• Computational problems: when working with large datasets, we suffered from consuming all the available RAM of the servers, especially when doing Procrustes to align the original data and the MDS configuration for large n. The solution was to partition the process into pieces that the servers could manage without consuming all the available RAM.
• Procrustes packages: even though R has a lot of packages that allow computing Procrustes, they did not fulfil our goals, either because the output is not well specified or because some of the transformations (mainly dilations or translations) were not included. We recommend using the MCMCpack package.

Several points would deserve further research:

• The algorithms that we have developed are implemented in R, which is a good language for building prototypes. However, it is not the best programming language for production use, so we recommend implementing them in a robust programming language such as C, C++ or Java.
• Tuning of parameters: as we have seen with Fast MDS, the MSE depends on the number of points used to perform the Procrustes alignment. So, a (simulation) study on this parameter should be done in order to obtain the "optimal" value for which the MSE is as good as the one for Divide and Conquer.
• On the other hand, it might be that Divide and Conquer uses more points than necessary to perform the Procrustes alignment. Reducing the number of points would preserve the bias and MSE while improving the time needed to get the MDS configuration.
• Given that the world is talking about Big Data, it is a good opportunity to challenge the algorithms and use them with Spark/Hadoop.
• In the same line, an implementation based on map-reduce would speed up the algorithms, reducing the time needed to get the low-dimensional configuration.
Acknowledgments
We would like to thank Roger Devesa, who helped us by reviewing the recursive programming and implementing Fast MDS.

References
Borg, I. and P. Groenen (2005). Modern Multidimensional Scaling: Theory and Applications. Springer.

Gower, J. C. and D. J. Hand (1995). Biplots, Volume 54. CRC Press.

Pachón-García, C. (2019). Multidimensional scaling for Big Data. Ph.D. thesis, UPC, Facultat de Matemàtiques i Estadística, Departament d'Estadística i Investigació Operativa.

Peña, D. (2002). Análisis de Datos Multivariantes. McGraw-Hill.

Tynia, Jinze, Leonard, and Wei (2006). A fast approximation to multidimensional scaling. In Workshop on Computation Intensive Methods for Computer Vision.