AN OPTIMIZATION PARAMETER FOR SERIATION OF NOISY DATA∗

MAHYA GHANDEHARI† AND JEANNETTE JANSSEN‡

Abstract.
A square symmetric matrix is a Robinson similarity matrix if the entries in each of its rows and columns are non-decreasing when moving towards the diagonal. A Robinson similarity matrix can be viewed as the affinity matrix of objects arranged in a linear order, where objects closer together have higher affinity. We define a new parameter, Γ₁, which measures how badly a given matrix fails to be Robinson similarity. Namely, a matrix is Robinson similarity precisely when its Γ₁ attains zero, and a matrix with small Γ₁ is close (in the normalized ℓ₁-norm) to a Robinson similarity matrix. Moreover, both Γ₁ and the Robinson similarity approximation can be computed in polynomial time. Thus, our parameter recognizes Robinson similarity matrices which are perturbed by noise, and can therefore be a useful tool in the problem of seriation of noisy data.

Key words.
Robinson similarity matrices, Robinsonian matrices, unit interval graphs, seriation, linear embeddings of graphs
AMS subject classifications.
1. Introduction.
Many real-life networks, such as online social networks, biological networks and neural networks, are manifestations of an underlying (hidden) spatial reality. For example, members of a social network can be identified with nodes placed in a metric space, whose coordinates represent the interests, backgrounds, and other significant features of the users. The formation of the network is then modeled as a stochastic process, where the probability of a link occurring between two nodes decreases as their metric distance increases. A fundamental and challenging problem in the analysis of a social network (or any other spatial network) is to uncover its "hidden spatial layout", i.e. to identify the metric space representation of the network. Analysis and visualization of data becomes considerably more tractable when the dataset is presented according to its spatial reality.

The classical seriation problem, introduced by Robinson in [22], can be viewed as the special case of the spatial layout problem, restricted to one dimension. The objective of the seriation problem is to order a set of items so that similar items are placed closer to each other. The seriation question translates in a natural way into a question regarding symmetric matrices. A symmetric matrix is a
Robinson similarity matrix, or
Robinson matrix for short, if its entries are non-decreasing when moving towards the main diagonal in each row or column. A symmetric matrix A is said to be Robinsonian if it becomes a Robinson matrix after simultaneous application of a permutation π to its rows and columns. In that case, the permutation π is called a Robinson ordering of A. If the entries of the symmetric matrix A = [A_{i,j}] represent the similarity of items i and j, then a Robinson ordering represents a linear arrangement of the items so that similar items are placed closer together.

The problem of recognizing Robinsonian matrices, and finding their Robinson orderings, can be solved in polynomial time. See [19] for the first polynomial time algorithm for this problem, and [24, 20, 15, 14] for more recent efficient algorithms. Most of these algorithms are based on a similar principle; namely the connection

∗ Submitted to the editors DATE.
Funding:
The authors gratefully acknowledge support from UDRF and NSERC. † Department of Mathematical Sciences, University of Delaware, Newark, DE ([email protected]). ‡ Department of Mathematics and Statistics, Dalhousie University, Halifax, Canada ([email protected]).
between Robinsonian similarity matrices and unit interval graphs ([15, 14]) or interval (hyper)graphs ([19, 24, 20]). A spectral algorithm based on reordering the matrix according to the components of the second eigenvector of the Laplacian, or the Fiedler vector, was given in [1], and was then applied to the ranking problem in [11].

The seriation problem has diverse and significant applications, from its origin in archeological studies to recent applications to ecology and sociology. For a historical overview of the problem and its diverse applications, see [16]. In most of these applications, it is natural to expect the data to be noisy. In that case, the optimal reordering of a data-derived matrix may not itself be a Robinson matrix, but it will be close to one. The question then becomes to which extent a given matrix resembles a Robinson matrix. All of the algorithms mentioned in the previous paragraph apply only to noise-free Robinsonian similarity matrices.

In the presence of noise, the goal of the seriation problem is to find an "almost Robinson" ordering of a given matrix, i.e. an ordering for which the reordered matrix is closest to being Robinson. This question turns out to be much more challenging than the error-free analogue. In fact, it is shown in [4] that the problem of finding a reordering and a Robinson matrix which is the best ℓ∞-approximation is NP-hard. In [5] a factor 16 approximation algorithm is given for the case of ℓ∞. NP-hardness for a number of related problems is established in [2], where approximation by ℓ_p distance is considered. Specifically, it is shown that for an integer p, the problem of finding proper strong Robinson relations within specified ℓ_p distance of a given matrix is NP-complete. A proper strong Robinson relation corresponds to an appropriate relabelling of the original matrix together with a Robinson matrix with certain additional, stronger properties. In [10] (together with the references therein) a statistical approach to the problem of seriation with noise is developed, where the error of Robinson approximation is measured by the Frobenius norm.

If the appropriate labelling is given, then the problem becomes more tractable. Now the problem is that of finding a Robinson matrix closest to a given matrix. For the ℓ∞-norm, it is known that this problem can be solved in polynomial time, since an explicit closed form for the optimal solution can be easily given (cf. [23]). The problem of finding the best ℓ₁-approximation can be formulated as a linear program: minimize the linear function ‖A − R‖₁ subject to the constraint that R is Robinson similarity. The constraint can be expressed with O(n²) inequalities, and thus the problem can be solved in polynomial time.

In this article, we develop new methods and algorithms which can be used for seriation of noisy data. Our focus here is on formalizing the notion of a matrix being "almost Robinson." To do so, we introduce a parameter, which we call Γ₁, that measures how much the local structure of a matrix resembles a Robinson similarity. Namely, Γ₁ sums the magnitude of local violations of the Robinson similarity property, and achieves the value 0 precisely when the matrix is Robinson. This parameter is a natural tool for the seriation problem, and has been used in practice as a heuristic to measure the amount of deviation from a Robinson form (see [3, 13]). Moreover, Γ₁ is simple to formulate, and can be computed in O(n³) time directly from its definition.
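To make the parameter concrete, the following sketch evaluates the sum of local violations that defines Γ₁ (the formal definition appears in Section 2) directly from a matrix. It is illustrative only: the function name gamma1 is ours, Python with NumPy is assumed, and indices are 0-based rather than the 1-based convention of the paper.

    import numpy as np

    def gamma1(A):
        # Direct O(n^3) evaluation of
        #   Gamma_1(A) = n^{-3} * sum over i < k < j of
        #                [A_ij - A_ik]_+ + [A_ij - A_kj]_+ ,
        # where [x]_+ = max(x, 0); A is assumed symmetric with entries in [0, 1].
        A = np.asarray(A, dtype=float)
        n = A.shape[0]
        total = 0.0
        for i in range(n):
            for j in range(i + 2, n):        # pairs (i, j) with some k between them
                k = np.arange(i + 1, j)      # all middle indices i < k < j
                total += np.maximum(A[i, j] - A[i, k], 0).sum()
                total += np.maximum(A[i, j] - A[k, j], 0).sum()
        return total / n**3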
The main goal of this paper is to show that if the number and magnitude of local violations is small, then the matrix is indeed close, in the sense of the ℓ₁-norm, to a Robinson matrix. Precisely, we prove in Theorem 2.2 that for every given matrix A, there exists a Robinson matrix R so that ‖A − R‖₁ ≤ 26 Γ₁(A)^{1/3}. In addition, we give a polynomial time algorithm to compute a Robinson approximation R which fulfills the above inequality.

A proposed application of this work is a novel scheme for treating the seriation problem of noisy data, in which the selection of the best permutation is guided by Γ₁. The traditional formulation is: given a matrix A, find a permutation π and a Robinson matrix R so that ‖A_π − R‖_p is minimized. (Here A_π refers to the matrix obtained by permuting rows and columns of A according to π.) This approach requires the simultaneous optimization of both the matrix R and the permutation π. Using the results in this paper, we can instead reduce the noisy seriation problem to the following problem: given a symmetric matrix A, find a permutation π of its rows and columns so that Γ₁(A_π) is minimized. Once such a permutation is found, our algorithm can be used to compute the appropriate Robinson approximation. While this approach is not an approximation algorithm per se, we do bound the performance of this approach in terms of the optimal outcome. Namely, as will be shown in Lemma 2.3, the best possible Robinson approximation has normalized ℓ₁ distance at least (1/4)Γ₁(A) from A. Therefore, our results implicitly bound the Robinson approximation achieved by our algorithm in terms of the optimal solution.
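The reduction just described can be stated very compactly in code. The following brute-force sketch (function name ours; it relies on the gamma1 sketch above) searches all permutations, and is of course only feasible for toy sizes; it illustrates the optimization problem rather than solving it efficiently.

    from itertools import permutations
    import numpy as np

    def best_ordering_bruteforce(A):
        # Exhaustive search for a permutation pi minimizing Gamma_1(A_pi).
        n = A.shape[0]
        best_pi, best_val = None, float("inf")
        for pi in permutations(range(n)):
            val = gamma1(A[np.ix_(pi, pi)])   # A_pi: rows and columns permuted by pi
            if val < best_val:
                best_pi, best_val = pi, val
        return best_pi, best_val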
The results of this article are fundamentally different from any previous results on ℓ∞-fitting of Robinsonian structures (see for example [4]). Indeed, when matrices grow large in size, the ℓ₁-norm provides us with a more suitable notion of "closeness". This fact becomes apparent when we analyze a growing sequence of graphs which is convergent in the sense of Lovász-Szegedy [18]. Indeed, this article (and the choice of notation for the parameter Γ₁) was motivated by our previous work [6], where we introduce a parameter Γ which characterizes Robinson graphons. Graphons are symmetric functions on [0,1]² with values in [0,1]. A matrix A = [A_{i,j}] can be interpreted as a graphon in a natural way, by splitting the unit square into subsquares of size 1/n × 1/n, and setting the graphon equal to A_{i,j} everywhere on the (i,j)-th subsquare. A Robinson graphon is a graphon which is non-decreasing along every horizontal or vertical line towards the main diagonal. Our main result in [6] is that Γ becomes a continuous parameter when the space of graphons is equipped with the box-norm. Therefore Γ provides us with a parameter to measure Robinson resemblance, which can be efficiently approximated. The parameters Γ and Γ₁ are closely related, even though box-norm continuity does not hold for Γ₁ anymore. In future work [12], we employ these parameters simultaneously in order to develop (continuous) methods for seriation of noisy data.

Finally, we mention applications to graphs and networks. Binary Robinson matrices correspond to unit interval graphs. More precisely, a graph is a unit interval graph if and only if its adjacency matrix is Robinsonian, that is, there exists a labelling of the vertices so that the resulting adjacency matrix is a Robinson matrix. The parameter Γ₁ can be directly applied to a (labelled) graph, and if Γ₁ is sufficiently small, our algorithm constructs a unit interval graph that is close, in edit distance, to the original graph.

When applied to real-life networks, the parameter Γ₁ can be used to measure how closely the matrix conforms to a linear model. In previous work, the authors investigated this question in the context of graph limits [6, 7]. In a linear graph model the vertices of the graph are placed on a line, and the links are formed stochastically so that vertices that are closer together are more likely to connect. The linear layout can refer to a time line, in the case of graphs derived from archeology, or the food chain, in the case of food webs. But a linear layout may also point to the presence of a strong hidden variable that influences link formation, such as hierarchy in a professional social network, or age in a friendship graph. If a graph conforms to a linear model, then we expect the adjacency matrix of the graph to be "almost Robinson." The parameter Γ₁ may therefore serve as a measure of the "linearity" of a given network.

The rest of this article is organized as follows. In Section 2, we introduce the necessary notations and definitions, and we state our main result, namely Theorem 2.2. In Section 3, we present Algorithm 3.1, which finds a Robinson approximation for the special case where A is a binary matrix (i.e. a graph adjacency matrix). The value of every cell in the Robinson approximation is decided based on the entries of A in the upper right and the lower left regions defined by that cell (see Figure 2). Algorithm 3.1 is very simple to state; however, one needs to use careful approximations and counting tricks to prove that the algorithm generates an output which indeed is a good ℓ₁-approximation of the input matrix. Section 4 provides an adaptation of Algorithm 3.1 to general matrices. This is indeed a natural generalization, as every matrix with entries in
[0, 1] decomposes into a convex combination of binary matrices. Fortunately, the parameter Γ₁ distributes over such decompositions, even though it is not a linear parameter in general. This allows us to apply Algorithm 3.1 to the components of the decomposition in stages. In Section 5, we develop an algorithm that represents a preprocessing step for Algorithm 3.1. This preprocessing step is designed to transform the input matrix A so that Algorithm 3.1 generates a better-approximating output matrix R. The trade-off here is that the preprocessing step increases the complexity of the algorithm, which nevertheless remains polynomial. We finish the paper with some concluding remarks and future directions.
2. Definitions and main results.
For a given positive integer n, let A_n denote the set of all symmetric n × n matrices with entries in [0, 1]. A matrix A ∈ A_n is called binary if it has entries only from {0, 1}. We refer to the position in the i-th row and j-th column of A as the (i, j)'th cell. For a matrix A of size n, we define its (normalized) ℓ₁-norm to be

‖A‖₁ = (1/n²) Σ_{i,j=1}^n |A_{i,j}|.

Definition 2.1. An n × n symmetric matrix A is a Robinson matrix if, for all 1 ≤ i < j < k ≤ n,

(2.1) A_{i,j} ≥ A_{i,k} and A_{j,k} ≥ A_{i,k}.

In this section, we define a parameter, denoted by Γ₁, which measures how badly a matrix fails to be Robinson. The choice of notation for Γ₁ is due to the fact that it simply adds the magnitude of violations of (2.1). Precisely, given a symmetric matrix A of size n,

Γ₁(A) = (1/n³) Σ_{1 ≤ i < k < j ≤ n} ([A_{i,j} − A_{i,k}]₊ + [A_{i,j} − A_{k,j}]₊),

where [x]₊ = x if x > 0, and 0 otherwise. It is clear that Γ₁(A) = 0 if and only if A is a Robinson matrix. Note also that, for a binary matrix A which is not Robinson, Γ₁(A) ≥ n^{−3}. Moreover, the computation of Γ₁(A) for an n × n matrix A involves a simple summation which can be executed in O(n³) steps. Finally note that, due to the normalization factor n³, Γ₁(A) ∈ [0, 1) whenever A ∈ A_n.

We are now ready to state our main result (Theorem 2.2), whose proof takes a large part of the paper and finishes only at the end of Section 5.

Theorem 2.2. For every A ∈ A_n, there exists a Robinson matrix R ∈ A_n so that

‖A − R‖₁ ≤ 26 Γ₁(A)^{1/3}.

Moreover, R can be computed in polynomial time. In addition, if A is binary, then there exists a binary matrix R satisfying the conditions of the theorem.

The distance between A and the Robinson approximation obtained in Theorem 2.2 is bounded in terms of Γ₁(A), and thus the algorithm does not necessarily give the best possible approximation. However, as stated in the following simple lemma, there is a close relationship between Γ₁(A) and the distance between A and the best possible Robinson approximation.

Lemma 2.3. For every pair of matrices A and R in A_n, if R is a Robinson matrix then

‖A − R‖₁ ≥ (1/4) Γ₁(A).

Consequently, we have min{‖A − R‖₁ : R ∈ A_n is Robinson similarity} ≥ (1/4) Γ₁(A).

Proof. Since the function [·]₊ is sub-additive, we observe that

Γ₁(A) = Γ₁(A − R + R) ≤ Γ₁(A − R) + Γ₁(R) = Γ₁(A − R),

as R is a Robinson matrix and thus Γ₁(R) = 0. Moreover, for every symmetric matrix B of size n, we have

Γ₁(B) = (1/n³) Σ_{1 ≤ i < k < j ≤ n} ([B_{i,j} − B_{i,k}]₊ + [B_{i,j} − B_{k,j}]₊) ≤ (1/n³) Σ_{1 ≤ i < k < j ≤ n} (2|B_{i,j}| + |B_{i,k}| + |B_{k,j}|) ≤ 4 ‖B‖₁,

since each cell of B appears in at most n of the triples. Applying this bound to B = A − R gives Γ₁(A) ≤ Γ₁(A − R) ≤ 4 ‖A − R‖₁, as desired.

[Fig. 1: The black region is convex around the diagonal.]

Recall from the introduction that binary Robinson matrices correspond to unit interval graphs. For a labeled graph G, let B_G denote its augmented adjacency matrix, that is, its adjacency matrix with all diagonal entries set to 1, and write Γ₁(G) for Γ₁(B_G). The normalized ℓ₁-distance ‖B_G − B_Ĝ‖₁ between the augmented adjacency matrices B_G and B_Ĝ of two labeled graphs G and Ĝ of the same order corresponds to the edit distance between the graphs themselves. The edit distance between two graphs G and Ĝ, denoted by ed(G, Ĝ), is the minimum number of edge deletions and edge additions that need to be performed on G to transform it into Ĝ. Applying these concepts, Theorem 2.2 directly leads to the following corollary.

Corollary 2.4. For every graph G on vertex set V = {1, 2, . . . , n}, there exists a unit interval graph Ĝ on vertex set V so that ed(G, Ĝ)/n² ≤ 26 Γ₁(G)^{1/3}.
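For comparison with the combinatorial approach developed in the next sections, the best ℓ₁ Robinson approximation for a fixed ordering can be computed by the linear program mentioned in the introduction. The following sketch is our own formulation, assuming SciPy is available: it encodes the Robinson property through the O(n²) adjacent inequalities and minimizes the sum of slacks s_{ij} ≥ |A_{ij} − R_{ij}| over the upper triangle, which is equivalent to minimizing ‖A − R‖₁ up to normalization (diagonal cells are unconstrained, so the optimum matches them exactly). The dense formulation is meant for small n only.

    import numpy as np
    from scipy.optimize import linprog

    def best_l1_robinson(A):
        # LP for the closest Robinson matrix in l1, for the given (fixed) ordering.
        # Variables: upper-triangle entries R_ij (i <= j), plus slacks s_ij.
        n = A.shape[0]
        pairs = [(i, j) for i in range(n) for j in range(i, n)]
        idx = {p: t for t, p in enumerate(pairs)}
        p = len(pairs)
        c = np.concatenate([np.zeros(p), np.ones(p)])   # minimize sum of slacks
        rows, rhs = [], []

        def leq(coeffs, b):                             # add inequality  sum <= b
            r = np.zeros(2 * p)
            for k, v in coeffs:
                r[k] += v
            rows.append(r)
            rhs.append(b)

        for (i, j), t in idx.items():
            leq([(t, 1.0), (p + t, -1.0)], A[i, j])     #  R_ij - s_ij <= A_ij
            leq([(t, -1.0), (p + t, -1.0)], -A[i, j])   # -R_ij - s_ij <= -A_ij
        for i in range(n):                              # rows: non-increasing moving
            for j in range(i + 1, n - 1):               # right, away from the diagonal
                leq([(idx[(i, j + 1)], 1.0), (idx[(i, j)], -1.0)], 0.0)
        for k in range(n):                              # columns: non-decreasing moving
            for i in range(k - 1):                      # down, towards the diagonal
                leq([(idx[(i, k)], 1.0), (idx[(i + 1, k)], -1.0)], 0.0)

        res = linprog(c, A_ub=np.array(rows), b_ub=np.array(rhs),
                      bounds=[(0.0, 1.0)] * p + [(0.0, None)] * p)
        R = np.zeros((n, n))
        for (i, j), t in idx.items():
            R[i, j] = R[j, i] = res.x[t]
        return R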
3. Robinson similarity approximation for binary matrices. In this section, we present an algorithm that finds a Robinson approximation for the special case where A is a binary matrix, and can thus be interpreted as the adjacency matrix of a graph. The algorithm can be intuitively understood as follows. We divide all cells of the matrix into black and white cells, and convert all zeros in the black cells to ones, and all ones in the white cells to zeros. The black region is convex around the diagonal, in the sense that, if a cell is black, then so are all other cells closer to the diagonal in the same row or column. In other words, the binary matrix whose support is precisely the black region is a Robinson matrix, which is indeed the Robinson approximation that the algorithm returns (see Figure 1).

The decision on whether to assign a cell to the black or the white region depends on the entries in the upper right (UR) and lower left (LL) regions defined by the cell. Precisely, for any cell (a, b), 1 ≤ a ≤ b ≤ n, we define

UR(a, b) = {(i, j) : i < a and b < j},    LL(a, b) = {(i, j) : a ≤ i ≤ j ≤ b}.

Roughly speaking, a cell will be black if it has enough ones in its upper right region, and it is white when it has enough zeros in its lower left region. So, we need the following notation (also shown in Figure 2):

1_UR(a, b) = |UR(a, b) ∩ {(i, j) : A_{i,j} = 1}|,    0_LL(a, b) = |LL(a, b) ∩ {(i, j) : A_{i,j} = 0}|.

In addition, define 1_UR(a, b) = 0_LL(a, b) = 0 when a ∈ {0, n + 1} or b ∈ {0, n + 1} or a > b.

[Fig. 2: Regions UR(a, b) (blue) and LL(a, b) (red).]

Algorithm 3.1 Robinson approximation of a binary matrix
input: binary matrix A ∈ A_n, threshold t > 0
output: binary Robinson matrix R ∈ A_n
for i ← 1 to n do
    R_{1,i} ← 0, R_{i,1} ← 0, R_{i,n} ← 0, R_{n,i} ← 0
end for
for i ← 2 to n do
    j ← n − 1
    while j ≥ i do
        1_UR(i, j) ← 1_UR(i − 1, j) + 1_UR(i, j + 1) − 1_UR(i − 1, j + 1) + A_{i−1,j+1}
        if 1_UR(i, j) < t then
            R_{i,j} ← 0, R_{j,i} ← 0
        else
            R_{i,j} ← 1, R_{j,i} ← 1
        end if
        j ← j − 1
    end while
end for
return R

Algorithm 3.1 has complexity Θ(n²), and is therefore linear in the size of the input. A cell is considered black if R_{i,j} is set to one, and white if R_{i,j} is set to zero. If (i, j) is black and i ≤ k < j, then UR(i, j) ⊂ UR(i, k), so 1_UR(i, k) ≥ 1_UR(i, j) ≥ t, and thus (i, k) is also black. Similarly, (k, j) is also black. So, the region of black cells is convex around the diagonal, and R is indeed a Robinson matrix. We will now show in Theorem 3.1 that the distance between R and A is bounded by a function of t and n, if the parameter t satisfies Condition (3.1). We will then show in Corollary 3.3 that an appropriate t can be chosen as a function of Γ₁(A), so that ‖A − R‖₁ can be bounded in terms of Γ₁(A).
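In code, Algorithm 3.1 takes only a few lines. The sketch below is a 0-based transcription (function name ours, NumPy assumed): the array ones_ur maintains 1_UR by the same inclusion-exclusion recurrence as in the pseudocode, and the first row and last column of R are left white, matching the initialization above.

    import numpy as np

    def robinson_approx_binary(A, t):
        # Algorithm 3.1 with 0-based indices: UR(i, j) = {(s, u) : s < i and u > j}.
        A = np.asarray(A, dtype=int)
        n = A.shape[0]
        R = np.zeros((n, n), dtype=int)        # first/last rows and columns stay 0
        ones_ur = np.zeros((n, n), dtype=int)  # ones_ur[i, j] = # of ones in UR(i, j)
        for i in range(1, n):
            for j in range(n - 2, i - 1, -1):  # sweep each row right to left, j >= i
                ones_ur[i, j] = (ones_ur[i - 1, j] + ones_ur[i, j + 1]
                                 - ones_ur[i - 1, j + 1] + A[i - 1, j + 1])
                if ones_ur[i, j] >= t:         # enough ones in UR(i, j): black cell
                    R[i, j] = R[j, i] = 1
        return R

On any input satisfying Condition (3.1) below, Theorem 3.1 bounds ‖A − R‖₁ for this output by (16√t + 4)/n.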
Theorem 3.1. Let A ∈ A_n be a binary matrix with the property that

(3.1) for all 1 ≤ i ≤ j ≤ n, 1_UR(i, j) < t or 0_LL(i, j) < t.

If R is the matrix produced as output of Algorithm 3.1 on input A with threshold t, then ‖A − R‖₁ ≤ (16√t + 4)/n.

Proof. As explained earlier, black cells are exactly the cells (i, j) for which R_{i,j} attains 1, i.e. the cells for which 1_UR(i, j) ≥ t. Let B denote the collection of all black cells above the diagonal, that is,

B = {(i, j) : 1 ≤ i ≤ j ≤ n and 1_UR(i, j) ≥ t}.

Note that B is convex around the diagonal, in the sense that if a cell (i, j) belongs to B then LL(i, j) ⊆ B. However, the black region can be disconnected. Precisely, there can be diagonal cells not in B, in which case B consists of a collection of connected regions which are convex around the diagonal. Note also that, by definition, B cannot contain any cells (i, j) so that |UR(i, j)| < t, and specifically, B cannot contain any cells from the first row or the last column.

Let ∂B denote the set of boundary cells of B, i.e.

∂B = {(i, j) ∈ B : (i − 1, j) ∉ B or (i, j + 1) ∉ B}.

The set ∂B precisely contains all black cells above the diagonal that are adjacent to cells outside B. Thus,

(3.2) B = ⋃_{(i,j) ∈ ∂B} LL(i, j).

Similarly, the white region W and its boundary cells are defined as

W = {(i, j) : 1 ≤ i ≤ j ≤ n and 1_UR(i, j) < t}, and ∂W = {(i, j) ∈ W : (i + 1, j) ∉ W or (i, j − 1) ∉ W}.

Claim 1. Let B₀ = {(i, j) ∈ B : A_{i,j} = 0}. Then |B₀| ≤ 4n√t.

Proof of Claim 1. First, list the elements of ∂B as (i₁, j₁), (i₂, j₂), . . . , (i_m, j_m) in such a way that i₁ ≤ i₂ ≤ . . . ≤ i_m, and if i_l = i_{l+1} then j_l < j_{l+1}. This ordering follows the "contour" of the black region, starting with the first black cell in the first row containing any black cells. Since both indices range from at least 1 to at most n, it is clear that the boundary contains at most 2n cells, so m ≤ 2n.

To control the amount of possible overlap in the covering of B in (3.2), we now construct a subsequence C of ∂B, so that the lower left regions of cells in C cover most of the black region. The subsequence C = {(i_{n_k}, j_{n_k})} is constructed inductively. In the first step, let (i_{n_1}, j_{n_1}) be the last cell in the first row of B. At step k ≥ 1, let n_k be the largest index so that (i_{n_k}, j_{n_k}) ∈ C. Define n to be the first index in {1, . . . , m} which satisfies j_{n_k} < j_n, (i_n, j_n + 1) ∉ B, and either i_n − i_{n_k} > ⌊√t⌋ or j_n − j_{n_k} > ⌊√t⌋. So cell (i_n, j_n) is the first cell in ∂B whose row or column index differs by at least ⌊√t⌋ + 1 from the last cell added to C, and which also is the last black cell in its row. Set n_{k+1} = n, and add (i_{n_{k+1}}, j_{n_{k+1}}) to C. Since ∂B has at most 2n elements, and ⌊√t⌋ + 1 ≥ √t, this inductive process ends in at most 2n/√t steps. So |C| ≤ 2n/√t.

We now claim that B \ ⋃_{(i,j) ∈ C} LL(i, j) can be covered with |C| squares of dimensions ⌊√t⌋ × ⌊√t⌋. Consider two consecutive elements of C, say (i_{n_k}, j_{n_k}) and (i_{n_{k+1}}, j_{n_{k+1}}). Let (i, j) be the first cell in ∂B \ ⋃_{(i,j) ∈ C} LL(i, j) after (i_{n_k}, j_{n_k}). By construction, (i, j) is not in the same row or column as (i_{n_k}, j_{n_k}), and thus i > i_{n_k} and j > j_{n_k}. Let (i′, j′) be the last cell before (i_{n_{k+1}}, j_{n_{k+1}}) in ∂B \ ⋃_{(i,j) ∈ C} LL(i, j). (If no such n_{k+1} exists, then k is the last index in C, and we let (i′, j′) = (i_m, j_m), the last cell of ∂B.)

By construction of C, we know that i′ − i + 1 ≤ i′ − i_{n_k} ≤ ⌊√t⌋ and j′ − j + 1 ≤ j′ − j_{n_k} ≤ ⌊√t⌋. Moreover, since (i, j) and (i′, j′) belong to ∂B, and B is convex around the diagonal, every cell (a, b) ∈ B with i ≤ a ≤ i′ must satisfy j ≤ b ≤ j′, so it must belong to the square whose top-left corner is (i, j) and whose bottom-right corner is (i′, j′). We denote this square by S_k, and note that |S_k| ≤ t. Thus,

B ⊆ ⋃_{k=1}^{|C|} LL(i_{n_k}, j_{n_k}) ∪ ⋃_{k=1}^{|C|} S_k.

By Condition (3.1), every cell (i, j) ∈ B satisfies 0_LL(i, j) < t. So,

|B₀| ≤ Σ_{(i,j) ∈ C} 0_LL(i, j) + Σ_{k=1}^{|C|} |S_k| ≤ |C| t + |C| t = 4n√t.

Claim 2. Let W₁ = {(i, j) ∈ W : A_{i,j} = 1}. Then |W₁| ≤ 4n√t + 2n.

Proof of Claim 2. This proof is similar to the proof of the previous claim. Cells in the boundary ∂W are enumerated similarly, and a subsequence D = {(i_{n_k}, j_{n_k})} of ∂W is defined analogously to C as follows.
Let (i_{n_1}, j_{n_1}) be the last cell in the first column of W, and for k ≥ 1, define n_{k+1} to be the first index in {1, . . . , m} which satisfies i_{n_{k+1}} > i_{n_k}, (i_{n_{k+1}} + 1, j_{n_{k+1}}) ∉ W, and either i_{n_{k+1}} − i_{n_k} > ⌊√t⌋ or j_{n_{k+1}} − j_{n_k} > ⌊√t⌋. Let S_k be the square of size at most ⌊√t⌋ × ⌊√t⌋ covering all white cells between consecutive cells of D. Then we have

W ⊆ ∂W ∪ ⋃_{(i,j) ∈ D} UR(i, j) ∪ ⋃_{k=1}^{|D|} S_k.

Note that dealing with the white region is slightly different from the black region, in the sense that UR(i, j) does not include cell (i, j) and the cells in row i and column j (see Figure 2). So, we include the boundary explicitly to cover the white region. By definition of W, every white cell (i, j) ∈ W satisfies 1_UR(i, j) < t. This implies that

|W₁| ≤ |∂W| + Σ_{(i,j) ∈ D} 1_UR(i, j) + Σ_{k=1}^{|D|} |S_k| ≤ 2n + |D| t + |D| t ≤ 4n√t + 2n.

The above two claims show that R and A differ above the diagonal in at most 8n√t + 2n cells. Adding the region below the diagonal and normalizing, we conclude that ‖A − R‖₁ ≤ (16√t + 4)/n.

The following simple counting lemma provides a threshold, in terms of Γ₁, for which Condition (3.1) always holds.

Lemma 3.2. Let A ∈ A_n be a binary matrix whose diagonal entries are all 1. Then, for every cell (i, j), we have

1_UR(i, j) 0_LL(i, j) ≤ n⁴ Γ₁(A).

Proof. Without loss of generality, assume that i < j. Since A_{s,t} = 1 for every cell (s, t) ∈ UR(i, j) counted by 1_UR(i, j), and A_{s′,t′} = 0 for every cell (s′, t′) ∈ LL(i, j) counted by 0_LL(i, j), we have

1_UR(i, j) 0_LL(i, j) = Σ_{(s,t) ∈ UR(i,j), (s′,t′) ∈ LL(i,j)} [A_{s,t} − A_{s′,t′}]₊ ≤ Σ ([A_{s,t} − A_{s′,t}]₊ + [A_{s′,t} − A_{s′,t′}]₊),

where the sum on the right ranges over the same pairs; here we used the sub-additivity of [·]₊ and the fact that s < s′ ≤ t′ < t for every such pair. (Since the diagonal entries of A are all 1, the cells counted by 0_LL(i, j) satisfy s′ < t′, so both (s, s′, t) and (s′, t′, t) are valid triples.) Each term [A_{s,t} − A_{s′,t}]₊ arises from at most n pairs, one for each choice of t′, and similarly each term [A_{s′,t} − A_{s′,t′}]₊ arises from at most n pairs. Therefore the right-hand side is at most n Σ_{1 ≤ i < k < j ≤ n} ([A_{i,j} − A_{i,k}]₊ + [A_{i,j} − A_{k,j}]₊) = n⁴ Γ₁(A).
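Lemma 3.2 is easy to probe numerically. The following sketch (names ours; it reuses the gamma1 sketch from the introduction) computes 1_UR(i, j) and 0_LL(i, j) by direct counting and checks the inequality on random symmetric binary matrices with all-ones diagonal.

    import numpy as np

    def check_lemma_3_2(trials=20, n=25, seed=0):
        rng = np.random.default_rng(seed)
        for _ in range(trials):
            A = (rng.random((n, n)) < 0.5).astype(int)
            A = np.triu(A) | np.triu(A, 1).T          # symmetrize
            np.fill_diagonal(A, 1)                    # diagonal entries all 1
            bound = n**4 * gamma1(A)
            for i in range(n):
                for j in range(i, n):
                    ones_ur = A[:i, j + 1:].sum()               # 1_UR(i, j)
                    sub = A[i:j + 1, i:j + 1]
                    zeros_ll = np.triu(1 - sub).sum()           # 0_LL(i, j)
                    assert ones_ur * zeros_ll <= bound + 1e-9
        return True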
Corollary 3.3. For every binary matrix A ∈ A_n, there exists a binary Robinson matrix R ∈ A_n such that

‖A − R‖₁ ≤ 5/n + 2^{9/2} Γ₁(A)^{1/4}.

Moreover, R can be computed in time linear in the size of the input.

Proof. From the binary matrix A ∈ A_n, we first construct Ã by replacing every 0 entry on the diagonal of A with 1. Clearly, Ã ∈ A_n and Γ₁(Ã) ≤ Γ₁(A). Moreover, ‖A − Ã‖₁ ≤ 1/n. If Γ₁(Ã) = 0, we take R = Ã and we are done. So assume that Γ₁(Ã) > 0. Let R ∈ A_n be the matrix produced as output of Algorithm 3.1 on input Ã with threshold t = n² (4Γ₁(Ã))^{1/2}. By Lemma 3.2 and the fact that Γ₁(Ã) > 0, the matrix Ã and parameter t satisfy Condition (3.1) of Theorem 3.1; indeed, for every cell (i, j) we have 1_UR(i, j) 0_LL(i, j) ≤ n⁴ Γ₁(Ã) < t², so at least one of the two counts is less than t. Thus,

‖R − A‖₁ ≤ ‖A − Ã‖₁ + ‖Ã − R‖₁ ≤ 1/n + 16 (4Γ₁(Ã))^{1/4} + 4/n ≤ 5/n + 2^{9/2} Γ₁(A)^{1/4}.

4. Robinson similarity approximations of general matrices. For general matrices, we first decompose the matrix into a convex combination of binary matrices, a standard technique widely used in the literature, and then apply Algorithm 3.1 to each summand. Given any matrix A ∈ A_n, let range(A) = {A_{i,j} : 1 ≤ i ≤ j ≤ n}. Consider the linear ordering 0 = s₀ < s₁ < · · · < s_m of range(A) ∪ {0}, and define matrices A^{(k)}, 1 ≤ k ≤ m, as follows:

(4.1) A^{(k)}_{i,j} = 1 if A_{i,j} ≥ s_k, and A^{(k)}_{i,j} = 0 otherwise.

Each A^{(k)} is binary, and

(4.2) A = Σ_{k=1}^m (s_k − s_{k−1}) A^{(k)}.

We refer to the matrices A^{(k)} as the layers of A. From the definition of the layers, it is easy to see that A^{(k)} ≤ A^{(l)} whenever l < k. Indeed, for such l and k, if A^{(k)}_{i,j} = 1 then A^{(l)}_{i,j} = 1 as well, since s_l < s_k.

The advantage of writing A as a convex combination of its layers, as opposed to any other decomposition of A, lies in the fact that Γ₁ distributes over this particular decomposition of A, even though Γ₁ is not a linear map in general.

Proposition 4.1. Suppose A is "layered" as in Equation (4.2). Then we have

(4.3) Γ₁(A) = Σ_{l=1}^m (s_l − s_{l−1}) Γ₁(A^{(l)}).

Proof. Fix a triple i, j, k satisfying 1 ≤ i < k < j ≤ n, and let n₁, n₂, n₃ ∈ {0, . . . , m} be so that A_{i,j} = s_{n₁}, A_{i,k} = s_{n₂}, and A_{k,j} = s_{n₃}. From the definition of the layers in (4.1), we have
(i) A^{(l)}_{i,j} = 1 if l ≤ n₁, and A^{(l)}_{i,j} = 0 if l > n₁.
(ii) A^{(l)}_{i,k} = 1 if l ≤ n₂, and A^{(l)}_{i,k} = 0 if l > n₂.
(iii) A^{(l)}_{k,j} = 1 if l ≤ n₃, and A^{(l)}_{k,j} = 0 if l > n₃.
Note first that [A_{i,j} − A_{i,k}]₊ = [s_{n₁} − s_{n₂}]₊ = 0 precisely when n₁ ≤ n₂. Using (i) and (ii) it is easy to see that, if n₁ ≤ n₂, then [A^{(l)}_{i,j} − A^{(l)}_{i,k}]₊ = 0 for every 1 ≤ l ≤ m. On the other hand, if [A_{i,j} − A_{i,k}]₊ > 0, i.e. n₁ > n₂, then [A^{(l)}_{i,j} − A^{(l)}_{i,k}]₊ = 0 if l ≤ n₂ or l > n₁, and [A^{(l)}_{i,j} − A^{(l)}_{i,k}]₊ = 1 if n₂ < l ≤ n₁. Hence, we can verify the following claim:

[A_{i,j} − A_{i,k}]₊ = Σ_{l=1}^m (s_l − s_{l−1}) [A^{(l)}_{i,j} − A^{(l)}_{i,k}]₊.

Indeed, if n₁ ≤ n₂ then both sides of the above equation are equal to 0. For the case where n₁ > n₂, we have

Σ_{l=1}^m (s_l − s_{l−1}) [A^{(l)}_{i,j} − A^{(l)}_{i,k}]₊ = Σ_{l=n₂+1}^{n₁} (s_l − s_{l−1}) = s_{n₁} − s_{n₂} = [A_{i,j} − A_{i,k}]₊.

Repeating the above argument, we obtain a similar claim for [A_{i,j} − A_{k,j}]₊, and consequently we get

[A_{i,j} − A_{i,k}]₊ + [A_{i,j} − A_{k,j}]₊ = Σ_{l=1}^m (s_l − s_{l−1}) ([A^{(l)}_{i,j} − A^{(l)}_{i,k}]₊ + [A^{(l)}_{i,j} − A^{(l)}_{k,j}]₊).

Summing over all triples 1 ≤ i < k < j ≤ n and normalizing by n³ yields (4.3).
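The layer decomposition (4.1)-(4.2) and the identity (4.3) can be illustrated as follows (function name ours; gamma1 is the sketch from the introduction).

    import numpy as np

    def layers(A):
        # Equation (4.2): A = sum_k (s_k - s_{k-1}) * A^(k),
        # where A^(k)_{ij} = 1 iff A_ij >= s_k and 0 = s_0 < s_1 < ... < s_m.
        A = np.asarray(A, dtype=float)
        s = sorted(set(A.ravel()) | {0.0})
        weights = [s[k] - s[k - 1] for k in range(1, len(s))]
        mats = [(A >= s[k]).astype(int) for k in range(1, len(s))]
        return weights, mats

    # Proposition 4.1 in action: Gamma_1 distributes over the layers.
    # w, mats = layers(A)
    # assert np.isclose(gamma1(A), sum(wk * gamma1(Ak) for wk, Ak in zip(w, mats)))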
Algorithm 4.1 Robinson similarity approximation of a general matrix
input: matrix A ∈ A_n, positive thresholds t₁, . . . , t_m
output: Robinson similarity matrix R ∈ A_n
Compute range(A) ∪ {0}, as an ordered list s
for i ← 1 to n do
    for k ← 1 to m do
        R^{(k)}_{1,i} ← 0, R^{(k)}_{i,1} ← 0, R^{(k)}_{i,n} ← 0, R^{(k)}_{n,i} ← 0
    end for
    A_{i,i} ← 1
end for
for i ← 2 to n do
    j ← n − 1
    while j ≥ i do
        for k ← 1 to m do
            if A_{i−1,j+1} ≥ s[k] then temp ← 1 else temp ← 0
            1^k_UR(i, j) ← 1^k_UR(i − 1, j) + 1^k_UR(i, j + 1) − 1^k_UR(i − 1, j + 1) + temp
            if 1^k_UR(i, j) < t_k then
                R^{(k)}_{i,j} ← 0, R^{(k)}_{j,i} ← 0
            else
                R^{(k)}_{i,j} ← 1, R^{(k)}_{j,i} ← 1
            end if
        end for
        j ← j − 1
    end while
end for
R ← 0
for k ← 1 to m do
    R ← R + (s[k] − s[k − 1]) R^{(k)}
end for
return R

Algorithm 4.1 is essentially a simultaneous execution of Algorithm 3.1 for each binary matrix A^{(k)}. The quantities 1^k_UR(i, j) thus refer to the number of ones in the region UR(i, j) in A^{(k)}. The algorithm has complexity O(n²m), where m is the size of range(A), and thus m ≤ n². Since every matrix R^{(k)} is Robinson similarity, so is their linear combination R. In the following theorem, we will show that there exist thresholds t₁, . . . , t_m so that the difference between A and R is bounded by C Γ₁(A)^{1/4} for some constant C. To avoid anomalous behavior, we assume that our input matrix is not Robinson similarity, i.e. Γ₁(A) > 0.

Theorem 4.2. Let A ∈ A_n, and Γ₁(A) > 0. If R is the matrix produced as output of Algorithm 4.1 on input A with thresholds t_k = (4Γ₁(A^{(k)}))^{1/2} n² for k = 1, . . . , m, then

‖A − R‖₁ ≤ 2^{9/2} Γ₁(A)^{1/4} (1 + O(n^{−1/4})).

Proof. First suppose A is a binary matrix. Then by Corollary 3.3 applied to A with threshold (4Γ₁(A))^{1/2} n², we get ‖A − R‖₁ ≤ 5/n + 2^{9/2} Γ₁(A)^{1/4}.

Next, assume that A ∈ A_n is a general matrix, and let A = Σ_{k=1}^m (s_k − s_{k−1}) A^{(k)} be the decomposition of A into layers of binary matrices as described in Equation (4.2). Then we have Γ₁(A) = Σ_{k=1}^m (s_k − s_{k−1}) Γ₁(A^{(k)}). To avoid clutter of notation, let ε = Γ₁(A) and ε_k := Γ₁(A^{(k)}). For every 1 ≤ k ≤ m, we apply Corollary 3.3 to A^{(k)} with threshold t_k = (4ε_k)^{1/2} n² to obtain a binary Robinson similarity matrix R^{(k)} such that ‖A^{(k)} − R^{(k)}‖₁ ≤ 5/n + 2^{9/2} ε_k^{1/4}. So Algorithm 4.1 computes R = Σ_{k=1}^m (s_k − s_{k−1}) R^{(k)}, which is Robinson similarity as well. Moreover,

‖A − R‖₁ ≤ Σ_{k=1}^m (s_k − s_{k−1}) ‖A^{(k)} − R^{(k)}‖₁ ≤ 2^{9/2} Σ_{k=1}^m (s_k − s_{k−1}) ε_k^{1/4} + Σ_{k=1}^m (s_k − s_{k−1}) (5/n) ≤ 2^{9/2} (Γ₁(A))^{1/4} + 5/n,

where in the last inequality we used the fact that the function f(x) = x^{1/4} is concave. By definition, if A ∈ A_n is not Robinson similarity, then Γ₁(A) ≥ n^{−3}, and thus Γ₁(A)^{1/4} ≥ n^{−3/4}, so the term 5/n is O(n^{−1/4}) Γ₁(A)^{1/4}. This completes the proof.
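Combining the pieces, the whole construction of Theorem 4.2 fits in a few lines. The following is a sketch under the same assumptions as before: gamma1, layers and robinson_approx_binary are the earlier sketches, and a layer whose Γ₁ vanishes is kept as it is, following the first case of Corollary 3.3.

    import numpy as np

    def robinson_approx(A):
        # General-matrix approximation: run Algorithm 3.1 on each binary layer
        # with the thresholds t_k = sqrt(4 * Gamma_1(A^(k))) * n^2 of Theorem 4.2,
        # then recombine with the layer weights.
        A = np.asarray(A, dtype=float)
        n = A.shape[0]
        R = np.zeros((n, n))
        for wk, Ak in zip(*layers(A)):
            Ak = Ak.copy()
            np.fill_diagonal(Ak, 1)            # the diagonal fix of Corollary 3.3
            g = gamma1(Ak)
            if g == 0:                         # layer already Robinson: keep it
                R += wk * Ak
            else:
                R += wk * robinson_approx_binary(Ak, np.sqrt(4 * g) * n**2)
        return R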
5. Improvement through preprocessing. Finally, we give an algorithm that represents a preprocessing step for Algorithm 3.1. By Theorem 3.1, the Robinson similarity approximation produced by Algorithm 3.1 is at bounded distance from the input matrix A, provided that Condition (3.1) holds. By Lemma 3.2, the condition holds for every matrix if we choose t > n² (Γ₁(A))^{1/2} (see the proof of Corollary 3.3). The preprocessing step is designed to transform the input matrix A so that the condition holds for a smaller value of t. The modified matrix is then used as input to Algorithm 3.1 with the new value of t, which leads to an output matrix R that is closer to the input matrix.

The trade-off here is that the preprocessing step increases the complexity of the algorithm. This increase is tolerable, as the complexity still remains polynomial. We give an implementation which is quadratic for binary matrices, but we believe that a more sophisticated implementation would lead to an improvement in the complexity. In the following, the algorithm is given in detail for binary matrices only. It can easily be adapted to general matrices, in much the same way that Algorithm 4.1 is adapted from Algorithm 3.1.

First, we introduce some terminology. Suppose that matrix A is given, and a threshold value t > 0 is fixed. Let ∆ denote the set of cells of A which are on or above the diagonal. We call a cell (i, j) ∈ ∆ inverted if 1_UR(i, j) ≥ t and 0_LL(i, j) ≥ t. Thus, there exist inverted cells if and only if Condition (3.1) is violated. To toggle a cell (i, j) is to set the values of all cells in UR(i, j) equal to zero, and the values of all cells in LL(i, j) equal to one. It is easy, yet important, to observe that toggling a cell can only decrease Γ₁(A).

Lemma 5.1. Let A ∈ A_n and t > 0 be given. Suppose that (i, j) ∈ ∆ is an inverted cell, and let Ã denote the matrix obtained from A by toggling (i, j). Then for any fixed triple 1 ≤ s < k < t ≤ n, we have

(5.1) [Ã_{s,t} − Ã_{s,k}]₊ ≤ [A_{s,t} − A_{s,k}]₊ and [Ã_{s,t} − Ã_{k,t}]₊ ≤ [A_{s,t} − A_{k,t}]₊.

Consequently, we get Γ₁(Ã) ≤ Γ₁(A).

Proof. We only prove the first inequality; the second one can be proved similarly. Fix a triple 1 ≤ s < k < t ≤ n. First note that the first inequality in (5.1) holds trivially whenever (s, t), (s, k) ∈ ∆ \ (LL(i, j) ∪ UR(i, j)), as on these cells the matrices A and Ã are identical. For the cases where (s, t) ∈ UR(i, j) or (s, k) ∈ LL(i, j), we have Ã_{s,t} = 0 or Ã_{s,k} = 1. Thus, in both cases, we get [Ã_{s,t} − Ã_{s,k}]₊ = 0, and in particular [Ã_{s,t} − Ã_{s,k}]₊ ≤ [A_{s,t} − A_{s,k}]₊. Moreover, it is clear from the definition of the region LL(i, j) that if (s, t) ∈ LL(i, j) then (s, k) ∈ LL(i, j). Similarly, if (s, k) ∈ UR(i, j) then (s, t) ∈ UR(i, j) as well. Putting all these together, we conclude that the desired inequality holds in all cases. Therefore,

Γ₁(Ã) = (1/n³) Σ_{1 ≤ s < k < t ≤ n} ([Ã_{s,t} − Ã_{s,k}]₊ + [Ã_{s,t} − Ã_{k,t}]₊) ≤ Γ₁(A).

Algorithm 5.1 Preprocessing step
input: matrix A ∈ A_n, threshold t
output: updated matrix A
for i ← 1 to n do
    for j ← i to n do
        Compute 1_UR(i, j) and 0_LL(i, j)
        if 1_UR(i, j) ≥ t and 0_LL(i, j) ≥ t then
            for all cells (r, s) ∈ UR(i, j) do A_{r,s} ← 0, A_{s,r} ← 0
            for all cells (r, s) ∈ LL(i, j) do A_{r,s} ← 1, A_{s,r} ← 1
        end if
    end for
end for

Note that Algorithm 5.1 involves computing 1_UR(a, b) for all 1 ≤ a ≤ i and j ≤ b ≤ n, and computing 0_LL(a, b) for all i ≤ a ≤ j and i ≤ b ≤ j. These values must be recomputed in each iteration, since A is being changed. Any cell (i, j) is tested exactly once, to see whether it is inverted, and if it is, it is toggled. This naive implementation of the algorithm has complexity O(n⁴) and is thus quadratic in the size of the input. When adapted to general matrices, the complexity becomes O(mn⁴), where m is the number of different values taken by entries of A. Clearly m ≤ n². Also, if all entries of A are rounded to the nearest multiple of ε^{1/3} (where ε = Γ₁(A)), then m = ε^{−1/3}, while the error between A and R is still of the same order.
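The following is a naive O(n⁴) transcription of Algorithm 5.1 (0-based, names ours): the counts are recomputed from scratch for every cell, exactly as the complexity discussion above assumes, and toggling updates both the region and its mirror image below the diagonal.

    import numpy as np

    def preprocess(A, t):
        # Algorithm 5.1: scan cells on or above the diagonal once and toggle
        # every inverted cell (1_UR(i, j) >= t and 0_LL(i, j) >= t).
        A = np.asarray(A, dtype=int).copy()
        n = A.shape[0]
        for i in range(n):
            for j in range(i, n):
                ones_ur = A[:i, j + 1:].sum()              # 1_UR(i, j)
                sub = A[i:j + 1, i:j + 1]
                zeros_ll = np.triu(1 - sub).sum()          # 0_LL(i, j)
                if ones_ur >= t and zeros_ll >= t:         # inverted cell: toggle
                    A[:i, j + 1:] = 0                      # UR(i, j) ...
                    A[j + 1:, :i] = 0                      # ... and its mirror
                    A[i:j + 1, i:j + 1] = 1                # LL(i, j) and mirror
        return A

By the two lemmas that follow, the output satisfies Condition (3.1) and stays close to the input, so it can be fed to robinson_approx_binary with the same t.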
One may be concerned that toggling some cells may create new inverted cells, in which case just considering each cell once would not be sufficient, and the complexity could increase. The following lemma shows this cannot be the case.

Lemma 5.2. Suppose Algorithm 5.1 is applied to a binary matrix A, with threshold t. Then the output of the algorithm, which we call the modified matrix, satisfies the condition that, for each cell (i, j), 1_UR(i, j) < t or 0_LL(i, j) < t.

Proof. It suffices to show that, if an inverted cell (i, j) is toggled, no cell that was not inverted before the toggle can become inverted afterwards. Suppose to the contrary that there exists a cell (k, ℓ) which becomes inverted after the toggle. That is, either 1_UR(k, ℓ) is increased by the toggle, or 0_LL(k, ℓ) is increased. Assume without loss of generality that 1_UR(k, ℓ) increases after the toggle. The toggle sets all entries in UR(i, j) to zero, and all entries in LL(i, j) to one. So for 1_UR(k, ℓ) to increase, UR(k, ℓ) and LL(i, j) must intersect. Thus, (k, ℓ) ∈ LL(i, j). Therefore, after the toggle, all entries in LL(k, ℓ) have been set to one, so 0_LL(k, ℓ) = 0 < t. Therefore, (k, ℓ) is not an inverted cell after the toggle, which contradicts our assumption.

The above lemma shows that the modified matrix Â returned by Algorithm 5.1 satisfies Condition (3.1) of Theorem 3.1 for the chosen value of t. Thus, this matrix can be used as input for Algorithm 3.1 to obtain a Robinson similarity approximation R, and Theorem 3.1 can be applied to bound ‖Â − R‖₁. However, to obtain a good approximation we need to make sure we can also bound ‖A − Â‖₁, the distance between the input and output matrices of Algorithm 5.1. The following lemma gives such a bound.

Lemma 5.3. Let A ∈ A_n be a binary matrix. Let Â denote the output of Algorithm 5.1 with threshold t. Then

‖A − Â‖₁ ≤ (4n²/t) (Γ₁(A) − Γ₁(Â)).

Proof. The output matrix Â is obtained from A by consecutive toggling of inverted cells, occurring in lines 5 and 6 of Algorithm 5.1. The condition in line 4 checks whether a cell is inverted. Let A = A⁰, A¹, A², . . . , Aᵐ = Â denote the matrices in the intermediate steps. We will bound the distance between two consecutive matrices. Fix s, 0 ≤ s < m, and assume that A^s is modified because cell (i, j) is found to be inverted, and is then toggled. So 1_UR(i, j) ≥ t and 0_LL(i, j) ≥ t, where 1_UR and 0_LL are computed from A^s. So A^{s+1} is formed from A^s by adjusting every cell in UR(i, j) and its counterpart below the diagonal to be 0, and every cell in LL(i, j) and its counterpart below the diagonal to be 1.

We can intuitively observe that Γ₁(A^s) drops by at least 0_LL(i, j) 1_UR(i, j)/n⁴ when (i, j) is toggled. Note that, normalizing as in the definition of Γ₁ and applying the same reasoning as in Lemma 3.2, one may be tricked into thinking that Γ₁(A^s) drops at least by (2/n³) 0_LL(i, j) 1_UR(i, j) when (i, j) is toggled, since a contribution of 2/n³ is removed for each pair of "bad" cells from UR(i, j) and LL(i, j). However, this reasoning does not take overcounting into consideration. We dedicate the following claim to a rigorous proof of this intuitive observation.

Claim. For A^s, s, and (i, j) as above, we have

Γ₁(A^s) − Γ₁(A^{s+1}) ≥ 0_LL(i, j) 1_UR(i, j)/n⁴.

Proof of claim. First, consider a cell (k, ℓ) from UR(i, j) containing 1, and a cell (k′, ℓ′) from LL(i, j) containing 0. Similar to the proof of Lemma 3.2, we have

(5.2) 2 = 2 [A^s_{k,ℓ} − A^s_{k′,ℓ′}]₊ = [A^s_{k,ℓ} − A^s_{k′,ℓ}]₊ + [A^s_{k′,ℓ} − A^s_{k′,ℓ′}]₊ + [A^s_{k,ℓ} − A^s_{k,ℓ′}]₊ + [A^s_{k,ℓ′} − A^s_{k′,ℓ′}]₊.
By Lemma 5.1, for every triple 1 ≤ k < l′ < l ≤ n we have

[A^{s+1}_{k,l} − A^{s+1}_{k,l′}]₊ ≤ [A^s_{k,l} − A^s_{k,l′}]₊ and [A^{s+1}_{k,l} − A^{s+1}_{l′,l}]₊ ≤ [A^s_{k,l} − A^s_{l′,l}]₊.

This, together with the fact that A^{s+1}_{k,l} = 0 whenever (k, l) ∈ UR(i, j) and A^{s+1}_{k′,l′} = 1 whenever (k′, l′) ∈ LL(i, j), implies that every violation term appearing on the right-hand side of (5.2) vanishes in A^{s+1}, while no other violation term increases. Summing (5.2) over all pairs of a one-cell of UR(i, j) and a zero-cell of LL(i, j) gives a total of 2 · 1_UR(i, j) 0_LL(i, j), and each violation term appears in this sum for at most n pairs (the remaining index of the pair is free). Therefore

n³ (Γ₁(A^s) − Γ₁(A^{s+1})) ≥ 2 · 1_UR(i, j) 0_LL(i, j)/n,

which proves the claim.

To finish the proof of Lemma 5.3, note that the toggle changes exactly the one-cells of UR(i, j) and the zero-cells of LL(i, j), together with their counterparts below the diagonal. Hence, using the fact that both counts are at least t,

‖A^s − A^{s+1}‖₁ = (2/n²) (1_UR(i, j) + 0_LL(i, j)) ≤ (4/(t n²)) 1_UR(i, j) 0_LL(i, j) ≤ (4n²/t) (Γ₁(A^s) − Γ₁(A^{s+1})).

Summing over all steps s and telescoping yields the lemma.

Since Γ₁(A) < 1, we have Γ₁(A)^{1/3} < Γ₁(A)^{1/4}; thus, compared with Theorem 4.2, the preprocessing step leads to a substantially better Robinson approximation.

Proof of Theorem 2.2. First assume that A ∈ A_n is a binary matrix with Γ₁(A) = ε > 0. Let Â be the output of Algorithm 5.1, with threshold t = 4^{−1/3} ε^{2/3} n². From Lemma 5.3,

‖A − Â‖₁ ≤ 4^{4/3} ε^{−2/3} (Γ₁(A) − Γ₁(Â)) ≤ 4^{4/3} ε^{1/3}.

From Lemma 5.2, we have that Â satisfies Condition (3.1) of Theorem 3.1 for our choice of t. Let R be the output of Algorithm 3.1 applied to Â with parameter t = 4^{−1/3} ε^{2/3} n². Then, by Theorem 3.1, we have

‖Â − R‖₁ ≤ (16√t + 4)/n = 2^{11/3} ε^{1/3} + 4/n.

Combining these inequalities, we get ‖A − R‖₁ ≤ 3 · 2^{8/3} ε^{1/3} + 4/n ≤ 26 ε^{1/3}, where we used the fact that Γ₁(A) = ε ≥ n^{−3} in the last inequality.

Next, assume that A ∈ A_n is a general matrix, and let A = Σ_{k=1}^m (s_k − s_{k−1}) A^{(k)} be the decomposition of A into layers of binary matrices as described in Equation (4.2). Let ε_k := Γ₁(A^{(k)}), and recall that ε = Σ_{k=1}^m (s_k − s_{k−1}) ε_k. For every 1 ≤ k ≤ m, we apply the process described in the above paragraph to A^{(k)} with parameter t_k = 4^{−1/3} ε_k^{2/3} n², and we obtain Robinson matrices R^{(k)} such that ‖A^{(k)} − R^{(k)}‖₁ ≤ 26 ε_k^{1/3}. Letting R = Σ_{k=1}^m (s_k − s_{k−1}) R^{(k)}, we have

‖A − R‖₁ ≤ Σ_{k=1}^m (s_k − s_{k−1}) ‖A^{(k)} − R^{(k)}‖₁ ≤ 26 Σ_{k=1}^m (s_k − s_{k−1}) ε_k^{1/3} ≤ 26 ε^{1/3},

where in the last inequality we used the fact that the function f(x) = x^{1/3} is concave. Finally, we observe that the outputs of Algorithm 5.1 and Algorithm 3.1, when applied to a binary matrix A, are again binary matrices. Therefore, the Robinson approximation R of Theorem 2.2 is binary when A is a binary matrix.

6. Conclusions and further work. We defined a parameter Γ₁ which measures how much a matrix resembles a Robinson similarity matrix. We gave a polynomial time algorithm which takes as input a symmetric matrix A, and finds a Robinson matrix R so that the normalized ℓ₁-distance between A and R is bounded by 26 Γ₁(A)^{1/3}. The motivation of our work is the application to the problem of seriation of noisy data. This problem can now be approached by solving instead the optimization problem: given a matrix A, find a permutation π of the rows and columns of A so that Γ₁(A_π) is minimized.

Our construction method is based on a combinatorial algorithm that runs in polynomial time. However, this may not give the best possible such Robinson approximation. As remarked in the introduction, the problem of finding the best possible Robinson approximation, with error measured in the ℓ₁ norm, can be formulated as a linear program. An open problem is whether there exists a combinatorial algorithm for this task (as there exists for the ℓ∞ norm).

In future work, we propose to study this optimization problem, to attempt to find algorithms which solve or approximate the problem, and to determine their complexity. It is well-known that the second eigenvector of the Laplacian of the matrix, also known as the Fiedler vector, is effective in finding the correct permutation in seriation without error.
We propose to study the relationship between the Fiedler vector and the parameter Γ₁. An immediate next step is to test our algorithm on real data, and to see whether, in practice, the algorithm outperforms the theoretical bound. As well, a clever implementation of Algorithm 5.1 will likely lead to improved efficiency.

Acknowledgments. The authors thank the anonymous referees, whose suggestions greatly improved the paper.

REFERENCES

[1] J. E. Atkins, E. G. Boman, and B. Hendrickson, A spectral algorithm for seriation and the consecutive ones problem, SIAM J. Comput., 28 (1999), pp. 297–310, https://doi.org/10.1137/S0097539795285771.
[2] J.-P. Barthélemy and F. Brucker, NP-hard approximation problems in overlapping clustering, J. Classif., 18 (2001), pp. 159–183, https://doi.org/10.1007/s00357-001-0014-1.
[3] C.-H. Chen, Generalized association plots: information visualization via iteratively generated correlation matrices, Statistica Sinica, 12 (2002), pp. 7–29.
[4] V. Chepoi, B. Fichet, and M. Seston, Seriation in the presence of errors: NP-hardness of ℓ∞-fitting Robinson structures to dissimilarity matrices, J. Classification, 26 (2009), pp. 279–296.
[5] V. Chepoi and M. Seston, Seriation in the presence of errors: a factor 16 approximation algorithm for ℓ∞-fitting Robinson structures to distances, Algorithmica, 59 (2011), pp. 521–568.
[6] H. Chuangpishit, M. Ghandehari, M. Hurshman, J. Janssen, and N. Kalyaniwalla, Linear embeddings of graphs and graph limits, J. Combin. Theory Ser. B, 113 (2015), pp. 162–184.
[7] H. Chuangpishit, M. Ghandehari, and J. Janssen, Uniform linear embeddings of graphons, European J. Combin., 61 (2017), pp. 47–68, https://doi.org/10.1016/j.ejc.2016.09.004.
[8] D. G. Corneil, A simple 3-sweep LBFS algorithm for the recognition of unit interval graphs, Discrete Applied Mathematics, 138 (2004), pp. 371–379.
[9] D. G. Corneil, H. Kim, S. Natarajan, S. Olariu, and A. P. Sprague, Simple linear time recognition of unit interval graphs, Information Processing Letters, 55 (1995), pp. 99–104.
[10] N. Flammarion, C. Mao, and P. Rigollet, Optimal rates of statistical seriation, arXiv:1607.02435 [math.ST], 2016.
[11] F. Fogel, A. d'Aspremont, and M. Vojnovic, Spectral ranking using seriation, J. Mach. Learn. Res., 17 (2016), Paper No. 88, 45 pp.
[12] M. Ghandehari and J. Janssen, A stable parameter to quantify geometric structure of graphons, unpublished manuscript, 2018.
[13] M. Hahsler, K. Hornik, and C. Buchta, Getting things in order: an introduction to the R package seriation, J. Statistical Software, 25 (3) (2008).
[14] M. Laurent and M. Seminaroti, A Lex-BFS-based recognition algorithm for Robinsonian matrices, Discrete Appl. Math., 222 (2017), pp. 151–165.
[15] M. Laurent and M. Seminaroti, Similarity-first search: a new algorithm with application to Robinsonian matrix recognition, SIAM J. Discrete Math., 31 (2017), pp. 1765–1800, https://doi.org/10.1137/16M1056791.
[16] I. Liiv, Seriation and matrix reordering methods: an historical overview, Stat. Anal. Data Min., 3 (2010), pp. 70–91.
[17] P. J. Looges and S. Olariu, Optimal greedy algorithms for indifference graphs, Computers & Mathematics with Applications, 25 (1993), pp. 15–25.
[18] L. Lovász and B. Szegedy, Limits of dense graph sequences, J. Combin. Theory Ser. B, 96 (2006), pp. 933–957, https://doi.org/10.1016/j.jctb.2006.05.002.
[19] B. G. Mirkin and S. N. Rodin, Graphs and genes, vol.
11 of Biomathematics, Springer-Verlag, Berlin, 1984. Translated from the Russian by H. Lynn Beus.
[20] P. Préa and D. Fortin, An optimal algorithm to recognize Robinsonian dissimilarities, J. Classification, 31 (2014), pp. 351–385, https://doi.org/10.1007/s00357-014-9150-2.
[21] F. Roberts, Indifference graphs, in Proof Techniques in Graph Theory, F. Harary, ed., Academic Press, 1969, pp. 139–146.
[22] W. S. Robinson, A method for chronologically ordering archeological deposits, American Antiquity, 16 (1951), pp. 293–301.
[23] M. Seminaroti, Combinatorial Algorithms for the Seriation Problem, PhD thesis, Tilburg University, 2016.
[24]