On the Recognition of Strong-Robinsonian Incomplete Matrices
arXiv preprint, [cs.DM]
Julio Aracena ∗    Christopher Thraves Caro †

Abstract

A matrix is incomplete when some of its entries are missing. A Robinson incomplete symmetric matrix is an incomplete symmetric matrix whose non-missing entries do not decrease along rows and columns when moving toward the diagonal. A Strong-Robinson incomplete symmetric matrix is an incomplete symmetric matrix $A$ such that $a_{k,l} \ge a_{i,j}$ if $a_{i,j}$ and $a_{k,l}$ are two non-missing entries of $A$ and $i \le k \le l \le j$. On the other hand, an incomplete symmetric matrix is Strong-Robinsonian if there is a simultaneous reordering of its rows and columns that produces a Strong-Robinson matrix. In this document, we first show that there is an incomplete Robinson matrix which is not Strong-Robinsonian. Therefore, these two definitions are not equivalent. Secondly, we study the recognition problem for Strong-Robinsonian incomplete matrices. It is known that recognition of incomplete Robinsonian matrices is NP-Complete. We show that the recognition of incomplete Strong-Robinsonian matrices is also NP-Complete. However, we show that recognition of Strong-Robinsonian matrices can be parameterized with respect to the number of missing entries. Indeed, we present an $O(|w|^b n^2)$ recognition algorithm for Strong-Robinsonian matrices, where $b$ is the number of missing entries, $n$ is the size of the matrix, and $|w|$ is the number of different values in the matrix.

Keywords: Robinsonian matrices, strong-Robinsonian matrices, incomplete matrices, matrix completion, recognition problem.
∗ CI²MA and Departamento de Ingeniería Matemática, Facultad de Ciencias Físicas y Matemáticas, Universidad de Concepción, Chile. J. Aracena was partially supported by ANID-Chile through the project Centro de Modelamiento Matemático (AFB170001) of the PIA Program: "Concurso Apoyo a Centros Científicos y Tecnológicos de Excelencia con Financiamiento Basal", and by ANID-Chile through Fondecyt project 1151265.

† (Corresponding Author) Departamento de Ingeniería Matemática, Facultad de Ciencias Físicas y Matemáticas, Universidad de Concepción, Chile. Address: Av. Esteban Iturra s/n, Casilla 160-C, Concepción, Chile. Tel.: +56-41-2203129.

A symmetric matrix is Robinson if its entries do not decrease along rows and columns when moving toward the main diagonal. A symmetric matrix is Robinsonian if there is a simultaneous reordering of its rows and columns that results in a Robinson matrix. [...] The Seriation problem, introduced in the same work, is to decide whether the similarity matrix of a data set is Robinsonian and to write it as a Robinson matrix if possible.

Robinsonian matrices have applications in many different contexts such as archaeology [22], data visualization [2], exploratory analysis [12], bioinformatics [26], and machine learning [7]. Liiv in [17] surveyed the seriation problem, matrix reordering, and its applications. Also, Laurent and Seminaroti in [13], using Robinsonian matrices, gave a new class of instances for the Quadratic Assignment Problem which is solvable in polynomial time.

It is natural to consider the seriation problem with incomplete data due to errors in the data set [9, 8]. In this document, we extend the definition of Robinson and Robinsonian matrices to incomplete symmetric matrices. A symmetric matrix is said to be incomplete if some of its entries are missing. When considering incomplete symmetric matrices, two possible extensions for Robinson matrices appear. The first extension says that an incomplete symmetric matrix is Robinson if its non-missing entries do not decrease along rows and columns when moving toward the diagonal. But with this definition, for a given entry there may be strictly smaller entries that are closer to the main diagonal. This phenomenon appears due to the missing entries. Hence, we present a second extension for Robinson matrices in which this phenomenon is avoided. An incomplete symmetric matrix $A$ is Strong-Robinson if $a_{k,l} \ge a_{i,j}$ whenever $a_{i,j}$ and $a_{k,l}$ are two non-missing entries of $A$ and $i \le k \le l \le j$. These two definitions coincide when we consider complete symmetric matrices, where both reduce to the usual definition of a Robinson matrix.

From this point, it is natural to define incomplete Robinsonian matrices and incomplete Strong-Robinsonian matrices as symmetric matrices that admit a simultaneous reordering of their rows and columns such that we obtain an incomplete Robinson matrix or an incomplete Strong-Robinson matrix, respectively.

Recognition of incomplete Robinsonian matrices was shown to be NP-Complete in [6]. Furthermore, the authors of [6] gave a lower bound on the complexity of this problem by showing that there is no algorithm running in sub-exponential time that recognizes incomplete Robinsonian matrices. In this document, we first show that these two definitions are actually different, and then we study the recognition problem for incomplete Strong-Robinsonian matrices, showing that this problem can be solved in polynomial time if the number of missing entries is considered to be a constant parameter.

In this document, we consider symmetric $n \times n$ matrices, typically denoted by $A$. We use $a_{i,j}$ to denote the entry of $A$ in row $i$ and column $j$. We say that a matrix is incomplete if one or more of its entries are not defined, or missing. We use $a_{i,j} = \ast$ to denote that $a_{i,j}$ is a missing entry of $A$.
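As a concrete illustration of the two extensions described above, the following Python sketch checks each property for a fixed ordering of rows and columns. It is only a sketch under stated assumptions: the missing symbol $\ast$ is represented by `None`, indices are 0-based, the matrix is assumed symmetric so only the upper triangle is inspected, and the function names and the 4 × 4 example are hypothetical choices made for this illustration.

```python
MISSING = None  # stands for the symbol * (an assumption of this sketch)


def is_robinson(a):
    """Non-missing entries do not decrease along rows and columns
    when moving toward the diagonal (upper triangle only)."""
    n = len(a)
    for i in range(n):
        for j in range(i, n):
            if a[i][j] is MISSING:
                continue
            # Same row i, columns k with i <= k <= j.
            for k in range(i, j + 1):
                if a[i][k] is not MISSING and a[i][j] > a[i][k]:
                    return False
            # Same column j, rows l with i <= l <= j.
            for l in range(i, j + 1):
                if a[l][j] is not MISSING and a[i][j] > a[l][j]:
                    return False
    return True


def is_strong_robinson(a):
    """a[i][j] <= a[k][l] for all non-missing pairs with i <= k <= l <= j."""
    n = len(a)
    for i in range(n):
        for j in range(i, n):
            if a[i][j] is MISSING:
                continue
            for k in range(i, j + 1):
                for l in range(k, j + 1):
                    if a[k][l] is not MISSING and a[i][j] > a[k][l]:
                        return False
    return True


# Hypothetical example: the entry in row 1, column 4 equals 1, while the
# entry in row 2, column 3 (closer to the diagonal) equals 0. The missing
# entries hide this from the row/column test.
example = [
    [1, 1, MISSING, 1],
    [1, 1, 0, MISSING],
    [MISSING, 0, 1, 1],
    [1, MISSING, 1, 1],
]
```

For the matrix `example`, written in this particular order, the row and column test passes while the stronger test fails. This says nothing about whether some other ordering of the same matrix satisfies the stronger property.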
We use $w(A)$ to denote the set of different (numerical) values in $A$ (if $A$ is an incomplete matrix, the symbol $\ast$ does not belong to $w(A)$). Hence, $|w(A)|$ denotes the number of different numerical values in $A$. As an example, if $A$ is a 0-1 matrix, then $w(A) = \{0, 1\}$ and $|w(A)| = 2$, even if $A$ is incomplete. We use $b(A)$ to denote the number of missing entries in $A$. When contextually clear, we simply use $b$.

Now we introduce Robinson and Strong-Robinson incomplete matrices.

Definition 1. Let $A$ be a symmetric $n \times n$ incomplete matrix. We say that $A$ is Robinson if:
• $a_{i,j} \le a_{i,k}$ for all $i \le k \le j$ such that $a_{i,j} \ne \ast \ne a_{i,k}$, and
• $a_{i,j} \le a_{l,j}$ for all $i \le l \le j$ such that $a_{i,j} \ne \ast \ne a_{l,j}$.

Definition 2. Let $A$ be a symmetric $n \times n$ incomplete matrix. We say that $A$ is Strong-Robinson if: $a_{i,j} \le a_{k,l}$ for all $i \le k \le l \le j$ such that $a_{i,j} \ne \ast \ne a_{k,l}$.

Even though these two definitions look similar, they are not equivalent. Nevertheless, if we consider complete matrices, i.e., matrices with no missing entry, these two definitions become equivalent, and they can be expressed as follows: for every $a_{i,j}$ such that $1 \le i < j \le n$,

$$a_{i,j} \le \min\{a_{i,j-1},\, a_{i+1,j}\}.$$

An incomplete symmetric matrix $A$ is Robinsonian (resp., Strong-Robinsonian) if there is a permutation $\pi : \{1, 2, \ldots, n\} \to \{1, 2, \ldots, n\}$ such that $A^{\pi}$, the matrix defined by the entries $a_{\pi(i),\pi(j)}$, is Robinson (resp., Strong-Robinson). In other words, $A$ is Robinsonian (resp., Strong-Robinsonian) if there is a simultaneous reordering of its rows and columns that results in a Robinson (resp., Strong-Robinson) matrix.

Recognition of complete Robinsonian matrices has been studied by several authors. Mirkin and Rodin in [19] presented an $O(n^4)$ recognition algorithm, where $n \times n$ is the size of the matrix. Chepoi and Fichet, using divide and conquer techniques, introduced an $O(n^3)$ recognition algorithm in [3]. Préa and Fortin in [23] provided an $O(n^2)$ optimal recognition algorithm for complete Robinsonian matrices using PQ trees.

Using the relationship between Robinsonian matrices and unit interval graphs presented in [24], Laurent and Seminaroti in [14] introduced a recognition algorithm for Robinsonian matrices that uses Lex-BFS, whose time complexity is $O(|w|(m + n))$, where $m$ is the number of nonzero entries in the matrix, and $|w|$ is the number of different values in the matrix. Later in [15], the same authors presented a recognition algorithm with time complexity $O(n^2 + nm \log n)$ that uses similarity-first search. Again, using the relationship between Robinsonian matrices and unit interval graphs, Laurent et al. in [16] gave a characterization of Robinsonian matrices via forbidden patterns.

The Seriation problem has also been studied as an optimization problem. Given an $n \times n$ matrix $D$, seriation in the presence of errors is to find a Robinsonian matrix $R$ that minimizes the error defined as $\max \|d_{i,j} - r_{i,j}\|$ over all $i$ and $j$ in $\{1, 2, 3, \ldots, n\}$. Chepoi et al. in [4] proved that seriation in the presence of errors is NP-Hard.
Chepoi and Seston in [5] gave a factor 16 approximation algorithm for the same problem. Finally, Fortin in [8] surveyed the challenges for Robinsonian matrix recognition.

Recognition of 0-1 [...], it is impossible to have a sub-exponential time algorithm that determines if an incomplete symmetric matrix is Robinsonian or not. They also presented an exponential time algorithm for the recognition of incomplete 0-1 [...] $n \times n$ incomplete Robinsonian matrices with time complexity O(n · n) that drops the requirement for the matrix to be 0-1 [...]

Our Contributions
We consider that our first contribution is the introduction of incomplete Strong-Robinson and Strong-Robinsonian matrices. Then, in Lemma 3, we show the existence of an incomplete Robinsonian matrix that is not Strong-Robinsonian. In Theorem 1, we show that recognition of incomplete Strong-Robinsonian matrices is NP-Complete. Finally, in Theorem 2, we show that recognition of incomplete Strong-Robinsonian matrices is in XP. Indeed, we provide a parameterized algorithm for incomplete Strong-Robinsonian matrix recognition that is exponential only in the number of missing entries of the matrix, while polynomial in the size of the matrix and the number of different values it has.
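The parameterized algorithm mentioned in the last contribution can be pictured as an exhaustive search over completions. The Python sketch below is an illustration under stated assumptions, not the authors' implementation. Missing entries are represented by `None`, candidate values are drawn from the values already present in the matrix, and symmetric positions are filled together. The test for complete Robinsonian matrices is a brute-force search over all permutations, used here only as a stand-in for a polynomial recognition algorithm such as the one in [23], so this sketch does not achieve the running time stated in Theorem 2. All function names are hypothetical.

```python
from itertools import permutations, product

MISSING = None  # stands for the symbol * (an assumption of this sketch)


def is_robinson_complete(a):
    """Complete-matrix test: a[i][j] <= min(a[i][j-1], a[i+1][j]) for i < j."""
    n = len(a)
    return all(
        a[i][j] <= min(a[i][j - 1], a[i + 1][j])
        for i in range(n)
        for j in range(i + 1, n)
    )


def is_robinsonian_complete(a):
    """Brute-force stand-in: try every simultaneous reordering."""
    n = len(a)
    for p in permutations(range(n)):
        reordered = [[a[p[i]][p[j]] for j in range(n)] for i in range(n)]
        if is_robinson_complete(reordered):
            return True
    return False


def is_strong_robinsonian(a):
    """Search all completions with values taken from the matrix itself."""
    n = len(a)
    # Values present in the matrix; fall back to [0] if everything is missing.
    values = sorted({x for row in a for x in row if x is not MISSING}) or [0]
    # Missing positions in the upper triangle (the mirror entry is tied to it).
    holes = [(i, j) for i in range(n) for j in range(i, n) if a[i][j] is MISSING]
    for choice in product(values, repeat=len(holes)):
        completed = [row[:] for row in a]  # leave the input untouched
        for (i, j), v in zip(holes, choice):
            completed[i][j] = completed[j][i] = v
        if is_robinsonian_complete(completed):
            return True
    return False


# Hypothetical 0-1 examples: a "claw" pattern (one row similar to three
# mutually dissimilar rows), first complete and then with one missing pair.
claw = [[1, 1, 1, 1], [1, 1, 0, 0], [1, 0, 1, 0], [1, 0, 0, 1]]
claw_with_hole = [
    [1, 1, 1, 1],
    [1, 1, MISSING, 0],
    [1, MISSING, 1, 0],
    [1, 0, 0, 1],
]
```

In this sketch the outer loop over `product(values, repeat=len(holes))` is the part that grows exponentially with the number of missing entries. Replacing `is_robinsonian_complete` with a polynomial recognition routine would leave that loop as the only exponential factor.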
In this section, we study the recognition problem for incomplete Strong-Robinsonian matrices. We start by introducing some concepts. A completion of an incomplete symmetric matrix $A$ is an assignment of values to all the missing entries of $A$. We say that a completion of $A$ is Robinsonian if the completed matrix is Robinsonian. Let $S \subseteq \mathbb{R}$ be a set of real values. A completion of $A$ whose new values are taken from $S$ is said to be a completion of $A$ in $S$.

¹ The Exponential Time Hypothesis states that there exists a constant $C > 0$ such that no algorithm solving 3-SAT in time $O(2^{CN})$ exists, where $N$ denotes the number of variables in the input formula.

Lemma 1.
Let $A$ be an incomplete symmetric matrix. Then $A$ is Strong-Robinsonian if and only if $A$ has a Robinsonian completion.

Proof. Let $A$ be an incomplete symmetric matrix. If $A$ has a Robinsonian completion, we first complete $A$ according to that completion. Then, we write the completion of $A$ in its Robinson form. Finally, we delete all the added entries. Since a complete Robinson matrix satisfies $a_{i,j} \le a_{k,l}$ for all $i \le k \le l \le j$, and deleting entries does not affect the inequalities among the remaining ones, the outcome of this process is $A$ written as an incomplete Strong-Robinson matrix.

On the other hand, if $A$ is Strong-Robinsonian, we first write $A$ in its Strong-Robinson form. Then, a Robinsonian completion for $A$ is constructed as follows. For every missing entry in the main diagonal, we define $a_{i,i} := \max w(A)$. Then, the completion continues filling each diagonal moving away from the main diagonal, i.e., increasing the parameter $k = j - i$ for missing entries $a_{i,j}$ such that $1 \le i < j \le n$. For every missing entry $a_{i,j}$, we define $a_{i,j} := \min\{a_{i,j-1},\, a_{i+1,j}\}$. Finally, by construction the completion is Robinson.

A direct consequence of the previous lemma is the following lemma.

Lemma 2. Let $A$ be an incomplete symmetric matrix, and $w(A)$ be the set of different values in $A$. Then $A$ has a Robinsonian completion if and only if $A$ has a Robinsonian completion in $w(A)$.

Proof. In one direction, the lemma is direct (if $A$ has a Robinsonian completion in $w(A)$, it has a Robinsonian completion). In the opposite direction, due to Lemma 1, if $A$ has a Robinsonian completion, then $A$ is Strong-Robinsonian. Therefore, we can write it as a Strong-Robinson matrix, and then proceed with the completion described in the proof of Lemma 1, which only assigns values from $w(A)$ and hence provides a Robinsonian completion in $w(A)$.

In the next lemma we present an incomplete Robinsonian matrix that is not Strong-Robinsonian.

Lemma 3.
There exists an incomplete Robinsonian symmetric matrix which is not Strong-Robinsonian.

Proof. We demonstrate this lemma by giving an incomplete Robinsonian matrix which is not Strong-Robinsonian. Consider the following matrix:

    [...] ∗ [...] ∗ [...] ∗ [...] ∗ [...]    (1)

Matrix (1) is Robinson, but we shall see that it is not Strong-Robinsonian. By Lemma 1, matrix (1) is Strong-Robinsonian if and only if it has a Robinsonian completion. By Lemma 2, matrix (1) has a Robinsonian completion if and only if it has a completion with 0's and 1's. Matrix (1) has four possible completions with 0's and 1's. Two of them yield the same case up to a reordering of the columns and rows. Therefore, we analyze three different completions. The three completions are: [...], [...], [...].

Picking the right columns and corresponding rows for each of these matrices, we find the following matrix as a submatrix:

    [...]    (2)

According to the characterization of matrices of interval and proper interval graphs given by Mertzios in [18], matrix (2) is not the augmented adjacency matrix of a proper interval graph. Due to the result shown in [24] by Roberts, a matrix with 0-1 [...]

Theorem 1.
Strong-Robinsonian matrix recognition is NP-Complete.

Proof. We show the NP-Completeness of this recognition problem via a particular case. Indeed, we will show that Strong-Robinsonian 0-1 [...] 0-1 [...] $A$ is Strong-Robinsonian if and only if $A$ has a Robinsonian completion with 0's and 1's. On the other hand, in [6] it was shown that a complete symmetric 0-1 [...] 0-1 [...]

Theorem 2.
Let $A$ be an incomplete symmetric matrix with $b$ missing entries and $|w(A)|$ different values. Then, it is possible to decide if $A$ is Strong-Robinsonian in time $O(|w(A)|^b n^2)$.

Proof. Let $A$ be an incomplete symmetric matrix with $b$ missing entries and $|w(A)|$ different values. There exist $|w(A)|^b$ different completions of $A$ with values in $w(A)$. By Lemmas 1 and 2, $A$ is Strong-Robinsonian if and only if one of these completions is Robinsonian. Therefore, an exhaustive search over all the completions of $A$ with values in $w(A)$, testing for each of them the Robinsonian property (for complete matrices), can be done in time $O(|w(A)|^b n^2)$.

Robinsonian matrices are important in different contexts. Recognition of Robinsonian matrices can be done efficiently when the matrix has no missing entry. When entries of the matrix are missing, Robinsonian matrix recognition can be done in O(n · n) time. Unfortunately, it has been shown that there is no sub-exponential time algorithm for the Robinsonian (incomplete) matrix recognition problem.

In this document, we proved that a subset of the set of incomplete Robinsonian matrices, the Strong-Robinsonian matrices, can be recognized in polynomial time with respect to the size of the matrix when the number of missing entries is a constant parameter. Indeed, we presented an algorithm that recognizes incomplete Strong-Robinsonian matrices in time $O(|w|^b n^2)$, where $|w|$ denotes the number of different values in the matrix and $b$ denotes the number of missing entries.

References

[1]
Aracena, J., and Thraves Caro, C. The weighted sitting closer to friends than enemies problem in the line. arXiv preprint arXiv:1906.11812 (2019).
[2] Brusco, M. J., and Stahl, S. Branch-and-Bound applications in combinatorial data analysis. Springer Science & Business Media, 2006.
[3] Chepoi, V., and Fichet, B. Recognition of Robinsonian dissimilarities. Journal of Classification 14, 2 (1997), 311–325.
[4] Chepoi, V., Fichet, B., and Seston, M. Seriation in the presence of errors: NP-hardness of l∞-fitting Robinson structures to dissimilarity matrices. Journal of Classification 26, 3 (2009), 279–296.
[5] Chepoi, V., and Seston, M. Seriation in the presence of errors: A factor 16 approximation algorithm for l∞-fitting Robinson structures to distances. Algorithmica 59, 4 (2011), 521–568.
[6] Cygan, M., Pilipczuk, M., Pilipczuk, M., and Wojtaszczyk, J. O. Sitting closer to friends than enemies, revisited. Theory of Computing Systems 56, 2 (2015), 394–405.
[7] Ding, C., and He, X. Linearized cluster assignment via spectral ordering. In Proceedings of the twenty-first international conference on Machine learning (2004), ACM, p. 30.
[8] Fortin, D. Robinsonian matrices: Recognition challenges. Journal of Classification 34, 2 (2017), 191–222.
[9] Fortin, D. Clustering analysis of a dissimilarity: a review of algebraic and geometric representation. Journal of Classification (2019), 1–23.
[10] Golumbic, M. C. Matrix sandwich problems. Linear Algebra and its Applications 277, 1 (1998), 239–251.
[11] Golumbic, M. C., Kaplan, H., and Shamir, R. On the complexity of DNA physical mapping. Advances in Applied Mathematics 15, 3 (1994), 251–261.
[12] Hubert, L., Arabie, P., and Meulman, J. Combinatorial data analysis: Optimization by dynamic programming, vol. 6. SIAM, 2001.
[13] Laurent, M., and Seminaroti, M. The quadratic assignment problem is easy for Robinsonian matrices with Toeplitz structure. Operations Research Letters 43, 1 (2015), 103–109.
[14] Laurent, M., and Seminaroti, M. A Lex-BFS-based recognition algorithm for Robinsonian matrices. Discrete Applied Mathematics 222 (2017), 151–165.
[15] Laurent, M., and Seminaroti, M. Similarity-first search: A new algorithm with application to Robinsonian matrix recognition. SIAM Journal on Discrete Mathematics 31, 3 (2017), 1765–1800.
[16] Laurent, M., Seminaroti, M., and Tanigawa, S.-i. A structural characterization for certifying Robinsonian matrices. The Electronic Journal of Combinatorics 24(2) (2017).
[17] Liiv, I. Seriation and matrix reordering methods: An historical overview. Statistical Analysis and Data Mining: The ASA Data Science Journal 3, 2 (2010), 70–91.
[18] Mertzios, G. B. A matrix characterization of interval and proper interval graphs. Applied Mathematics Letters 21, 4 (2008), 332–337.
[19] Mirkin, B. G., and Rodin, S. N. Graphs and Genes. Springer-Verlag, 1984.
[20] Pardo, E. G., García-Sánchez, A., Sevaux, M., and Duarte, A. Basic variable neighborhood search for the minimum sitting arrangement problem. Journal of Heuristics 26, 2 (2020), 249–268.
[21] Pardo, E. G., Soto, M., and Thraves Caro, C. Embedding signed graphs in the line. Journal of Combinatorial Optimization 29, 2 (2015), 451–471.
[22] Petrie, W. M. F. Sequences in prehistoric remains. Journal of the Anthropological Institute of Great Britain and Ireland (1899), 295–301.
[23] Préa, P., and Fortin, D. An optimal algorithm to recognize Robinsonian dissimilarities. Journal of Classification 31, 3 (2014), 351–385.
[24] Roberts, F. S. Indifference graphs. In Proof Techniques in Graph Theory, F. Harary, Ed. Academic Press, New York, 1969, pp. 139–146.
[25] Robinson, W. S. A method for chronologically ordering archaeological deposits. American Antiquity 16, 4 (1951), 293–301.
[26] Tien, Y.-J., Lee, Y.-S., Wu, H.-M., and Chen, C.-H. Methods for simultaneously identifying coherent local clusters with smooth global patterns in gene expression profiles.