Multiple Criss-Cross Insertion and Deletion Correcting Codes
Lorenz Welter, Rawad Bitar, Antonia Wachter-Zeh, Eitan Yaakobi
Abstract: This paper investigates the problem of correcting multiple criss-cross deletions in arrays. More precisely, we study the unique recovery of $n \times n$ arrays affected by any combination of $t_r$ row and $t_c$ column deletions such that $t_r + t_c = t$ for a given $t$. We refer to this type of deletion as a $t$-criss-cross deletion. We show that a code capable of correcting $t$-criss-cross deletions has redundancy at least $tn + t \log n - \log(t!)$. Then, we present an existential construction of a code capable of correcting $t$-criss-cross deletions whose redundancy is bounded from above by $tn + O(t^2 \log^2 n)$. The main ingredients of the presented code are systematic binary $t$-deletion-correcting codes and Gabidulin codes. The first ingredient helps locate the indices of the deleted rows and columns, thus transforming the deletion-correction problem into an erasure-correction problem, which is then solved using the second ingredient.

I. INTRODUCTION
Deletion-correcting codes have recently received increased attention due to their application in DNA-based storage systems, file synchronization, and communication systems [1]–[6]. The problem of correcting deletions dates back to the 1960s. In [7], Levenshtein defined the notion of $t$-deletion-correcting codes and bounded from below the redundancy of any binary $t$-deletion-correcting code by $t \log n - O(1)$. Moreover, he proved that the Varshamov-Tenengolts codes [8], originally designed to correct a single asymmetric error, can also correct a single deletion and have redundancy of roughly $\log(n+1)$ bits. Several recent works studied the problem of constructing binary $t$-deletion-correcting codes, for $t > 1$, with redundancy approaching Levenshtein's bound [9]–[15]. Of particular importance to us is the work of Sima et al. [16], in which the authors present a binary systematic $t$-deletion-correcting code with redundancy $4t \log(n) + o(\log(n))$.

This paper considers the problem of coding for deletions in the two-dimensional space. Given a number of deletions $t$ and an array $X$, we assume that the array can be affected by any combination of $t_r$ row and $t_c$ column deletions such that $t_r + t_c = t$. Deletions of this type are referred to as $t$-criss-cross deletions. Our goal is to construct codes that can uniquely recover the array $X$ from any $t$-criss-cross deletion, and we refer to these codes as $t$-criss-cross codes. We borrow this terminology from previous works that studied the problem of correcting criss-cross erasures and substitution errors in the two-dimensional space, e.g., [17]–[24]. The criss-cross deletion problem is, however, more involved than the criss-cross erasure or substitution problem due to the loss of synchronization in the locations of the rows and the columns.

LW, RB, and AW-Z are with the Institute for Communications Engineering, Technical University of Munich (TUM), Germany. Emails: {rawad.bitar, lorenz.welter, antonia.wachter-zeh}@tum.de. EY is with the CS department of Technion - Israel Institute of Technology, Israel. Email: [email protected]. This project has received funding from the European Research Council (ERC) under the European Union's Horizon 2020 research and innovation programme (grant agreement No. 801434) and from the Technical University of Munich - Institute for Advanced Studies, funded by the German Excellence Initiative and European Union Seventh Framework Programme under Grant Agreement No. 291763.

The first works to study this problem were [25] and [26]. In [25], we investigated the problem of correcting exactly one row and one column deletion in arrays. We showed that the redundancy for this special case is bounded from below by $2n + 2 \log n - O(1)$ and presented an existential and an explicit construction with redundancy approximately n and n far from the lower bound, respectively. In [26], Hagiwara constructed codes for correcting criss-cross deletions with at most $t_r$ row deletions and at most $t_c$ column deletions, for given values of $t_r$ and $t_c$. His construction splits the array into locators and an information part. The locators are carefully structured arrays that can exactly recover the indices of any deleted rows and columns in the array. Then, a tensor-product erasure-correcting code is used to recover the lost symbols in the information part.

Our contributions can be summarized as follows. We present an asymptotic upper bound (in the code length) on the cardinality of $t$-criss-cross codes. Our bound implies that the redundancy of any $t$-criss-cross code is bounded from below by approximately $tn + t \log n$. Then, we construct existential $t$-criss-cross codes based on locator arrays, binary systematic $t$-deletion-correcting codes, and Gabidulin codes.
The main improvement is the use of a collection of binary deletion-correcting codes to locate the indices of the deleted columns and rows with less redundancy than the locator arrays used in [26]. However, small locator arrays are still needed to complement the deletion-correcting codes. The deletion-correction problem is then transformed into a row/column erasure-correction problem, which can be solved by Gabidulin codes, as these have optimal redundancy for row/column erasure correction [18]. The redundancy of the presented construction is $tn + O(t^2 \log^2 n)$.

II. DEFINITIONS AND PRELIMINARIES
This section formally defines the codes and notation used throughout this paper. Let $\Sigma \triangleq \{0, 1\}$ be the binary alphabet. We denote by $\Sigma^{n \times n}$ the set of all binary arrays of dimension $n \times n$. All logarithms are base 2 unless otherwise indicated.

For an integer $n \in \mathbb{N}$, the set $\{1, \dots, n\}$ is denoted by $[n]$. For an array $X \in \Sigma^{n \times n}$ and $i, j \in [n]$, we refer to the entry of $X$ positioned at the $i$th row and the $j$th column by $X_{i,j}$. We denote the $i$th row and the $j$th column of $X$ by $X_{i,[n]}$ and $X_{[n],j}$, respectively. Similarly, we denote by $X_{[i_1:i_2],[j_1:j_2]}$ the subarray of $X$ formed by rows $i_1$ to $i_2$ and their corresponding entries from columns $j_1$ to $j_2$. Moreover, for two arrays $X \in \Sigma^{n \times m_1}$ and $Y \in \Sigma^{n \times m_2}$ we denote by $Z = (X \mid Y)$ the concatenation of these two arrays, with $Z \in \Sigma^{n \times (m_1 + m_2)}$. For any binary array $X$, we denote by $\overline{X}$ the complement of $X$, i.e., the array in which every bit of $X$ is flipped.

For a positive integer $t$, we define a $t$-criss-cross deletion in a binary array $X$ to be the deletion of any combination of $t_r$ rows and $t_c$ columns of $X$ such that $t_r + t_c = t$. We refer to $\widetilde{X}$ as the array resulting from a $t$-criss-cross deletion in $X$, where the number of deletions that happened in $X$ is clear from the context. A code $\mathcal{C} \subseteq \Sigma^{n \times n}$ that can correct any $t$-criss-cross deletion is called a $t$-criss-cross deletion-correcting code. We abbreviate this as $t$-criss-cross code. Throughout this paper we assume that $t$ is a constant with respect to $n$. We write $f(n) \approx g(n)$, $f(n) \lesssim g(n)$, and $f(n) \gtrsim g(n)$ if the equality or inequality holds for $n \to \infty$.

III. UPPER BOUND ON THE CARDINALITY
This section presents an asymptotic upper bound on the cardinality of any $t$-criss-cross code. This bound implies an asymptotic lower bound on the redundancy of any binary $t$-criss-cross code, denoted by $R_B(n, t)$.

Lemma 1
Any upper bound on the cardinality of a $q$-ary $t$-deletion-correcting code $\mathcal{C}_{q,n,t}$ with $q = 2^n$ is also an upper bound on the cardinality of a binary $t$-criss-cross code.

Proof: Note that a $2^n$-ary $t$-deletion-correcting code $\mathcal{C}_{2^n,n,t}$ can also be seen as a binary code correcting $t$ column deletions by interpreting the symbols as binary columns. Since a $t$-criss-cross code $\mathcal{C}$ can correct any combination of $t_r$ row and $t_c$ column deletions such that $t_r + t_c = t$, it can in particular correct any $t$ column deletions. Therefore, any upper bound on the size of $\mathcal{C}_{2^n,n,t}$ is also a valid upper bound on the size of $\mathcal{C}$.

Corollary 2
For any binary $t$-criss-cross code $\mathcal{C}$ it holds that
$$|\mathcal{C}| \lesssim \frac{t!\, 2^{n^2}}{(2^n - 1)^t\, n^t}.$$
Consequently, we have $R_B(n, t) \gtrsim tn + t \log(n) - \log(t!)$.

Proof: From [27], we have for a $q$-ary $t$-deletion-correcting code that
$$|\mathcal{C}_{q,n,t}| \lesssim \frac{t!\, q^n}{(q - 1)^t\, n^t}.$$
Even though this bound was proved in [27] for fixed $q$, one can verify that it also holds for $q = 2^n$. Therefore, for any binary $t$-criss-cross code $\mathcal{C}$ it holds by Lemma 1 that
$$|\mathcal{C}| \le |\mathcal{C}_{2^n,n,t}| \lesssim \frac{t!\, 2^{n^2}}{(2^n - 1)^t\, n^t}.$$
Therefore, we have $R_B(n, t) \gtrsim n^2 - \log(|\mathcal{C}|) \approx tn + t \log(n) - \log(t!)$.

IV. CODE CONSTRUCTION
In this section we present an existential construction of $t$-criss-cross codes. We start with an intuitive road map of our code construction and then formally define each ingredient.

A. Road Map
Our construction uses structured arrays so that the indices of the deleted rows and columns can be exactly recovered. Then, the set of structured arrays is intersected with the arrays of a Gabidulin code (which can correct row/column erasures) to recover the arrays of the code. The structure is depicted in Figure 1.

We structure the $n \times n$ codewords $C$ as follows. We protect the columns with indices between $t \log n + 1$ and $n - (t+1)^2$ using $t \log n$ codes, each of which is a binary systematic $t$-deletion-correcting code. We divide those codes into $t$ blocks, each of size $\log n$. We impose what we call a window constraint on the columns of the systematic part of every block. This constraint ensures that every $t + 1$ consecutive columns are different. Therefore, the indices of the deleted columns within the systematic part can be located by using all $\log n$ deletion-correcting codes of any block (Claim 4).

In the redundancy part, runs may exist. Thus, the recovery of the indices of the deleted columns is only guaranteed within possible runs. To recover the exact location of the deleted columns here, we protect the redundancy part of the codes by appending (from below) what we call a locator array, which can detect the exact positions of column deletions within this part. We call this array $T^{(1)}$ (Claim 3).

Fig. 1: Illustration of an array contained in the locator set $\mathcal{L}_t(n)$ for $t = 3$. In the first $t \log(n)$ rows there are $t$ blocks, each consisting of a systematic part (cyan) and a redundancy part (red). Each row is encoded using a systematic $t$-deletion-correcting code (zoomed-in part). In addition, a window constraint is imposed on the systematic part of each block. Those blocks are used to locate column deletions. This structure is protected by the arrays $L^{(1)}$ (blue) against row deletions and $T^{(1)}$ (brown) against column deletions. Lastly, to locate the borders of $T^{(1)}$ we use the marker arrays $M^{(2,1)}$ and $M^{(2,2)}$ (pink). A symmetric structure locates row deletions.

Note that for the window constraint to work, we need to have all $\log n$ deletion-correcting codes of the considered block. Therefore, we use the subarray $C_{[1:t \log n],[n-(t+1)^2:n]}$ as a locator array $L^{(1)}$ that can detect the exact position of a deleted row within the first $t \log n$ rows (Claim 3). As a result, if all $t$ deletions are row deletions within the first $t \log n$ rows, then the locator array is enough to recover all the indices (Lemma 6). Otherwise, at least one of the $t$ blocks is not affected by a row deletion. This block is used to recover any column deletion within the range between $t \log n + 1$ and $n - (t+1)^2$ (Lemma 7).

One more step is needed. We must be able to locate the position of the locator arrays within the resulting $(n - t_r) \times (n - t_c)$ array $\widetilde{C}$. Therefore, we put four marker arrays after the locators that are detectable even after $t$ deletions. We call those arrays $M^{(1,1)}$ and $M^{(1,2)}$.

The same structure (transposed) is used to index the rows. In addition, the columns with indices between $1$ and $t \log n$ are protected by the locator array used for protecting the deletion-correcting codes indexing the rows. Note that the claims and lemmas mentioned before also include the statements to recover the row indices.

In the next subsections, we formally define the five main ingredients of our code: (i) the locator arrays; (ii) the binary systematic $t$-deletion-correcting codes with window constraints; (iii) the marker arrays; (iv) the locator set, which is the combination of all the previously mentioned parts; and (v) a Gabidulin code [28] that is used to correct row/column erasures.

B. Locator Arrays
For a positive integer $a$, we denote by $I_a$ the identity array of dimension $a \times a$ and by $\mathbf{1}_a$ and $\mathbf{0}_a$ the all-one vector and all-zero vector of length $a$, respectively. We use $\otimes$ to indicate the Kronecker product. We thus have the following definition from [26].

Definition 1 (Locator arrays)
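We set $L' \in \Sigma^{(t+1) \times (t+1)^2}$ as $L' \triangleq I_{t+1} \otimes \mathbf{1}_{t+1}$. More precisely, $L'$ has the following structure:
$$L' = \begin{pmatrix} \mathbf{1}_{t+1} & \mathbf{0}_{t+1} & \cdots & \mathbf{0}_{t+1} \\ \mathbf{0}_{t+1} & \mathbf{1}_{t+1} & \cdots & \mathbf{0}_{t+1} \\ \vdots & \vdots & \ddots & \vdots \\ \mathbf{0}_{t+1} & \mathbf{0}_{t+1} & \cdots & \mathbf{1}_{t+1} \end{pmatrix}.$$
Let $s$ be a multiple of $(t + 1)$ such that $s > (t + 1)$. We define the locator array $L_s \in \Sigma^{s \times (t+1)^2}$ as $L_s \triangleq \mathbf{1}^T_{\frac{s}{t+1}} \otimes L'$. Moreover, we define the locator array $T_s \in \Sigma^{(t+1)^2 \times s}$ to be the transpose of $L_s$, i.e., $T_s \triangleq L_s^T = \mathbf{1}_{\frac{s}{t+1}} \otimes L'^T$. Throughout the paper we drop $s$ from the notation $L_s$ and $T_s$ when the value of $s$ is clear from the context.

Claim 3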
Consider an array $L_s$ affected by $t_r$ row and $t_c$ column deletions such that $t_r + t_c = t$. Divide $L_s$ into $(t + 1)$ subarrays, each consisting of $(t + 1)$ consecutive columns of $L_s$. By examining $\widetilde{L}_s$, we can locate the exact positions of the deleted rows. We can also determine the number of column deletions that happened in each subarray of $L_s$.

The same holds for an array $T_s$ affected by $t_r$ row and $t_c$ column deletions such that $t_r + t_c = t$, with the roles of rows and columns exchanged.

Proof: We prove the first part of the claim; the second part follows similarly since $T_s = L_s^T$.

By construction of $L_s$, for any $i \in [s - t - 1]$ and $j \in [t]$ it holds that $L_{i,[(t+1)^2]} \neq L_{i+j,[(t+1)^2]}$, since the rows of $L_s$ repeat with period $t + 1$. This property holds true even in the presence of at most $t$ column deletions in $L_s$. Thus, due to the fixed structure of $L_s$, one can uniquely determine the exact indices of the deleted rows.

Moreover, we divide $L_s$ into subarrays consisting of $(t + 1)$ columns. For any $a, b, c \in [t + 1]$ we have $L_{[s],(c-1)(t+1)+a} = L_{[s],(c-1)(t+1)+b}$. In words, the $(t + 1)$ columns of a subarray are identical. This property holds true even if there were at most $t$ row deletions in $L_s$. Therefore, we can determine the number of deleted columns within any subarray by counting the number of missing columns.

C. Deletion-Correcting Codes with Window Constraints

Deletion-correcting codes:
We use the construction of [16] for our binary systematic $t$-deletion-correcting code. We briefly recall the results of [16]. Given a sequence $k \in \Sigma^{\kappa}$, one can compute a redundancy vector $r_k \in \Sigma^{\rho_\kappa}$ with $\rho_\kappa \le 4t \log(\kappa) + o(\log(\kappa))$. The resulting sequence $(k \mid r_k)$ can be uniquely recovered after $t$ deletions. Note that $r_k$ is a function of the information $k$, and $\rho_\kappa$ is a function of the information length $\kappa$ and the number of deletions $t$.

Window constraint:
We define the window constraint as the set $\mathcal{W}_t(\ell, w) \subseteq \Sigma^{\ell \times w}$, where for any $W \in \mathcal{W}_t(\ell, w)$, $i \in [w - t]$, and $j \in [t]$, it holds that $W_{[\ell],i} \neq W_{[\ell],i+j}$.

For an array $W \in \mathcal{W}_t(\ell, w)$, let $R_W \in \Sigma^{\ell \times r_w}$ be the array whose $i$th row, for any $i \in [\ell]$, is the redundancy vector corresponding to the $i$th row of $W$, computed using the construction in [16]. We refer to the array $R_W$ as the redundancy array. Let $m \triangleq w + r_w$. We define $\mathcal{D}^{(1)}_t(\ell, m)$ as the set of all arrays resulting from the concatenation of $W$ and $R_W$, i.e.,
$$\mathcal{D}^{(1)}_t(\ell, m) \triangleq \left\{ D \in \Sigma^{\ell \times m} : D = (W \mid R_W), \ \text{s.t.} \ W \in \mathcal{W}_t(\ell, w) \right\}.$$
In words, $\mathcal{D}^{(1)}_t(\ell, m)$ is the set of arrays whose rows are codewords of a binary systematic $t$-deletion-correcting code and whose systematic part satisfies the imposed window constraint. This set will be used to index the columns of our arrays in the constructed code. We define $\mathcal{D}^{(2)}_t(\ell, m) \triangleq \left\{ D^T : D \in \mathcal{D}^{(1)}_t(\ell, m) \right\}$. This set will be used to index the rows.
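To illustrate why the window constraint is useful, the following Python sketch checks the constraint and locates deleted columns by a greedy column comparison. It is a minimal sketch under stated assumptions, not the decoder analysed in this paper: all function names are hypothetical, the constraint is read as "any two columns at distance at most $t$ differ", indices are 0-based, and the original array $W$ is assumed to be already known to the caller (in the construction, its rows would first be recovered with the codes of [16]).

```python
from typing import List

Array = List[List[int]]  # binary array stored as a list of rows


def columns(W: Array) -> List[tuple]:
    """Return the columns of W as tuples (empty list if W has no columns)."""
    return [tuple(col) for col in zip(*W)]


def satisfies_window_constraint(W: Array, t: int) -> bool:
    """True if any two columns of W at distance at most t are different."""
    cols = columns(W)
    return all(
        cols[i] != cols[i + j]
        for i in range(len(cols))
        for j in range(1, t + 1)
        if i + j < len(cols)
    )


def delete_columns(W: Array, deleted: List[int]) -> Array:
    """Remove the columns with the given 0-based indices (simulates the channel)."""
    drop = set(deleted)
    return [[bit for j, bit in enumerate(row) if j not in drop] for row in W]


def locate_deleted_columns(W: Array, W_tilde: Array) -> List[int]:
    """Greedily align W against W_tilde and return the 0-based deleted indices.

    Assumption: W satisfies the window constraint for some t that is at least
    the number of deleted columns, so the next surviving column always differs
    from a deleted one and the alignment is unambiguous.
    """
    original, received = columns(W), columns(W_tilde)
    deleted, k = [], 0
    for i, col in enumerate(original):
        if k < len(received) and received[k] == col:
            k += 1             # column i survived and is matched
        else:
            deleted.append(i)  # column i is missing in W_tilde
    return deleted


# Hypothetical example: 2 rows, 8 columns, columns repeat with period 4.
W = [[0, 0, 1, 1, 0, 0, 1, 1],
     [0, 1, 0, 1, 0, 1, 0, 1]]
W_tilde = delete_columns(W, [2, 3])
located = locate_deleted_columns(W, W_tilde)
```

For this example array the constraint holds for $t = 2$ and $t = 3$ but not for $t = 4$, and `located` is `[2, 3]`. The greedy comparison works because a deleted column and the next surviving column are at most $t$ positions apart and therefore differ.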
Claim 4
Given an array $D = (W \mid R_W) \in \mathcal{D}^{(1)}_t(\ell, m)$ affected by $t$ column deletions and no row deletions, we can locate the exact positions of the deleted columns in the subarray $W$.

The same holds for any array in $\mathcal{D}^{(2)}_t(\ell, m)$ by exchanging rows and columns in the argument.

Proof: Assume $\widetilde{D} = (\widetilde{W} \mid \widetilde{R}_W)$ is the array obtained after the deletions. For each row in $\widetilde{W}$ we can use the corresponding redundancy in $\widetilde{R}_W$ to correct the deletions that happened in this row [16]. We start by looking at the position of the first recovered bit in each row. In each row, this position may be unique or may lie in an interval of possible positions (a run). The exact location of the column is then determined by the unique position in which all runs (of all rows) intersect. The intersection is guaranteed to be unique by the imposed window constraint, since for any $i \in [w - t]$ and $j \in [t]$ it holds that $W_{[\ell],i} \neq W_{[\ell],i+j}$. This process is repeated for all recovered bits until all $t$ positions are determined.

A similar argument holds for the second statement of the claim.

D. Marker Arrays
We define the following arrays of dimension $(t+1) \times (t+1)$, which operate as markers to locate the position of the locator arrays in the resulting $\widetilde{C}$. Recall that we use four locator arrays in our construction, namely $L^{(1)}$, $L^{(2)}$, $T^{(1)}$, and $T^{(2)}$, cf. Figure 1. We only need marker arrays for $T^{(1)}$ and $L^{(2)}$. The positions of $L^{(1)}$ and $T^{(2)}$ can then be determined. The first marker array $M^{(2,1)}$, put on top of $L^{(2)}$, consists of the first $t + 1$ columns of $L'$. The second marker array $M^{(2,2)}$, put on the right of $L^{(2)}$, consists of the complement of the last $t + 1$ columns of $L'$. The marker arrays $M^{(1,1)}$ and $M^{(1,2)}$ are the transposes of $M^{(2,1)}$ and $M^{(2,2)}$, respectively.

E. Locator Set
We formally define the sets of arrays in $\Sigma^{n \times n}$ that form our code. Let $X \in \Sigma^{n \times n}$. We start with the set of arrays that are used to index the columns. This set is denoted by $\mathcal{H}_t(\ell, n)$. The arrays in this set have the first $t\ell$ rows divided into $t$ blocks. In each row of a block, the entries in the columns with indices between $t\ell + 1$ and $n - (t+1)^2$ form a codeword of a systematic $t$-deletion-correcting code whose systematic part satisfies the window constraint. We can write
$$\mathcal{H}_t(\ell, n) \triangleq \left\{ X : X_{[(a-1)\ell+1:a\ell],[t\ell+1:n-(t+1)^2]} \in \mathcal{D}^{(1)}_t(\ell, n - t\ell - (t+1)^2) \ \forall a \in [t] \right\}.$$
The set of arrays $\mathcal{V}_t(\ell, n)$ that are used to index the rows is defined similarly to $\mathcal{H}_t(\ell, n)$ by exchanging the roles of rows and columns:
$$\mathcal{V}_t(\ell, n) \triangleq \left\{ X : X_{[t\ell+1:n-(t+1)^2],[(b-1)\ell+1:b\ell]} \in \mathcal{D}^{(2)}_t(\ell, n - t\ell - (t+1)^2) \ \forall b \in [t] \right\}.$$
For a value of $r_w$ that is divisible by $t + 1$, the set of arrays $\mathcal{E}_t(\ell, n)$ that contains the locator arrays in the positions shown in Figure 1 is defined as follows (the subscripts $L$ and $T$ are assigned so that the dimensions agree with Definition 1):
$$\mathcal{E}_t(\ell, n) \triangleq \left\{ X : \begin{array}{l} X_{[1:t\ell],[n-(t+1)^2+1:n]} = L_{t\ell}, \\ X_{[n-r_w-(t+1)^2+1:n],[t\ell+1:t\ell+(t+1)^2]} = L_{r_w+(t+1)^2}, \\ X_{[n-(t+1)^2+1:n],[1:t\ell]} = T_{t\ell}, \\ X_{[t\ell+1:t\ell+(t+1)^2],[n-r_w-(t+1)^2+1:n]} = T_{r_w+(t+1)^2} \end{array} \right\}.$$
The set of arrays that contains the marker arrays in the positions shown in Figure 1 is defined as follows:
$$\mathcal{M}_t(\ell, n) \triangleq \left\{ X : \begin{array}{l} X_{[t\ell+1:t\ell+(t+1)],[n-r_w-(t+1)^2-(t+1)+1:n-r_w-(t+1)^2]} = M^{(1,1)}, \\ X_{[t\ell+(t+1)^2+1:t\ell+(t+1)^2+(t+1)],[n-(t+1)+1:n]} = M^{(1,2)}, \\ X_{[n-r_w-(t+1)^2-(t+1)+1:n-r_w-(t+1)^2],[t\ell+1:t\ell+(t+1)]} = M^{(2,1)}, \\ X_{[n-(t+1)+1:n],[t\ell+(t+1)^2+1:t\ell+(t+1)^2+(t+1)]} = M^{(2,2)} \end{array} \right\}.$$
We conclude this subsection by defining the locator set, which is the set of all arrays that have the structure required by our code to recover the indices of the deleted columns and rows. The locator set is the intersection of all the previously defined sets.
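The locator arrays $L_s$ and $T_s$ that $\mathcal{E}_t(\ell, n)$ places in the array can be made concrete with a short Python sketch. It is only an illustration under assumptions: the function names are hypothetical, and the orientation ($L'$ with $t+1$ rows and $(t+1)^2$ columns, i.e., the all-one vector read as a row vector) is one reading of Definition 1, chosen because it is consistent with Claim 3.

```python
from typing import List

Array = List[List[int]]  # binary array stored as a list of rows


def locator_prime(t: int) -> Array:
    """L' = I_{t+1} (Kronecker) 1_{t+1}: row r has ones exactly in column block r."""
    k = t + 1
    return [[1 if c // k == r else 0 for c in range(k * k)] for r in range(k)]


def locator_L(s: int, t: int) -> Array:
    """L_s: s/(t+1) copies of L' stacked vertically (s rows, (t+1)^2 columns)."""
    k = t + 1
    if s % k != 0:
        raise ValueError("s must be a multiple of t + 1")
    base = locator_prime(t)
    return [list(base[r % k]) for r in range(s)]


def locator_T(s: int, t: int) -> Array:
    """T_s: the transpose of L_s ((t+1)^2 rows, s columns)."""
    return [list(col) for col in zip(*locator_L(s, t))]


# Hypothetical small instance: t = 2 deletions, s = 6 rows.
L6 = locator_L(6, 2)
T6 = locator_T(6, 2)
```

In this instance every row of `L6` has its ones in exactly one block of three columns, rows at distance at most $t = 2$ differ, and the three columns inside each block are identical. These are the two properties used in the proof of Claim 3.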
Definition 2 (Locator Set)
We define the following set:
$$\mathcal{L}_t(n) \triangleq \mathcal{H}_t(\ell, n) \cap \mathcal{V}_t(\ell, n) \cap \mathcal{E}_t(\ell, n) \cap \mathcal{M}_t(\ell, n).$$
For an illustration of such arrays we refer to Figure 1. The defining parameters of $\mathcal{L}_t(n)$ are only $t$ and $n$. By fixing those, all other parameters can be obtained from the imposed constraints. The most noteworthy parameters are $w$ and $r_w$, which are functions of $n$ and $t$.

F. Construction
We write $\mathcal{C}_{\mathrm{Gab}}(n, t)$ to refer to a linear Gabidulin code that is able to correct any pattern of $t_r$ row and $t_c$ column erasures in an $n \times n$ array as long as $t_r + t_c = t$ [18]. We are now able to present our existential construction.

Construction 1
The code $\mathcal{C}_{t,n} \subseteq \Sigma^{n \times n}$ is the set of arrays that belong to $\mathcal{L}_t(n) \cap \mathcal{C}_{\mathrm{Gab}}(n, t)$.

Theorem 5
The code $\mathcal{C}_{t,n}$ described in Construction 1 is a $t$-criss-cross code.

A rough concept of our construction is as follows. In our codewords, we first introduce the structure $\mathcal{L}_t(n)$ to locate the indices of the deleted columns and rows. With this knowledge we can introduce erasures into the missing rows and columns and convert the deletion problem into an erasure problem, which can be solved by the Gabidulin code $\mathcal{C}_{\mathrm{Gab}}(n, t)$ [18]. We call this type of decoding the locate-decode strategy. Theorem 5 will be proven by providing a generic decoding strategy in the next section.

Footnote on the set $\mathcal{E}_t(\ell, n)$: If the value of $r_w$ is not divisible by $t + 1$, then one can simply expand the dimension of the locator arrays in $\mathcal{E}_t(\ell, n)$ to the next multiple of $t + 1$ that is greater than $r_w + (t+1)^2$.

V. DECODER
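The second half of the locate-decode strategy, turning located deletions into erasures, can be sketched in Python as follows. This is an illustration only: the function names are hypothetical, indices are 0-based, erased symbols are represented by `None`, and the Gabidulin erasure decoder of [18] that would subsequently fill in the erased symbols is not shown.

```python
from typing import List, Optional

Array = List[List[Optional[int]]]  # None marks an erased symbol


def criss_cross_delete(C: Array, rows: List[int], cols: List[int]) -> Array:
    """Delete the given rows and columns of C (simulates the channel)."""
    row_set, col_set = set(rows), set(cols)
    return [
        [x for j, x in enumerate(row) if j not in col_set]
        for i, row in enumerate(C)
        if i not in row_set
    ]


def insert_erasures(C_tilde: Array, rows: List[int], cols: List[int], n: int) -> Array:
    """Rebuild an n x n array with erasures at the located rows and columns.

    Indices are processed in increasing order, so every surviving symbol of
    C_tilde returns to the position it had before the deletions.
    """
    row_set, col_set = set(rows), set(cols)
    surviving_rows = iter(C_tilde)
    out: Array = []
    for i in range(n):
        if i in row_set:
            out.append([None] * n)           # whole row erased
            continue
        symbols = iter(next(surviving_rows))
        out.append([None if j in col_set else next(symbols) for j in range(n)])
    return out


# Hypothetical example: n = 4, one row deletion and two column deletions (t = 3).
C = [[1, 0, 1, 1],
     [0, 0, 1, 0],
     [1, 1, 0, 0],
     [0, 1, 1, 1]]
C_tilde = criss_cross_delete(C, rows=[1], cols=[0, 3])
E = insert_erasures(C_tilde, rows=[1], cols=[0, 3], n=4)
```

After this step every entry of `E` outside the located rows and columns agrees with the original codeword, which is the input format expected by a row/column erasure decoder.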
Assume we have a codeword $C \in \mathcal{C}_{t,n}$. The decoder receives an array $\widetilde{C} \in \Sigma^{(n-t_r) \times (n-t_c)}$ obtained from $C$ by $t_r$ row and $t_c$ column deletions. We denote the sets of indices of the deleted rows and columns by $\mathcal{I}^{(t_r)} \subset [n]$ and $\mathcal{I}^{(t_c)} \subset [n]$, respectively, with $|\mathcal{I}^{(t_r)}| + |\mathcal{I}^{(t_c)}| = t_r + t_c = t$. As mentioned before, we first focus on locating the indices of the row and column deletions.

Lemma 6
Given the array $\widetilde{C}$, any row index $i \in \mathcal{I}^{(t_r)}$ such that $i \le t\ell$ or $n - r_w - (t+1)^2 < i \le n$ can be recovered. Similarly, any column index $j \in \mathcal{I}^{(t_c)}$ such that $j \le t\ell$ or $n - r_w - (t+1)^2 < j \le n$ can be recovered.

Proof: We focus on how to recover the row indices $i \in \mathcal{I}^{(t_r)}$ and the column indices $j \in \mathcal{I}^{(t_c)}$ that satisfy $i \le t\ell$ and $n - r_w - (t+1)^2 < j \le n$. Recovering the remaining indices of the statement follows by the symmetry of the construction.

It can be shown (and is omitted for brevity) that the boundaries of $\widetilde{L}^{(1)}$ and $\widetilde{T}^{(1)}$ in $\widetilde{C}$ can be exactly recovered by leveraging the structure of $L^{(1)}$, $T^{(1)}$, and the imposed markers $M^{(1,1)}$ and $M^{(1,2)}$. Therefore, by Claim 3 we can locate any column deletions with indices $n - r_w - (t+1)^2 < j \le n$ by decoding $\widetilde{T}^{(1)}$. Consequently, having the boundaries of $\widetilde{L}^{(1)}$ and using Claim 3, we can recover the indices of the deleted rows that satisfy $i \le t\ell$.

Lemma 7
Given the array $\widetilde{C}$, any row index $i \in \mathcal{I}^{(t_r)}$ such that $t\ell < i \le n - r_w - (t+1)^2$ can be recovered. Similarly, any column index $j \in \mathcal{I}^{(t_c)}$ such that $t\ell < j \le n - r_w - (t+1)^2$ can be recovered.

Proof: We start by proving that the column indices can be recovered. We want to leverage the structure imposed by the set $\mathcal{H}_t(\ell, n)$. For an array $C \in \mathcal{H}_t(\ell, n)$, each row of the subarray $C_{[1:t\ell],[t\ell+1:n-(t+1)^2]}$ is encoded using a binary systematic $t$-deletion-correcting code. In addition, the columns $C_{[1:t\ell],j}$ with $t\ell < j \le n - r_w - (t+1)^2$ form the systematic part of this code. Recall that the rows are divided into $t$ blocks, each of size $\ell$, where in each block the columns with $t\ell < j \le n - r_w - (t+1)^2$ satisfy the window constraint.

We assume that at least one column in this interval is deleted. Therefore, at most $(t - 1)$ rows can be deleted in $C$. This means that there exists at least one block of $\ell$ rows that is not affected by any row deletion. By Lemma 6 we can locate this block. By Claim 4 we can recover the indices of the columns deleted within the range $t\ell < j \le n - r_w - (t+1)^2$. Similarly, we can obtain the indices with $t\ell < i \le n - r_w - (t+1)^2$ by leveraging the structure imposed by $\mathcal{V}_t(\ell, n)$, Lemma 6, and Claim 4.

Now we can present the full proof of our code construction.

Proof of Theorem 5: By applying Lemma 6 and Lemma 7, we can determine the sets of indices $\mathcal{I}^{(t_r)}$ and $\mathcal{I}^{(t_c)}$. For all $i \in \mathcal{I}^{(t_r)}$ and $j \in \mathcal{I}^{(t_c)}$ we insert a row or column erasure in $\widetilde{C}$, starting from the smallest index. Now we can apply a Gabidulin criss-cross erasure decoder to determine the values of the erased symbols [18].

VI. REDUNDANCY
In this section we analyze the redundancy of our code, denoted by $R(n, t)$. We refer to the redundancy of each individual set $\mathcal{C}_{\mathrm{Gab}}(n, t)$, $\mathcal{L}_t(n)$, $\mathcal{H}_t(\ell, n)$, $\mathcal{V}_t(\ell, n)$, $\mathcal{E}_t(\ell, n)$, $\mathcal{W}_t(\ell, w)$, and $\mathcal{M}_t(\ell, n)$ by $R_*(n, t)$, where $*$ is replaced with the corresponding set letter. In the following, we give the intuition behind the computation of the redundancy.

Since $\mathcal{C}_{t,n} = \mathcal{L}_t(n) \cap \mathcal{C}_{\mathrm{Gab}}(n, t)$ and the Gabidulin code is a linear code, we can compute the code redundancy as
$$R(n, t) = R_L(n, t) + R_G(n, t).$$
Moreover, since the intersected sets in the locator set $\mathcal{L}_t(n)$ impose constraints on disjoint positions in the $n \times n$ arrays, we can further split the redundancy as
$$R_L(n, t) = R_H(n, t) + R_V(n, t) + R_E(n, t) + R_M(n, t).$$
The sets $\mathcal{H}_t(\ell, n)$ and $\mathcal{V}_t(\ell, n)$ impose similar constraints: $t$ disjoint subarrays constrained with the window constraint, where each row is protected by a systematic $t$-deletion-correcting code from [16].

Claim 8
The redundancy resulting from the constraints imposed by the two sets $\mathcal{H}_t(\ell, n)$ and $\mathcal{V}_t(\ell, n)$ is bounded as
$$R_H(n, t) + R_V(n, t) \le 2t \left( R_W(\ell, w) + \log(n) \cdot r_w \right),$$
where $w = n - t \log(n) - r_w - (t+1)^2$ and $r_w \le 4t \log(n) + o(\log n)$.

Proof: The sets $\mathcal{H}_t(\ell, n)$ and $\mathcal{V}_t(\ell, n)$ impose the same constraints, i.e., each array belonging to any of these sets has $t$ subarrays protected by deletion-correcting codes with window constraints. Therefore, we have
$$R_H(n, t) + R_V(n, t) = 2t \left( R_W(\ell, w) + \log(n) \cdot r_w \right),$$
where $r_w$ is the length of the redundancy vector used to protect a vector of length
$$w = n - t \log(n) - r_w - (t+1)^2 \qquad (1)$$
against $t$ deletions, and $\log(n)$ is the number of protected vectors in each subarray. Recall that for any integer $\kappa$, the redundancy for protecting a vector of length $\kappa$ is bounded by $\rho_\kappa \le 4t \log(\kappa) + o(\log(\kappa))$ [16]. Thus, since $w < n$, we have
$$r_w \le (4t + 1) \log(n). \qquad (2)$$
We now focus on computing $R_W(\ell, w)$. To compute an upper bound on the redundancy imposed by the window constraint $\mathcal{W}_t(\ell, w)$ we require a lower bound on $w$. Note that using a lower bound on $w$ only increases the redundancy imposed by $\mathcal{W}_t(\ell, w)$. This will be clear from the following calculations. From (1) and (2) we obtain
$$w \ge n - (5t + 1) \log(n) - (t+1)^2.$$
We calculate a lower bound on $|\mathcal{W}_t(\ell, w)|$. On a high level, our calculation goes through each column of an array in $\mathcal{W}_t(\ell, w)$ and counts the number of choices for this specific column. The first column is arbitrary, thus has $2^\ell$ choices. The second column is not allowed to be the same as the one before, thus it has $(2^\ell - 1)$ choices. The third column has $(2^\ell - 2)$ choices, since it cannot be the same as the two preceding columns. This process continues until we reach the $(t+2)$nd column. The number of choices for this vector is $(2^\ell - (t + 1))$. Since the restriction is imposed on an interval of $(t + 1)$ vectors, each remaining column has $(2^\ell - (t + 1))$ choices. Thus, for the window constraint the following holds:
$$|\mathcal{W}_t(\ell, w)| \ge 2^\ell \cdot (2^\ell - 1) \cdots (2^\ell - (t + 1)) \cdot (2^\ell - (t + 1))^{w-t-2} \ge (2^\ell - (t + 1))^w = 2^{\ell w} \left( 1 - \frac{t+1}{2^\ell} \right)^w.$$
We denote the redundancy resulting from the window constraint by $R_W(\ell, n - t\ell - r_w)$. We continue the calculation, recalling that $\ell = \log(n)$:
$$\begin{aligned} R_W(\ell, n - t\ell - r_w) &\le \ell (n - t\ell - r_w) - \log\left( |\mathcal{W}_t(\ell, n - t\ell - r_w)| \right) \\ &\le -\log\left( \left( 1 - \frac{t+1}{2^\ell} \right)^{n - t\ell - r_w} \right) \\ &\le -\log\left( \left( 1 - \frac{t+1}{n} \right)^{n} \right) - \log\left( \left( 1 - \frac{t+1}{n} \right)^{(5t+1)\log(n)} \right) \\ &\overset{(a)}{\lesssim} \log\left( e^{(t+1)} \right) - (5t + 1) \log(n) \cdot \log\left( 1 - \frac{t+1}{n} \right) \\ &\overset{(b)}{\le} (t + 1) \log(e) + (5t + 1) \log(n) \\ &\le (5t + 1) \log(n) + 2(t + 1). \end{aligned}$$
We used in $(a)$ the relation $\left( 1 - \frac{x}{n} \right)^{-n} \lesssim e^{x}$ and exploited in $(b)$ the fact that $\left( 1 - \frac{t+1}{n} \right) \ge \frac{1}{2}$ for our choice of parameters and for sufficiently large $n$.

Recall that any array in $\mathcal{D}^{(1)}_t(\ell, n - t\ell - (t+1)^2)$ consists of $\log(n)$ codewords of a binary $t$-deletion-correcting code. Therefore, we have
$$R_H(n, t) \le t \cdot \Big( \underbrace{(5t + 1) \log(n) + 2(t + 1)}_{\text{window constraint}} + \underbrace{\log(n) \cdot (4t + 1) \log(n)}_{\text{binary deletion-correcting codes}} \Big) = (4t^2 + t) \log^2(n) + (5t^2 + t) \log(n) + 2t(t + 1).$$
Since the arrays in $\mathcal{V}_t(\ell, n)$ have a similar structure imposed (only transposed) and the regions of the imposed constraints are disjoint, one can conclude that
$$R_H(n, t) + R_V(n, t) \le 2(4t^2 + t) \log^2(n) + 2(5t^2 + t) \log(n) + 4t(t + 1).$$
Observe that the constraints for the remaining sets fix values for certain subarray boundaries. Therefore, the following can be obtained.
Claim 9
The redundancy $R_L(n, t)$ resulting from the constraints imposed by the set $\mathcal{L}_t(n)$ is bounded as
$$R_L(n, t) \le (8t^2 + 2t) \log^2(n) + o(\log^2(n)).$$
Proof:
We argued that since the different constraints are imposed on disjoint subarrays in $\mathcal{L}_t(n)$, the redundancy $R_L(n, t)$ can be written as
$$R_L(n, t) = R_H(n, t) + R_V(n, t) + R_E(n, t) + R_M(n, t).$$
The redundancy imposed by the locator arrays and marker arrays is equal to the dimension of the subarrays with fixed entries. We can then write
$$R_E(n, t) \le (6t^3 + 13t^2 + 8t + 1) \log(n), \qquad R_M(n, t) = 4(t + 1)^2.$$
The other terms of the redundancy in $R_L(n, t)$ are computed in Claim 8.

We conclude this section with the statement on the redundancy $R(n, t)$ of the code $\mathcal{C}_{t,n}$ presented in Construction 1. Note that the redundancy added by the Gabidulin code is $tn$.

Lemma 10
The redundancy of the code $\mathcal{C}_{t,n}$ is bounded as
$$R(n, t) \le tn + (8t^2 + 2t) \log^2(n) + o(\log^2(n)).$$
Proof:
By construction we have that $\mathcal{C}_{t,n} = \mathcal{L}_t(n) \cap \mathcal{C}_{\mathrm{Gab}}(n, t)$. By Claim 9 we have that
$$|\mathcal{L}_t(n)| \ge \frac{2^{n^2}}{2^{((8t^2 + 2t) + o(1)) \log^2(n)}}.$$
From [18] we have that $|\mathcal{C}_{\mathrm{Gab}}(n, t)| = \frac{2^{n^2}}{2^{tn}}$. Further, since $\mathcal{C}_{\mathrm{Gab}}(n, t)$ is a linear code, the pigeonhole principle guarantees that there exists a coset such that
$$|\mathcal{C}_{t,n}| \ge \frac{2^{n^2}}{\underbrace{2^{tn}}_{\text{Gabidulin code}} \cdot \underbrace{2^{((8t^2 + 2t) + o(1)) \log^2(n)}}_{\text{locator set}}}.$$
Hence, we can conclude that the total redundancy of the code $\mathcal{C}_{t,n}$ satisfies
$$R(n, t) = n^2 - \log(|\mathcal{C}_{t,n}|) \le tn + (8t^2 + 2t) \log^2(n) + o(\log^2(n)).$$

REFERENCES
[1] R. Heckel, G. Mikutis, and R. N. Grass, "A Characterization of the DNA Data Storage Channel," Scientific Reports, vol. 9, no. 1, p. 9663, 2019. [Online]. Available: https://doi.org/10.1038/s41598-019-45832-6
[2] F. Sala, C. Schoeny, N. Bitouzé, and L. Dolecek, "Synchronizing files from a large number of insertions and deletions," IEEE Transactions on Communications, vol. 64, no. 6, pp. 2258–2273, June 2016.
[3] R. Venkataramanan, H. Zhang, and K. Ramchandran, "Interactive low-complexity codes for synchronization from deletions and insertions," in . IEEE, 2010, pp. 1412–1419.
[4] S. S. T. Yazdi and L. Dolecek, "A deterministic polynomial-time protocol for synchronizing from deletions," IEEE Transactions on Information Theory, vol. 60, no. 1, pp. 397–409, 2013.
[5] N. Ma, K. Ramchandran, and D. Tse, "Efficient file synchronization: A distributed source coding approach," in IEEE International Symposium on Information Theory Proceedings, 2011, pp. 583–587.
[6] L. Dolecek and V. Anantharam, "Using Reed–Muller RM (1, m) codes over channels with synchronization and substitution errors," IEEE Transactions on Information Theory, vol. 53, no. 4, pp. 1430–1443, April 2007.
[7] V. I. Levenshtein, "Binary codes capable of correcting deletions, insertions and reversals (in Russian)," Doklady Akademii Nauk SSSR, vol. 163, no. 4, pp. 845–848, 1965.
[8] R. R. Varshamov and G. M. Tenengolts, "Codes which correct single asymmetric errors (in Russian)," Avtomatika i Telemekhanika, vol. 161, no. 3, pp. 288–292, 1965.
[9] V. Guruswami and C. Wang, "Deletion codes in the high-noise and high-rate regimes," IEEE Transactions on Information Theory, vol. 63, no. 4, pp. 1961–1970, Apr. 2017.
[10] J. Brakensiek, V. Guruswami, and S. Zbarsky, "Efficient low-redundancy codes for correcting multiple deletions," IEEE Transactions on Information Theory, vol. 64, no. 5, pp. 3403–3410, 2017.
[11] S. K. Hanna and S. El Rouayheb, "Guess & check codes for deletions, insertions, and synchronization," IEEE Transactions on Information Theory, vol. 65, no. 1, pp. 3–15, 2018.
[12] R. Gabrys and F. Sala, "Codes correcting two deletions," IEEE Transactions on Information Theory, vol. 65, no. 2, pp. 965–974, Feb. 2019.
[13] J. Sima, N. Raviv, and J. Bruck, "Two deletion correcting codes from indicator vectors," IEEE Transactions on Information Theory, pp. 1–1, 2019.
[14] J. Sima and J. Bruck, "On optimal k-deletion correcting codes," IEEE Transactions on Information Theory, pp. 1–1, 2020.
[15] V. Guruswami and J. Håstad, "Explicit two-deletion codes with redundancy matching the existential bound," arXiv preprint arXiv:2007.10592, 2020.
[16] J. Sima, R. Gabrys, and J. Bruck, "Optimal systematic t-deletion correcting codes," in , 2020, pp. 769–774.
[17] R. M. Roth, "Maximum-rank array codes and their application to crisscross error correction," IEEE Transactions on Information Theory, vol. 37, no. 2, pp. 328–336, 1991.
[18] E. M. Gabidulin and N. I. Pilipchuk, "Error and erasure correcting algorithms for rank codes," Designs, Codes and Cryptography, vol. 49, pp. 105–122, 2008.
[19] D. Lund, E. M. Gabidulin, and B. Honary, "A new family of optimal codes correcting term rank errors," in IEEE International Symposium on Information Theory, June 2000, p. 115.
[20] V. R. Sidorenko, "Class of correcting codes for errors with a lattice configuration," Problemy Peredachi Informatsii, vol. 12, no. 3, pp. 165–171, Mar. 1976.
[21] M. Blaum and J. Bruck, "MDS array codes for correcting a single criss-cross error," IEEE Transactions on Information Theory, vol. 46, no. 3, pp. 1068–1077, May 2000.
[22] E. M. Gabidulin, "Optimum codes correcting lattice errors," Problemy Peredachi Informatsii, vol. 21, no. 2, pp. 103–108, 1985.
[23] R. M. Roth, "Probabilistic crisscross error correction," IEEE Transactions on Information Theory, vol. 43, no. 5, pp. 1425–1438, Sep. 1997.
[24] A. Wachter-Zeh, "List decoding of crisscross errors," IEEE Transactions on Information Theory, vol. 63, no. 1, pp. 142–149, 2017.
[25] R. Bitar, I. Smagloy, L. Welter, A. Wachter-Zeh, and E. Yaakobi, "Criss-cross deletion correcting codes," International Symposium on Information Theory and Its Applications, 2020.
[26] M. Hagiwara, "Conversion method from erasure codes to multi-deletion error-correcting codes for information in array design," International Symposium on Information Theory and Its Applications (ISITA), 2020.
[27] A. A. Kulkarni and N. Kiyavash, "Nonasymptotic upper bounds for deletion correcting codes," IEEE Transactions on Information Theory, vol. 59, no. 8, pp. 5115–5130, Aug. 2013.
[28] E. M. Gabidulin, "Theory of codes with maximum rank distance,"