Bit Flipping Moment Balancing Schemes for Insertion, Deletion and Substitution Error Correction
Ling Cheng and Hendrik C. Ferreira
Abstract
In this paper, two moment balancing schemes, namely a variable index scheme and a fixed index scheme, for either single insertion/deletion error correction or multiple substitution error correction are introduced for coded sequences originally developed for correcting substitution errors only. By judiciously flipping bits of the original substitution error correcting code word, the resulting word is able to correct either a reduced number of substitution errors or a single insertion/deletion error. The number of flips introduced by the two schemes can be kept small compared to the code length. This makes the schemes of practical value when applied to a long substitution error correcting code for a severe channel where substitution errors dominate but insertion/deletion errors can occur with a low probability. The new schemes can be more easily implemented in an existing coding system than any previously published moment balancing templates, since no additional parity bits are required, which also means that the code rate remains the same and the existing substitution error correcting decoder requires no changes. Moreover, the work extends the class of Levenshtein codes capable of correcting either single substitution or single insertion/deletion errors to codes capable of correcting either multiple substitution errors or a single insertion/deletion error.
Index Terms
Insertion/deletion error correction, Moment function, Number theoretic code
L. Cheng is with the School of Electrical and Information Engineering, University of the Witwatersrand, Private Bag 3, Wits. 2050, Johannesburg, South Africa. (email: [email protected]). H. C. Ferreira is with the Department of Electrical and Electronic Engineering Science, University of Johannesburg, Auckland Park, 2006, South Africa. (e-mail: [email protected]).
I. INTRODUCTION
Synchronization errors at symbol level are defined as insertion and deletion errors. During a transmission, the event that an unknown symbol is put in at an unknown index is called an insertion error and the event that at an unknown index an unknown symbol is left out is called a deletion error. The moment balancing template technique was investigated to correct insertion/deletion errors [1]–[3]. In this paper, we further extend this early work to two new schemes, which can correct either substitution errors or a single insertion/deletion error.
By using number theory, codes were invented to correct asymmetric errors, substitution errors and insertion/deletion errors [4]–[11]. These codes were proposed to have a deterministic code construction and a deterministic insertion/deletion error correcting capability. With 'deterministic' we mean guaranteed correction of all specified error patterns, as opposed to the correction of most patterns with a high probability as in [12]. The difference between 'deterministic' and 'probabilistic' has been well addressed in [11]. In this paper, we focus on one construction proposed by Varshamov and Tenengolts [4], often called the Varshamov-Tenengolts (VT) construction. Levenshtein discovered that the same construction can be used to generate codes to correct a single insertion/deletion error [5]. The single insertion/deletion error correcting code generated based on the VT construction is called the Levenshtein code. The relationship among group theoretic codes, the VT construction and Levenshtein codes was investigated by Constantin and Rao [7]. Levenshtein codes with additional rules were found to be able to correct either a single substitution error or a single insertion/deletion error [5]. In [6], a class of codes capable of correcting a deletion and a prefix substitution error was presented.
The VT construction has been further implemented in single nonbinary insertion/deletion error correcting codes presented by Tenengolts [8] and ST codes presented by Abdel-Ghaffar [10]. In [9], the high-order spectrum-null code construction published in [13] was found to be a subset of a Levenshtein code. Helberg and Ferreira [14] presented a class of codes which can correct multiple insertion/deletion errors based on a construction which is a generalization of the Varshamov-Tenengolts construction [10], [11]. Dolecek and Anantharam [15], [16] presented a class of codes which can correct multiple repetition errors based on some high-order moment conditions. Codes for adjacent and burst insertion/deletion error correction were investigated in [17]–[22]. Some recent results on multiple insertion/deletion error correcting codes can be found in [23]. Since Schulman and Zuckerman [24] presented the first asymptotically good construction of insertion/deletion error correcting codes, some recent results are presented in [25]–[33]. The 'synchronization string' construction presented in [27]–[30], [33] can efficiently convert insertion/deletion errors into erasures and substitution errors by using an index en/decoding method, and then the erasures and substitution errors can be further corrected.
In this paper we further investigate two new schemes which can convert a substitution error correcting code word into a sequence which may also correct a single insertion/deletion error based on systematic encoding, the so-called moment balancing template [1]. When compared to the systematic encoding in conventional error correcting codes, there is the similarity that the parity bits have fixed indices in the sequence, while the difference is that they are not always at adjacent indices. By judiciously choosing the parity bits, the sequences generated by this scheme can correct a single insertion/deletion error.
The number of parity bits is small, of the same order as the number of parity bits in a Hamming code of comparable length. The idea behind this scheme is to manipulate the first-order moment property of a sequence, which is also the key to the Varshamov-Tenengolts construction and the Levenshtein codes. Moment balancing templates for different types of sequences can be found in [1]–[3]. In the previous studies on moment balancing templates, the template is composed of information bits and parity bits. It is significant that in this paper we present two new schemes for substitution error correcting codes that require no additional parity bits to balance the moment of a sequence, thus reducing overhead and retaining the code rate, with the trade-off of possibly reducing the substitution error correcting capability to some extent.
The main contribution of this paper is to introduce two general schemes for the moment balancing purpose, namely a variable index scheme and a fixed index scheme. Moreover, a special case of the variable index scheme, the one-flip method, can be considered as an extension of Levenshtein codes that can correct a single insertion/deletion or a single substitution error. Based on this special case, a lower bound on the cardinality of codes correcting multiple substitution errors or a single insertion/deletion error can be derived.
The paper is organized as follows. In Section II, we review the VT construction, the insertion/deletion/substitution error correcting codes and the moment balancing templates based on this construction. The variable index scheme as well as its special case, the one-flip scheme, are presented in Section III. The fixed index bit flipping scheme is introduced in Section IV. The analysis and discussion of the two new schemes are presented in Section V. The paper is concluded in Section VI.
II. VARSHAMOV-TENENGOLTS CONSTRUCTION AND MOMENT BALANCING TEMPLATES
A brief introduction to the VT construction and some different classes of error correcting codes based on the VT construction follows.
Let $\mathbf{x} = x_1 x_2 \ldots x_n$ denote a binary code word. Given $a \in \mathbb{Z}_m$, for all $\mathbf{x} \in \mathcal{C}_a$ the moment function of $\mathbf{x}$ is defined as
$$\sigma(\mathbf{x}) = \sum_{i=1}^{n} i x_i \equiv a \pmod{m}. \qquad (1)$$
• If $\sum_{i=1}^{n} c_i x_i \equiv a \pmod{3}$, where $c_i = 1$ if $i$ is odd and $c_i = 2$ if $i$ is even, $\mathcal{C}_a$ is a substitution-transposition correcting code [10].
• If $m \ge n + 1$, $\mathcal{C}_a$ is a single insertion/deletion correcting code [5].
• If $m \ge 2n$, $\mathcal{C}_a$ is a single insertion/deletion or substitution correcting code [5].
• If m ≥ n − and $\sum_{i=1}^{n} c_i x_i \equiv b \pmod{2}$, where $b \in \{0, 1\}$, $\mathcal{C}_a$ is a single insertion/deletion or prefixing substitution correcting code [6].
Let $\boldsymbol{\alpha} = \alpha_1 \alpha_2 \ldots \alpha_n$ denote a binary sequence derived from $\mathbf{x}$ according to the relation rule
$$\alpha_i = \begin{cases} 1, & \text{if } x_i \ge x_{i-1}, \\ 0, & \text{if } x_i < x_{i-1}. \end{cases}$$
Here $\alpha_1$ can be any binary symbol. Tenengolts presented two selection rules to construct a non-binary single insertion/deletion correcting code as follows. When $\sum_{i=1}^{n} x_i \equiv \beta \pmod{q}$ and $\sum_{i=1}^{n} (i-1)\alpha_i \equiv \gamma \pmod{n}$, for some fixed integers $\beta$ and $\gamma$, $\mathcal{C}$ is a non-binary single insertion/deletion correcting code [8].
Note that a code that can correct $s$ deletions can also correct $s$ insertions [5], where $s$ is a positive integer.
A brief introduction to the moment balancing template follows.
Let $\mathcal{C}$ be a $[K, k]$ binary code, which is not necessarily linear, of length $K$ that has $k$ code words. Each code word $\mathbf{c} = (c_1 c_2 \cdots c_K)$ in the code $\mathcal{C}$ is mapped into a distinct sequence $\mathbf{x} = (x_1 x_2 \cdots x_n)$ whose moment is congruent to a fixed integer $a$ modulo another fixed integer $m$. Similar to the systematic encoding for substitution error correcting codes, the mappings in which the code bits $c_1 c_2 \cdots c_K$ appear at fixed indices in the sequence $\mathbf{x}$, i.e., $c_i = x_{\gamma(i)}$ for some $1 \le \gamma(1) < \gamma(2) < \cdots < \gamma(K) \le n$, are achieved in a moment balancing template. The remaining bits in $\mathbf{x}$, which are called balancing bits and denoted by $b_1, b_2, \ldots, b_{n-K}$, are positioned at the $n - K$ indices that are not occupied by code bits. In particular, $b_i = x_{\beta(i)}$ where $1 \le \beta(1) < \beta(2) < \cdots < \beta(n-K) \le n$, and $\cup_i \gamma(i)$ and $\cup_i \beta(i)$ are disjoint sets whose union is $\{1, 2, \ldots, n\}$. Let $\sigma_c(\mathbf{x}) = \sum_{i=1}^{K} \gamma(i) c_i \pmod{m}$ and $\sigma_b(\mathbf{x}) = \sum_{i=1}^{n-K} \beta(i) b_i \pmod{m}$. Then $\sigma_c(\mathbf{x})$ and $\sigma_b(\mathbf{x})$ indicate the contribution of the code bits and the balancing bits, respectively, to the moment of $\mathbf{x}$. In particular, $\sigma_c(\mathbf{x}) + \sigma_b(\mathbf{x}) \pmod{m} = \sigma(\mathbf{x}) \pmod{m}$.
III. VARIABLE INDEX BIT FLIPPING MOMENT BALANCING SCHEME
We present a moment balancing scheme that flips bits, without adding extra balancing bits, to balance the moment value of a code word in order to enable a substitution error correcting code word to correct a single insertion or deletion error. By flipping a bit whose index is unknown to the receiver, we artificially create a substitution error which can be corrected together with channel errors. In our work, only individual bits are flipped, which is different from a bit inversion operation based on the Knuth algorithm [34] for the dc-free balancing purpose, which involves a specific bit and all the following bits. In this section, we introduce a general variable index bit flipping moment balancing scheme, namely the multiple-flip moment balancing (MFMB) scheme, in Section III-A, and a special variable index bit flipping moment balancing scheme, namely the one-flip moment balancing (OFMB) scheme, in Section III-B.
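As a rough illustration of the idea described above, the following Python sketch computes the first-order moment of a binary word and searches exhaustively for the smallest set of bit flips that brings the moment to a target residue. The function names `moment` and `min_flip_balance` are hypothetical and the exhaustive search is an assumption made for illustration only; it is practical only for short words and is not claimed to be the authors' encoding procedure.

```python
from itertools import combinations


def moment(x, m):
    """First-order moment sum(i * x_i) mod m, with indices starting at 1."""
    return sum(i * bit for i, bit in enumerate(x, start=1)) % m


def min_flip_balance(c, a, m):
    """Exhaustive search (illustrative only).

    Returns (d, flip_sets): d is the smallest number of bit flips that
    brings the moment of c to a (mod m), and flip_sets lists every index
    set of size d (1-based indices) that achieves it.
    Returns None if no flip pattern reaches the target residue.
    """
    n = len(c)
    for d in range(n + 1):
        hits = []
        for idx in combinations(range(1, n + 1), d):
            x = list(c)
            for i in idx:
                x[i - 1] ^= 1  # flip the bit at 1-based index i
            if moment(x, m) == a:
                hits.append(set(idx))
        if hits:
            return d, hits
    return None


c = [1, 1, 0, 0, 0, 1, 0]          # the word 1100010 of length n = 7
print(moment(c, 8))                # 1
print(min_flip_balance(c, 0, 8))   # (1, [{1}, {7}])
```

With m = 8 and a = 0, the word 1100010 has moment 1 and is balanced by flipping either its first or its seventh bit, which is the kind of non-unique choice the variable index scheme exploits.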
A. Multiple-Flip Moment Balancing Scheme
In the rest of the paper let $\mathcal{C}$ be an $(n, M, d_{\min})$ binary code, which is not necessarily linear, of length $n$ that has $M$ code words and minimum Hamming distance $d_{\min}$. We are interested in mapping each code word $\mathbf{c} = (c_1 c_2 \cdots c_n)$ in the code $\mathcal{C}$ into a distinct binary sequence $\mathbf{x} = (x_1 x_2 \cdots x_n)$ whose moment is congruent to a fixed integer modulo another fixed integer. This is possible if and only if $M$ is at most equal to the number of distinct sequences of length $n$ satisfying this congruence condition. In this paper, we focus on mappings that flip a minimal number of bits in $\mathbf{c}$ to obtain an $\mathbf{x}$ whose moment is congruent to a fixed integer modulo another fixed integer. Let $d_H(\cdot, \cdot)$ denote the Hamming distance of two sequences. Given $\mathbf{c} = (c_1 c_2 \cdots c_n)$ and a constant integer $a \in \{0, 1, \ldots, m-1\}$, where the constant integer $m > n$, $\mathbf{x}$ is generated as $\mathbf{x} = \arg\min_{\sigma(\mathbf{x}) = a} d_H(\mathbf{c}, \mathbf{x})$. We further define $d$ as the maximum of the Hamming distances between each $\mathbf{c} \in \mathcal{C}$ and its corresponding $\mathbf{x}$, i.e., $d = \max_{\forall \mathbf{c} \in \mathcal{C}} d_H(\mathbf{c}, \mathbf{x})$.
Lemma 1:
Let $\mathcal{C}$ be an $(n, M, d_{\min})$ code. If $d_{\min} > 2d$, all $\mathbf{x}$'s constitute a new $(n, M, d'_{\min})$ single insertion/deletion error correcting code, where $d'_{\min} \ge d_{\min} - 2d$.
Proof:
Since at most $d$ bit flipping operations have been carried out for each code word in $\mathcal{C}$, any two sequences $\mathbf{x}$ generated from two different code words in $\mathcal{C}$ are distinct given $d_{\min} > 2d$. However, the minimum Hamming distance of the set of all $\mathbf{x}$'s is reduced to $d'_{\min}$, where $d'_{\min} \ge d_{\min} - 2d$.
Lemma 2:
For an arbitrary sequence $\mathbf{c} = (c_1 c_2 \cdots c_n)$ of length $n$, at most $\lfloor \log_2 n \rfloor + 1$ bits need to be inverted to obtain $\mathbf{x} = (x_1 x_2 \cdots x_n)$ with $\sigma(\mathbf{x}) = a$, where $2^{\lfloor \log_2 n \rfloor + 1} \ge m > n$.
Proof:
Given the fixed indices $i \in \{1, 2, 4, \ldots, 2^{\lfloor \log_2 n \rfloor}\}$, when converting $\mathbf{c}$ into $\mathbf{x}$ by inverting some bits at these fixed indices, the obtained values of $\sigma(\mathbf{x})$ take on all values from 0 to $m - 1$ [1].
In Lemma 2, we present an upper bound on the maximum number of bit flips for a code. Therefore, we have $0 \le d \le \lfloor \log_2 n \rfloor + 1$.
The following theorem illustrates the MFMB scheme.
Theorem 1:
Let $\mathcal{C}$ be an $(n, M, d_{\min})$ code. Let $a$, $d$ and $m$ be three integers, where $0 \le d \le \lfloor \log_2 n \rfloor + 1$, $2d < d_{\min}$, $0 \le a < m$ and $2^{\lfloor \log_2 n \rfloor + 1} \ge m > n$. (a) Any code word $\mathbf{c}$ in $\mathcal{C}$ can be turned into a distinct sequence $\mathbf{x}$ with $\sigma(\mathbf{x}) = a$ by flipping at most $d$ bits at unknown indices. (b) All distinct sequences constitute a new code that can correct a single insertion/deletion error or at least $\lfloor \frac{d_{\min} - 2d - 1}{2} \rfloor$ substitution errors. (c) If all possible bit flip indices are known, the resulting code can correct a single insertion/deletion error or at least $\lfloor \frac{d_{\min} - d - 1}{2} \rfloor$ substitution errors. A single insertion/deletion error correcting code which can also correct no fewer than $\lfloor \frac{d_{\min} - \lfloor \log_2 n \rfloor - 2}{2} \rfloor$ substitution errors is guaranteed.
Proof:
First, according to Lemma 2, at most $\lfloor \log_2 n \rfloor + 1$ bit flips are required to satisfy the condition $\sigma(\mathbf{x}) = a$, even when the possible indices of bit flips are fixed (known). The number of necessary bit flips at variable (unknown) indices cannot be more than that of the fixed case. Second, since in the general case $d$ bit flips can appear at any unknown indices, according to Lemma 1, the resulting code can correct at least $\lfloor \frac{d_{\min} - 2d - 1}{2} \rfloor$ substitution errors. Furthermore, the resulting code satisfies the condition $\sigma(\mathbf{x}) = a$ and can correct a single insertion/deletion error. Third, if the possible bit flip indices are known, the flips at these indices can be considered as erasures. Therefore, the resulting code can correct a single insertion/deletion error or at least $\lfloor \frac{d_{\min} - d - 1}{2} \rfloor$ substitution errors. Since $d = \lfloor \log_2 n \rfloor + 1$ bit flips at known indices are sufficient, the number of substitution errors that can be corrected by the resulting code is lower bounded by $\lfloor \frac{d_{\min} - \lfloor \log_2 n \rfloor - 2}{2} \rfloor$.
Example 1:
For simplicity we start with a code with $d_{\min} = 3$ and convert it into a code correcting a single insertion/deletion error. Let $a = 0$. Choose $\mathcal{C}$ as a (7, 16, 3) Hamming code and $m = n + 1 = 8$ (in Table I, the element after the code word in each row is the moment value modulo $m$, and the support set in each row shows the indices of the inverted bits). Let $S = \{i : c_i \ne x_i\}$ be the support set that includes all indices of the inverted bits in $\mathbf{c}$. It is evident that $|S| = d_H(\mathbf{c}, \mathbf{x})$.
TABLE I
BIT FLIPPING MOMENT BALANCING TEMPLATE OF A (7, 16, 3) HAMMING CODE WITH m = 8
Code word   σ(c)   S
{} {} {} { } or { } { } { 2, 5 } { } { 6, 7 } { } { } { } { } { } or { } { } or { } { 4, 6 } { }
To balance the moment value of each code word in Table I to be 0, no inverting operation is required for the first three code words. To balance code word 1100010, the first or the seventh bit is inverted. Two inverting operations are required for three code words (underlined). Since $d_{\min} = 3$, in the general case two distinct code words can be flipped into one identical sequence when more than one inverting operation is allowed for either code word. Therefore, choices requiring two bit flips are not considered in this case.
However, it is observed that the balancing choice for a given code word is not unique. As shown in the rows where the code words are highlighted in bold, there are at least two options to balance one code word. In this example, a single insertion/deletion error correcting code is achieved, as shown in Table II, by excluding the code words which require two bit flips to balance and would therefore drain the substitution error correcting capability of the original code, and by including multiple balanced code words derived from one original code word. In this case, the code rate is not compromised by using Table II to encode. Including multiple balanced code words generated from the same original code word, however, heavily affects the substitution error correcting capability of the resulting code. The intention of showing Table II is to demonstrate a bit flipping approach to implementing the VT construction.
TABLE II
BIT-INVERTING MOMENT BALANCING TEMPLATE OF A (7, 16, 3) HAMMING CODE WITH m = 8
Code word   σ(c)   S
{} {} {} { } { } { } { } { } { } { } { } { } { } { } { } { }
To this end, we can derive the following code based on the bit flipping scheme: , , , , , , , , , , , , . (2)
There is a small observation leading to the following lemma.
Lemma 3:
Let $\sigma' = a - \sigma(\mathbf{c}) \pmod{m}$. If $\sigma' = i$ and $m = 2i$, where $i \in \{1, 2, \ldots, n\}$, then in order to balance the sequence $\mathbf{c}$ to have $\sigma(\mathbf{x}) = a$, only one bit flip at the $i$-th index is required.
Proof:
Inverting the $i$-th bit of $\mathbf{c}$ from 0 to 1 compensates $\sigma' = i$. Inverting the $i$-th bit of $\mathbf{c}$ from 1 to 0 compensates $\sigma' = m - i$. Setting $i = m - i$, we obtain $m = 2i$. Therefore, one bit flip at the $i$-th index is sufficient to obtain $\mathbf{x}$ from $\mathbf{c}$ with $\sigma(\mathbf{x}) = a$, whatever the value of the $i$-th bit.
The special balancing case presented by Lemma 3 is not rare, and numerical examples can be found in Table I for the original code words 0100111, 0110001 and 1111111 to be balanced.
Note that in the earlier example, we conceptually choose a short Hamming code with $d_{\min} = 3$. In practical systems, we may choose long BCH codes with larger $d_{\min}$ to retain most of the substitution error correcting capability and add an insertion/deletion error correcting capability to the sequences.
The validity and efficiency (in terms of the number of bit-flips introduced) of the variable index scheme depend on ($n$, $a$, $m$) and the original code. For example, it is impossible to balance the sequence 010101 for $a = 0$ and $m = 7$ by fewer than three bit-flips. Therefore, in the code construction stage, a proper selection procedure is required, which involves selecting $a$ and $m$, and/or expunging some code words to optimize the code rate and/or the error correcting capability. In the next section, we will provide a guaranteed scheme by expunging some code words.
B. One-Flip Moment Balancing Scheme
The property presented by the following lemma is the key to the OFMB scheme.
Lemma 4:
Let $\mathbf{c}$ denote a sequence of length $n$. The $n + 1$ sequences consisting of the original sequence $\mathbf{c}$ and the $n$ different sequences that each differ from $\mathbf{c}$ by one bit-flip have at least $\lceil \frac{n}{2} \rceil + 1$ different moment values in a residue system defined by (1) with modulus $m > n$.
Proof:
By flipping one bit of $\mathbf{c}$ at each of the $n$ different indices, $n$ different sequences are generated and each is different from $\mathbf{c}$. A bit-flip introduces a nonzero difference in (1) since $m > n$. There are only two types of bit-flips: either a bit with value 0 is substituted by 1, or a bit with value 1 is substituted by 0. The differences introduced to (1) by all possible 0-to-1 (or 1-to-0) flips are all different, also thanks to $m > n$. Since there are $n$ possible bit-flips in total, at least half of them are of the same type, either 0-to-1 or 1-to-0. Therefore, by one bit-flip at each index, at least $\lceil \frac{n}{2} \rceil$ new moment values are introduced. Including the moment value of $\mathbf{c}$, there are at least $\lceil \frac{n}{2} \rceil + 1$ different values introduced by $\mathbf{c}$ and the sequences with one bit-flip from $\mathbf{c}$.
We consider moment-balancing a code into a new code in which each code word has an identical moment value in the residue system modulo $m = n + 1$. Given a binary ($n$, $M$, $d_{\min} \ge 3$) code $\mathcal{C}$, a new code $\mathcal{C}'$ can be generated by using the OFMB scheme, which is actually an expunging process illustrated by the following steps:
1) By flipping only one bit at each index of $\mathbf{c} \in \mathcal{C}$, $n + 1$ different sequences including the original word are generated from $\mathbf{c}$. These $n + 1$ sequences carry at least $\lceil \frac{n}{2} \rceil + 1$ different moment values according to Lemma 4. For each generated moment value, we select only one sequence even if multiple sequences carry the same moment value.
2) By applying Step 1 to all $\mathbf{c} \in \mathcal{C}$, at least $M(\lceil \frac{n}{2} \rceil + 1)$ different sequences are generated, since the original code $\mathcal{C}$ has $d_{\min} \ge 3$ and for each $\mathbf{c}$ at least $\lceil \frac{n}{2} \rceil + 1$ sequences are selected in the previous step.
3) The different sequences generated in the previous step are partitioned into $m = n + 1$ sets according to (1). Among them, the one with the biggest cardinality is chosen as the new code $\mathcal{C}'$.
Theorem 2:
Let $\mathcal{C}$ be an $(n, M, d_{\min} \ge 3)$ code. An $(n, M', d'_{\min} \ge d_{\min} - 2)$ code $\mathcal{C}'$ exists, where $M' \ge \left\lceil \frac{M(\lceil n/2 \rceil + 1)}{n + 1} \right\rceil \ge \left\lceil \frac{M}{2} \right\rceil$. All code words in $\mathcal{C}'$ satisfy the residue system defined in (1) with $m = n + 1$ and some $a$, and therefore $\mathcal{C}'$ is also a single insertion/deletion error correcting code.
Proof:
As discussed in the last section, among the original code word and the new sequences generated from it by one flip, if there are multiple sequences carrying the same moment value, only one sequence should be chosen in order to minimize the decrease of the minimum Hamming distance of the resulting code. Therefore, in the steps shown earlier in this section, although $M(n + 1)$ different sequences can be generated based on $\mathcal{C}$ and one-flip operations, only a subset of at least $M(\lceil \frac{n}{2} \rceil + 1)$ sequences is chosen. Since the sequences are partitioned into $m = n + 1$ sets, the cardinality $M'$ of the resulting code $\mathcal{C}'$ satisfies
$$M' \ge \left\lceil \frac{M(\lceil n/2 \rceil + 1)}{n + 1} \right\rceil \ge \left\lceil \frac{M}{2} \right\rceil. \qquad (3)$$
Since all code words of $\mathcal{C}'$ share the same moment value with $m = n + 1$, $\mathcal{C}'$ is a single insertion/deletion error correcting code. Moreover, in $\mathcal{C}'$ no two code words are generated through one-flip operations from the same original code word. Therefore, the minimum Hamming distance $d'_{\min}$ of $\mathcal{C}'$ satisfies
$$d'_{\min} \ge d_{\min} - 2. \qquad (4)$$
The brute-force method shown in the steps presented earlier in this section, which is similar to the method used to implement the MFMB scheme, can be used to choose a code and an encoding table.
IV. FIXED INDEX BIT FLIPPING MOMENT BALANCING SCHEME
We further present a fixed index bit flipping scheme in this section. According to Lemma 2, it is guaranteed that by flipping some bits at the fixed indices $i \in \{1, 2, 4, \ldots, 2^{\lfloor \log_2 n \rfloor}\}$, the moment value of the obtained sequence can be balanced. Since the indices of possible inversions are fixed, the bits at these indices can be considered as erasures at the decoder, and the number of erasures is $\lfloor \log_2 n \rfloor + 1$.
An example of a fixed index bit flipping scheme follows.
Example 2:
Choose $\mathcal{C}$ as a (15, 32, 7) binary primitive BCH code, $a = 0$ and $m = n + 1 = 16$. In Table III, the two codes obtained by implementing the variable index moment balancing scheme and the fixed index moment balancing scheme are presented in the 3rd and 4th columns respectively. The inversions of code words in Code I are highlighted in bold. The fixed indices of Code II are the 1st, 2nd, 4th and 8th indices. Only bits at these indices are possibly inverted. It is evident that Code I, obtained by implementing the variable index scheme, compromises the substitution error correcting capability to achieve single insertion/deletion error correction. The resulting code can also correct one substitution error. The cardinality of Code I can be further increased by including more bit-flip options. The trade-off is that the substitution error correcting capability of the resulting code will be further compromised. Code II, obtained by implementing the fixed index scheme, has the same cardinality as the original code. To decode Code II, we can consider the bits at the fixed indices as erasures. In this sense, Code II can correct a single insertion/deletion error or one substitution error in addition to four erasures.
The following theorem illustrates the fixed index bit flipping moment balancing scheme.
Theorem 3:
Let $\mathcal{C}$ be an ($n$, $M$, $d_{\min}$) substitution error correcting code. Let $a$, $d$ and $m$ be three integers, where $d = \lfloor \log_2 n \rfloor + 1$, $0 \le a < m$ and $2^{\lfloor \log_2 n \rfloor + 1} \ge m > n$. Any code word $\mathbf{c}$ in $\mathcal{C}$ can be turned into a sequence $\mathbf{x}$ with $\sigma(\mathbf{x}) = a$ that can correct a single insertion/deletion error or $\lfloor \frac{d_{\min} - d - 1}{2} \rfloor$ substitution errors by flipping at most $\lfloor \log_2 n \rfloor + 1$ bits at the fixed indices $\{1, 2, 4, \ldots, 2^{\lfloor \log_2 n \rfloor}\}$ of $\mathbf{c}$.
Proof:
According to Lemma 2, by changing the values of the bits at the fixed indices, the moment value of these bits can take on any value from 0 to $2^{\lfloor \log_2 n \rfloor + 1} - 1$. Therefore, it is sufficient to turn $\mathbf{c}$ into $\mathbf{x}$, which has $\sigma(\mathbf{x}) = a$. Since the bits used to balance the moment value are at fixed indices, they can be considered as erasures by the decoder. Hence, the code can correct $\lfloor \frac{d_{\min} - d - 1}{2} \rfloor$ substitution errors.
The decoding process for both fixed and variable index schemes can be described as follows. At the receiver, based on markers or special synchronization words inserted between frames, insertion or deletion errors are first detected. If a single insertion or deletion is detected, it can be corrected by using the algorithm presented in [5]. If there is no insertion or deletion error, the decoder proceeds with the procedure of substitution error correction. When correcting substitution errors in the sequences encoded by the fixed index scheme, the bits at the fixed indices should be marked as erasures first.
V. ANALYSIS
Let $\mathcal{C}(n, d_{\min}, s)$ denote a code of length $n$ which has minimum Hamming distance $d_{\min}$ and can also correct $s$ insertion/deletion errors. In [5], Levenshtein introduces a class of codes $\mathcal{C}(n, 3, 1)$ which can correct a single insertion/deletion or a single substitution error. Equivalently, it gives a lower bound on the cardinality of $\mathcal{C}(n, 3, 1)$, which is $\frac{2^n}{2n}$. In this work, we extend the code to $\mathcal{C}(n, d_{\min}, 1)$, and the lower bound can be further considered in the light of the schemes presented in this paper.
For the class of codes constructed in [5], the construction starts by using the VT construction and the single substitution error correcting capability is a by-product. In this work, we start with a multiple substitution error correcting code and make it correct a single insertion/deletion error, with a limited reduction of the substitution error correcting capability compared with the original code. The construction takes two steps and is deterministic.
It is well known that binary MDS codes are trivial [35]. When $n$ is small, the performance of a linear code very often deteriorates drastically if its valuable substitution error correcting capability is compromised for single insertion/deletion error correction. Therefore, if $n$ is small, non-linear codes can be considered, since the new schemes are not limited to linear codes. Since at most $\lfloor \log_2 n \rfloor + 1$ bit flips are required to turn a substitution error correcting code into a single insertion/deletion error correcting code, when $n$ is large the performance of a substitution error correcting code does not degrade as much as for short codes. In this case, linear codes are preferable considering the encoding and decoding complexities.
Here we present a new lower bound on the cardinality of a $\mathcal{C}(n, d_{\min}, 1)$ code.
Theorem 4:
There always exists a code $\mathcal{C}(n, d_{\min}, 1)^*$ with the cardinality $|\mathcal{C}(n, d_{\min}, 1)^*| \ge \frac{2^{n-1}}{V(n, d_{\min} + 1)}$, where $V(n, d_{\min} + 1) = \sum_{i=0}^{d_{\min} + 1} \binom{n}{i}$.
Proof:
The Gilbert-Varshamov (GV) bound [36], [37] ensures the existence of a binary code $\mathcal{C}$ of length $n$ with minimum Hamming distance $d_{\min}$, having $|\mathcal{C}| \ge \frac{2^n}{V(n, d_{\min} - 1)}$. Based on this result, we start with a binary code achieving the GV bound with minimum Hamming distance $d_{\min} + 2$. This binary code has cardinality no less than $\frac{2^n}{V(n, d_{\min} + 1)}$. By using the OFMB scheme, the resulting code $\mathcal{C}^*$ has a reduced minimum Hamming distance $d_{\min}$ and a reduced cardinality no less than approximately half of that of the original code.
To this end, we can further develop a tighter lower cardinality bound for $\mathcal{C}(n, d_{\min}, 1)$ codes as follows.
Theorem 5:
There always exists a code $\mathcal{C}(n, d_{\min}, 1)^*$ with the cardinality
$$|\mathcal{C}(n, d_{\min}, 1)^*| \ge \max\left\{ \frac{2^{n-1}}{V(n, d_{\min} + 1)}, \; \frac{2^{n - \lfloor \log_2 n \rfloor - 1}}{V(n - \lfloor \log_2 n \rfloor - 1, \, d_{\min} - 1)} \right\}. \qquad (5)$$
Proof:
Apply the moment balancing template (MBT) to a binary code of length $n - \lfloor \log_2 n \rfloor - 1$ achieving the GV bound with minimum Hamming distance $d_{\min}$. The resulting code of length $n$ has minimum Hamming distance no less than $d_{\min}$ and cardinality no less than $\frac{2^{n - \lfloor \log_2 n \rfloor - 1}}{V(n - \lfloor \log_2 n \rfloor - 1, \, d_{\min} - 1)}$. Therefore, we can combine this result with the lower bound derived in Theorem 4 and give a tighter lower bound.
Levenshtein [5] found a class of codes which can correct a single insertion/deletion error or a substitution error ($d_{\min} \ge 3$) based on the VT construction with $m \ge 2n$. Therefore, the resulting code has a cardinality no less than $\frac{2^n}{2n}$. To this end, we compare the new lower bound in Theorem 4 with Levenshtein's result for $d_{\min} = 3$. Since the denominator in the new lower bound is $V(n, d_{\min} + 1) = \binom{n}{4} + \binom{n}{3} + \binom{n}{2} + \binom{n}{1} + 1$ when $d_{\min} = 3$, the new lower bound is not superior to Levenshtein's result. Note that Levenshtein's result considers arbitrary sequences and $d_{\min} = 3$ only. The new result considers a substitution error correcting code and any minimum Hamming distance. The new result not only extends the range of minimum Hamming distances of the resulting codes, but also introduces a possible complexity reduction to the encoding and decoding process.
In Table IV, we compare the code word lengths, information lengths, and insertion/deletion and substitution error correcting capabilities of the OFMB scheme and the MBT scheme applied to a substitution error correcting code.
We further compare the OFMB scheme with an alternative scheme, namely the MBT scheme [1], which starts with a multiple substitution error correcting code and encodes each code word with a systematic VT construction.
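The two candidate lower bounds of Theorem 5 can be evaluated exactly with integer arithmetic. The following is a minimal sketch, assuming that (5) reads as the maximum of $2^{n-1}/V(n, d_{\min}+1)$ and $2^{n-\lfloor \log_2 n \rfloor - 1}/V(n-\lfloor \log_2 n \rfloor - 1, d_{\min}-1)$; the function names are hypothetical and chosen only for illustration.

```python
from fractions import Fraction
from math import comb


def ball_volume(n, r):
    """V(n, r): sum of binomial(n, i) for i = 0..r."""
    return sum(comb(n, i) for i in range(r + 1))


def ofmb_bound(n, d_min):
    """OFMB-based bound 2^(n-1) / V(n, d_min + 1), as in Theorem 4."""
    return Fraction(2 ** (n - 1), ball_volume(n, d_min + 1))


def mbt_original_length(n):
    """Length n - floor(log2 n) - 1 of the code fed to the MBT scheme."""
    return n - (n.bit_length() - 1) - 1  # bit_length() - 1 == floor(log2 n)


def mbt_bound(n, d_min):
    """MBT-based bound 2^k / V(k, d_min - 1) with k = n - floor(log2 n) - 1."""
    k = mbt_original_length(n)
    return Fraction(2 ** k, ball_volume(k, d_min - 1))


def combined_bound(n, d_min):
    """Larger of the two bounds, following the assumed reading of (5)."""
    return max(ofmb_bound(n, d_min), mbt_bound(n, d_min))


# Resulting length 265 corresponds to an MBT original code of length 256.
print(mbt_original_length(265))  # 256
```

Exact fractions are used instead of floating point because both numerators and denominators grow far beyond double precision for lengths of a few hundred bits.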
To implement the MBT scheme for a substitution error correcting code of length $n - \lfloor \log_2 n \rfloor - 1$ achieving the GV bound with $d_{\min}$, we insert $\lfloor \log_2 n \rfloor + 1$ balancing bits at the fixed indices $\{1, 2, 4, \ldots, 2^{\lfloor \log_2 n \rfloor}\}$ in each code word and ensure that (1) is met by judiciously choosing the values of the balancing bits. In Fig. 1 we compare the cardinalities of codes generated by the MBT scheme and the OFMB scheme respectively. Both resulting codes have length 265 and the original code for the MBT scheme has length 256. Note that both original codes are codes that achieve the GV bound. As shown in Fig. 1, when the minimum Hamming distance of the resulting code takes a value between 20 and 110, the cardinality of the code generated based on the OFMB scheme is superior to that of the code generated by the MBT scheme.
By the following theorem we present a comparison in asymptotic form between the lower bounds derived based on the OFMB and the MBT schemes respectively. Let $H(\cdot)$ denote the binary entropy function.
Theorem 6 (Asymptotic bound):
There always exists a code $C(n, d_{\min}, 1)^{*}$ with the cardinality
\[ |C(n, d_{\min}, 1)^{*}| \geq 2^{\,n - 1 - H\left(\frac{d_{\min}+1}{n}\right) n}, \tag{6} \]
when $n$ is large and $\frac{(d_{\min}+1)(\lfloor \log_2 n \rfloor + 1)}{2n} > 1$.

Proof:
Based on Theorem 5, the lower bound of cardinalities is the maximum of the lower bound derived from the OFMB scheme, which is $\frac{2^{n-1}}{V(n, d_{\min}+1)}$, and the one based on the MBT scheme, which is $\frac{2^{n - \lfloor \log_2 n \rfloor - 1}}{V(n - \lfloor \log_2 n \rfloor - 1,\, d_{\min} - 1)}$. Let $\delta_1 = \frac{d_{\min}+1}{n}$ and $\delta_2 = \frac{d_{\min}-1}{n - \lfloor \log_2 n \rfloor - 1}$. Since in an asymptotic form
\[ 2^{(H(\delta) + o(1)) n} \leq V(n, \delta n) \leq 2^{H(\delta) n}, \tag{7} \]
where $0 \leq \delta \leq \frac{1}{2}$, we can use the estimate $V(n, \delta n) \approx 2^{H(\delta) n}$. Therefore, we have
\[ \frac{2^{n-1}}{V(n, d_{\min}+1)} = 2^{\,n - 1 - H(\delta_1) n} \tag{8} \]
and
\[ \frac{2^{n - \lfloor \log_2 n \rfloor - 1}}{V(n - \lfloor \log_2 n \rfloor - 1,\, d_{\min} - 1)} = 2^{\,n - \lfloor \log_2 n \rfloor - 1 - H(\delta_2)(n - \lfloor \log_2 n \rfloor - 1)}. \tag{9} \]
Let $\Delta$ denote the difference between the exponents of the right terms in (8) and (9). We have
\[ \Delta = \underbrace{n - 1 - H(\delta_1) n}_{\text{exponent of the right term in (8)}} - \underbrace{\left[ n - \lfloor \log_2 n \rfloor - 1 - H(\delta_2)(n - \lfloor \log_2 n \rfloor - 1) \right]}_{\text{exponent of the right term in (9)}} = \underbrace{n \left( H(\delta_2) - H(\delta_1) \right)}_{\text{first term}} + \underbrace{\lfloor \log_2 n \rfloor \left( 1 - H(\delta_2) \right)}_{\text{second term}} - \underbrace{H(\delta_2)}_{\text{third term}}. \tag{10} \]
We have the following observations:
• While $n$ is large, the third term can be ignored.
• Since $0 \leq H(\delta_2) \leq 1$, the second term is no less than zero.
• Since $0 \leq \delta_1, \delta_2 \leq \frac{1}{2}$, if $\delta_2 > \delta_1$ the first term is positive.

Based on the definitions of $\delta_1$ and $\delta_2$, the inequality $\delta_2 > \delta_1$ means
\[ \frac{d_{\min} - 1}{n - \lfloor \log_2 n \rfloor - 1} > \frac{d_{\min} + 1}{n}, \tag{11} \]
which, after cross-multiplying and collecting terms, is the condition
\[ \frac{(d_{\min} + 1)(\lfloor \log_2 n \rfloor + 1)}{2n} > 1. \tag{12} \]
It is evident that the new lower bound based on the OFMB scheme is guaranteed to be superior to the one derived from the original MBT scheme if (12) is met, and there is a wide range of 2-tuples $(d_{\min}, n)$ satisfying (12).

Fig. 1. Comparison of lower bounds of cardinalities between MBT template and OFMB scheme when the original code for the MBT template has length 256.

VI. CONCLUSION AND FUTURE WORK
The two bit flipping schemes have three major advantages compared to the original template [1]. First, insertion/deletion errors and substitution errors are both channel errors. Both should be considered when designing a code for a harsh channel, and the preferred original code is likely to be a substitution error correcting code already. Second, we start with the most widely used channel codes, substitution error correcting codes. The original error correcting capability, the remaining substitution error correcting capability, and the single insertion/deletion correcting capability can be balanced by using the new schemes. Since the new schemes invert bits only when necessary to satisfy the moment constraint, and every flip in general further reduces the substitution error correcting capability, the capability is reduced only as far as required. Third, most modern systems are already designed around a given substitution error correcting code. The new schemes are practical to implement on top of an existing system since they require no change in code length.

In this paper, we present an approach that reduces the capability of a substitution error correcting code so that it can also correct a single insertion/deletion error, which is in effect a rate R = 1 moment balancing template. The encoding and decoding procedures are discussed and the performance is evaluated.

The efficiency of the MFMB scheme can be further evaluated in future work. An analytical approach using generating functions involving polynomial multiplication was considered by the authors to investigate how many different moment values can be generated by multiple bit-flips from a given code word. However, as shown in [38], [39], the analytical model of the number of terms of a power of a polynomial is still an open question, to the best of the authors' knowledge.

REFERENCES
[1] H. C. Ferreira, K. A. S. Abdel-Ghaffar, L. Cheng, T. G. Swart, and K. Ouahada, "Moment balancing templates: Constructions to add insertion/deletion correction capability to error correcting or constrained codes," IEEE Trans. Inform. Theory, vol. 55, no. 8, pp. 3494–3500, Aug. 2009.
[2] L. Cheng, H. C. Ferreira, and K. Ouahada, "Moment balancing templates for spectral null codes," IEEE Trans. Inform. Theory, vol. 56, no. 8, pp. 3749–3753, Aug. 2010.
[3] L. Cheng, H. C. Ferreira, and I. Broere, "Moment balancing templates for (d,k)-constrained codes and run-length limited sequences," IEEE Trans. Inform. Theory, vol. 58, no. 4, pp. 2244–2252, Apr. 2012.
[4] R. P. Varshamov and G. M. Tenengolts, "Correction code for single asymmetric errors," Automat. Telemekh., vol. 26, pp. 288–292, 1965.
[5] V. I. Levenshtein, "Binary codes capable of correcting deletions, insertions and substitutions of symbols," Dokl. Akad. Nauk SSSR, vol. 163, no. 4, pp. 845–848, 1965.
[6] G. M. Tenengolts, "Class of codes correcting bit loss and errors in the preceding bit," Avtomatika i Telemekhanika, no. 5, pp. 174–179, 1976.
[7] S. D. Constantin and T. R. N. Rao, "On the theory of binary asymmetric error correcting codes," Information and Control, vol. 40, pp. 20–36, 1979.
[8] G. M. Tenengolts, "Nonbinary codes, correcting single deletion or insertion," IEEE Trans. Inform. Theory, vol. 30, no. 5, pp. 766–769, Sep. 1984.
[9] H. C. Ferreira, W. A. Clarke, A. S. J. Helberg, K. A. S. Abdel-Ghaffar, and A. J. H. Vinck, "Insertion/deletion correction with spectral nulls," IEEE Trans. Inform. Theory, vol. 43, no. 2, pp. 722–732, Mar. 1997.
[10] K. A. S. Abdel-Ghaffar, "Detecting substitutions and transpositions of characters," The Computer Journal, vol. 41, no. 4, pp. 270–277, 1998.
[11] K. A. S. Abdel-Ghaffar, F. Paluncic, H. C. Ferreira, and W. A. Clarke, "On Helberg's generalization of the Levenshtein code for multiple deletion/insertion error correction," IEEE Trans. Inform. Theory, vol. 58, pp. 1804–1808, 2012.
[12] M. C. Davey and D. J. C. MacKay, "Reliable communication over channels with insertions, deletions and substitutions," IEEE Trans. Inform. Theory, vol. 47, no. 2, pp. 687–698, Feb. 2001.
[13] K. Immink and G. Beenker, "Binary transmission codes with higher order spectral zeros at zero frequency (corresp.)," IEEE Trans. Inform. Theory, vol. 33, no. 3, pp. 452–454, 1987.
[14] A. S. J. Helberg and H. C. Ferreira, "On multiple insertion/deletion correcting codes," IEEE Trans. Inform. Theory, vol. 48, no. 1, pp. 305–308, Jan. 2002.
[15] L. Dolecek and V. Anantharam, "A synchronization technique for array-based LDPC codes in channels with varying sampling rate," in IEEE International Symposium on Information Theory (ISIT), Seattle, USA, Jul. 2006, pp. 2057–2061.
[16] L. Dolecek and V. Anantharam, "On subsets of binary strings immune to multiple repetition errors," in IEEE International Symposium on Information Theory (ISIT), Nice, France, Jun. 2007, pp. 1691–1695.
[17] P. A. H. Bours, "Codes for correcting insertion and deletion errors," Ph.D. dissertation, Technische Universiteit Eindhoven, 1994.
[18] L. Cheng, "Coding techniques for insertion/deletion error correction," Ph.D. dissertation, University of Johannesburg, 2011.
[19] L. Cheng, T. G. Swart, H. C. Ferreira, and K. A. Abdel-Ghaffar, "Codes for correcting three or more adjacent deletions or insertions," in IEEE International Symposium on Information Theory (ISIT), Honolulu, USA, Jul. 2014, pp. 1246–1250.
[20] H. C. Ferreira, L. Cheng, T. G. Swart, and K. A. S. Abdel-Ghaffar, "Interleaving arrays for insertion/deletion or reversal error correction," presented at the Information Theory and Applications Workshop, San Diego, USA, Feb. 2015.
[21] C. Schoeny, A. Wachter-Zeh, R. Gabrys, and E. Yaakobi, "Codes correcting a burst of deletions or insertions," in IEEE International Symposium on Information Theory (ISIT), Barcelona, Spain, Jul. 2016, pp. 630–634.
[22] D. Smith, T. G. Swart, K. A. S. Abdel-Ghaffar, H. C. Ferreira, and L. Cheng, "Interleaved constrained codes with markers correcting bursts of insertions or deletions," IEEE Communications Letters, accepted for publication, 2017.
[23] J. Brakensiek, V. Guruswami, and S. Zbarsky, "Efficient low-redundancy codes for correcting multiple deletions," in Proceedings of the Twenty-Seventh Annual ACM-SIAM Symposium on Discrete Algorithms. Society for Industrial and Applied Mathematics, 2016, pp. 1884–1892.
[24] L. J. Schulman and D. Zuckerman, "Asymptotically good codes correcting insertions, deletions, and transpositions," IEEE Trans. Inform. Theory, vol. 45, no. 7, pp. 2552–2557, Nov. 1999.
[25] V. Guruswami and R. Li, "Efficiently decodable insertion/deletion codes for high-noise and high-rate regimes," in IEEE International Symposium on Information Theory (ISIT). IEEE, 2016, pp. 620–624.
[26] V. Guruswami and C. Wang, "Deletion codes in the high-noise and high-rate regimes," IEEE Trans. Inform. Theory, vol. 63, no. 4, pp. 1961–1970, 2017.
[27] B. Haeupler and A. Shahrasbi, "Synchronization strings: codes for insertions and deletions approaching the singleton bound," arXiv preprint arXiv:1704.00807, 2017.
[28] B. Haeupler and A. Shahrasbi, "Synchronization strings: Explicit constructions, local decoding, and applications," arXiv preprint arXiv:1710.09795, 2017.
[29] B. Haeupler, A. Shahrasbi, and E. Vitercik, "Synchronization strings: Channel simulations and interactive coding for insertions and deletions," arXiv preprint arXiv:1707.04233, 2017.
[30] K. Cheng, X. Li, and K. Wu, "Synchronization strings: Efficient and fast deterministic constructions over small alphabets," arXiv preprint arXiv:1710.07356, 2017.
[31] V. Guruswami and R. Li, "Efficiently decodable codes for the binary deletion channel," arXiv preprint arXiv:1705.01963, 2017.
[32] V. Guruswami and R. Li, "Coding against deletions in oblivious and online models," in Proceedings of the Twenty-Ninth Annual ACM-SIAM Symposium on Discrete Algorithms. SIAM, 2018, pp. 625–643.
[33] B. Haeupler, A. Shahrasbi, and M. Sudan, "Synchronization strings: List decoding for insertions and deletions," arXiv preprint arXiv:1802.08663, 2018.
[34] K. A. S. Immink, Codes for mass data storage systems. Shannon Foundation Publisher, 2004.
[35] F. J. MacWilliams and N. J. A. Sloane, The theory of error correcting codes (North-Holland Mathematical Library). Amsterdam: North-Holland Publishing Co., 1977.
[36] E. N. Gilbert, "A comparison of signalling alphabets," Bell System Technical Journal, vol. 31, no. 3, pp. 504–522, 1952.
[37] R. Varshamov, "Estimate of the number of signals in error correcting codes," in Dokl. Akad. Nauk SSSR, vol. 117, no. 5, 1957, pp. 739–741.
[38] P. Erdős and R. L. Graham, Old and new problems and results in combinatorial number theory. L'Enseignement mathématique, 1980, vol. 28.
[39] A. Schinzel and U. Zannier, "On the number of terms of a power of a polynomial," Rendiconti Lincei-Matematica e Applicazioni, vol. 20, no. 1, pp. 95–98, 2009.

TABLE III
FIXED INDEX BIT FLIPPING MOMENT BALANCING SCHEME OF A (15, 32, 7) BCH CODE WITH m = 16

Code word | $\sigma(c)$ | Variable Indices Bit Flipping Code I | Fixed Indices Bit Flipping Code II
000000000000000 | 0 | 000000000000000 | 000000000000000
100001010011011 | 3 | 100001010011 001 111

TABLE IV
COMPARISON BETWEEN OFMB SCHEME AND MOMENT BALANCING TEMPLATE [1] FOR $(n, k, d_{\min})$ SUBSTITUTION ERROR CORRECTING CODES

 | Moment Balancing Template [1] | One-Flip Moment Balancing Scheme
Code word length | $n + \lfloor \log_2 n \rfloor + 1$ | $n$
Information length | $k$ | $k$ −
$s$ insertion/deletion correction | 1 | 1
Minimum Hamming distance | $d_{\min}$ | $d_{\min}$ −−
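The $\sigma(c)$ column of Table III can be checked with a short script. The following Python sketch is illustrative and not taken from the paper. It assumes the usual VT moment $\sigma(c) = \sum_i i\, c_i \bmod m$ with positions indexed from 1, and the helper names `moment`, `flip` and `single_flip_positions` are hypothetical. Under the further assumption that the target residue is 0 (consistent with the all-zero word being left unchanged in Table III), it lists the positions at which a single flip balances the moment.

```python
def moment(word: str, m: int) -> int:
    """Assumed VT moment: sum of the 1-indexed positions holding a '1', reduced mod m."""
    return sum(i for i, bit in enumerate(word, start=1) if bit == "1") % m


def flip(word: str, position: int) -> str:
    """Return a copy of word with the bit at the given 1-indexed position inverted."""
    i = position - 1
    return word[:i] + ("0" if word[i] == "1" else "1") + word[i + 1:]


def single_flip_positions(word: str, m: int, target: int = 0) -> list[int]:
    """All 1-indexed positions whose single flip makes the moment equal to target."""
    return [p for p in range(1, len(word) + 1) if moment(flip(word, p), m) == target]


if __name__ == "__main__":
    c = "100001010011011"
    print(moment(c, 16))                 # 3, matching the sigma(c) entry in Table III
    print(single_flip_positions(c, 16))  # [13]
```

The brute-force search over all positions is quadratic in the code length, which is adequate for a length-15 example; it is meant only to show how one flip changes the moment by plus or minus the flipped index, not to reproduce the selection rules of the two schemes.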