Efficient Design of Subblock Energy-Constrained Codes and Sliding Window-Constrained Codes
Tuan Thanh Nguyen, Kui Cai, and Kees A. Schouhamer Immink
Abstract
Subblock energy-constrained codes (SECCs) and sliding window-constrained codes (SWCCs) have recently attracted attention due to various applications in communication systems such as simultaneous energy and information transfer. In a SECC, each codeword is divided into smaller non-overlapping windows, called subblocks, and every subblock is constrained to carry sufficient energy. In a SWCC, however, the energy constraint is enforced over every window. In this work, we focus on the binary channel, where sufficient energy is achieved theoretically by using relatively high weight codes, and study SECCs and SWCCs under more general constraints, namely bounded SECCs and bounded SWCCs. We propose two methods to construct such codes with low redundancy and linear-time complexity, based on Knuth’s balancing technique and the sequence replacement technique. These methods can be further extended to construct SECCs and SWCCs. For certain code parameters, our methods incur only one redundant bit. We also impose a minimum distance constraint to give the designed codes error correction capability, which also helps to reduce error propagation during decoding.
I. INTRODUCTION
Constrained coding has been used widely in various communication and storage systems. For example, to avoid detection errors due to inter-symbol interference and synchronization errors in magnetic and optical storage, runlength-limited codes (RLLs) are employed to restrict any run of zeros between consecutive ones [1], [2]. Recently, the subblock energy-constrained codes (SECCs) and sliding window-constrained codes (SWCCs) have been shown to be suitable candidates for providing simultaneous energy and information transfer from a powered transmitter to an energy-harvesting receiver [3]–[12]. In this scenario, the receiver uses the same received signal both for decoding information and for harvesting energy, which is used to power the receiver’s circuitry. In 2008, Varshney [3] characterized the tradeoff between reliable communication and delivery of energy at the receiver by using a general capacity-power function, where transmitted sequences were constrained to contain sufficient energy. In this work, we focus on the binary channel, where on-off keying is employed, and bit 1 (bit 0) denotes the presence (absence) of a high energy signal. As such, sufficient energy is achieved theoretically by using relatively high weight codes.
Recently, Tandon et al. [4] demonstrated that imposing only an energy constraint over the whole transmitted sequence might not be sufficient. It is important to avoid sequences which carry limited energy over a long duration, and consequently to prevent energy outage at a receiver having finite energy storage capability. In order to regularize the energy content in the signal, two classes of energy-constrained codes, namely SECCs and SWCCs, were suggested [4], [5]. Formally, in a binary SECC, each codeword is divided into smaller non-overlapping windows, called subblocks, and every subblock is constrained to have a sufficient number of ones. In contrast, a binary SWCC restricts the number of ones over every window of consecutive symbols (see Figure 1).
This approach has been investigated in [11]–[13]. SWCCs have been further studied for other applications of error-correction codes in [14], [15]. In fact, the subblock energy constraint is weaker than the sliding-window constraint: even if every subblock in a codeword $\mathbf{c}$ carries sufficient energy, there might still be a subsequence in $\mathbf{c}$ that carries limited energy over a long duration (see Example 1). In contrast, the sliding-window constraint enables all codewords to carry sufficient energy over every duration, which meets real-time delivery requirements, but also reduces the number of valid codewords and therefore the information capacity. In this work, we provide some bounds for SWCCs and show that if the length of each duration satisfies certain constraints, there exist codes whose rates approach capacity. In such cases, we design an efficient method to construct SWCCs with only one redundant bit.
Furthermore, we study SECCs and SWCCs under more general constraints, namely bounded SECCs and bounded SWCCs. The additional constraint restricts the energy in every subblock in SECCs (or every window in SWCCs) to be below a given threshold, consequently preventing energy outage at a receiver having finite energy storage capability (see Figure 1). Throughout this paper, we propose two methods for constructing bounded SECCs and bounded SWCCs, based on Knuth’s balancing technique and the sequence replacement technique. The methods can be extended to construct SECCs and SWCCs as well. We further combine these codes efficiently with error correction codes (ECCs), which also helps to reduce error propagation of the designed codes during decoding.
Before we present the main results of the paper, we go through certain notations and then highlight the major contributions of this work.
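The difference between the subblock constraint and the sliding-window constraint described above can be illustrated with a minimal Python sketch. The function names `in_subblock_code` and `in_sliding_window_code` are hypothetical helpers introduced only for illustration; the test sequence and the parameters (subblock/window length 6, weight bounds 2 and 5) are the ones used in Example 1 of the paper.

```python
def weight(bits: str) -> int:
    """Number of ones in a binary string."""
    return bits.count("1")


def in_subblock_code(x: str, ell: int, a: int, b: int) -> bool:
    """True if every non-overlapping length-ell subblock of x has weight in [a, b]."""
    if len(x) % ell != 0:
        raise ValueError("codeword length must be a multiple of the subblock length")
    return all(a <= weight(x[i:i + ell]) <= b for i in range(0, len(x), ell))


def in_sliding_window_code(x: str, ell: int, a: int, b: int) -> bool:
    """True if every window of ell consecutive symbols of x has weight in [a, b]."""
    return all(a <= weight(x[i:i + ell]) <= b for i in range(len(x) - ell + 1))


x = "001111110000011001"                   # sequence from Example 1
print(in_subblock_code(x, 6, 2, 5))        # True: subblock weights are 4, 2, 3
print(in_sliding_window_code(x, 6, 2, 5))  # False: e.g. the window 111111 has weight 6
```

The sliding-window check inspects n - ell + 1 overlapping windows rather than n / ell disjoint subblocks, which is why it rejects sequences that the subblock check accepts.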
Tuan Thanh Nguyen and Kui Cai are with the Singapore University of Technology and Design, Singapore 487372 (email: {tuanthanh_nguyen, cai_kui}@sutd.edu.sg).
Kees A. Schouhamer Immink is with Turing Machines Inc, Willemskade 15d, 3016 DK Rotterdam, The Netherlands (email: [email protected]).
This paper was presented in part at the 2020 Proceedings of the IEEE International Symposium on Information Theory [29].
A. Notations
Given two binary sequences $\mathbf{x} = x_1 \ldots x_m$ and $\mathbf{y} = y_1 \ldots y_n$, the concatenation of the two sequences is defined by $\mathbf{x}\mathbf{y} \triangleq x_1 \ldots x_m y_1 \ldots y_n$. For a binary sequence $\mathbf{x}$, we use $\mathrm{wt}(\mathbf{x})$ to denote the weight of $\mathbf{x}$, i.e. the number of ones in $\mathbf{x}$. We use $\overline{\mathbf{x}}$ to denote the complement of $\mathbf{x}$. For example, if $\mathbf{x} = 00111$ then $\mathrm{wt}(\mathbf{x}) = 3$ and $\overline{\mathbf{x}} = 11000$.
Throughout this work, we denote the codeword length by $n$ and the subblock (or window) length by $\ell$, where $\ell \le n$. In SECCs, we also require $n = m\ell$ for some positive integer $m$.
Definition 1.
For $0 \le a \le \ell$, we use $\mathcal{S}(n,\ell,a)$ to denote the set of all codewords of length $n = m\ell$ in which the weight of each subblock is at least $a$, and we use $\mathcal{W}(n,\ell,a)$ to denote the set of all codewords of length $n$ (not necessarily a multiple of $\ell$) in which the weight of every window of size $\ell$ is at least $a$.
Definition 2.
For $0 \le a < b \le \ell$, we use $\mathcal{S}(n,\ell,[a,b])$ to denote the set of all codewords of length $n = m\ell$ in which the weight of each subblock is at least $a$ and at most $b$. Similarly, $\mathcal{W}(n,\ell,[a,b])$ denotes the set of all codewords of length $n$ in which the weight of every window of size $\ell$ is at least $a$ and at most $b$.
Proposition 1.
For all $0 \le a < b \le \ell$, we have
(i) $\mathcal{W}(n,\ell,a) \subset \mathcal{S}(n,\ell,a)$, $\mathcal{W}(n,\ell,[a,b]) \subset \mathcal{S}(n,\ell,[a,b])$,
(ii) $\mathcal{W}(n,\ell,a) \equiv \mathcal{W}(n,\ell,[a,\ell])$, $\mathcal{S}(n,\ell,a) \equiv \mathcal{S}(n,\ell,[a,\ell])$.
Given $0 < \ell \le n$ and $0 \le a < b \le \ell$, the capacities of these constrained channels are defined by
\[ c_S(\ell,a) \triangleq \lim_{n\to\infty} \frac{1}{n}\log|\mathcal{S}(n,\ell,a)|, \qquad c_S(\ell,[a,b]) \triangleq \lim_{n\to\infty} \frac{1}{n}\log|\mathcal{S}(n,\ell,[a,b])|, \]
\[ c_W(\ell,a) \triangleq \lim_{n\to\infty} \frac{1}{n}\log|\mathcal{W}(n,\ell,a)|, \qquad c_W(\ell,[a,b]) \triangleq \lim_{n\to\infty} \frac{1}{n}\log|\mathcal{W}(n,\ell,[a,b])|. \]
The capacity $c_W(\ell,a)$ is studied and determined for certain values of $\ell$ and $a$ in our companion paper [13]. A special class of bounded SWCCs, namely locally balanced constraints, was introduced in [16], and the capacity $c_W(\ell,[a,b])$ was also studied when $a = \ell/2 - \epsilon$, $b = \ell/2 + \epsilon$ for $\epsilon > 0$. In general, to achieve high information capacity, it suffices to have $a \le p_1\ell$ and $b \ge p_2\ell$ for some constants $0 \le p_1 < 1/2 < p_2 \le 1$. In this work, not only are we interested in constructing large codes, we also desire efficient encoders that map arbitrary binary messages into these codes.
Definition 3.
For $0 \le a \le \ell \le n$ and $0 \le r \le n$, an encoder $\mathrm{ENC} : \{0,1\}^{n-r} \to \{0,1\}^n$ is an $(n,\ell,a)$-subblock energy-constrained encoder with $r$ bits of redundancy if $\mathrm{ENC}(\mathbf{x}) \in \mathcal{S}(n,\ell,a)$ for all $\mathbf{x} \in \{0,1\}^{n-r}$. The rate of the encoder is $(n-r)/n = 1 - r/n$. For $0 \le a < \ell/2 < b \le \ell \le n$, the $(n,\ell,[a,b])$-bounded subblock energy-constrained encoder, the $(n,\ell,a)$-sliding window-constrained encoder, and the $(n,\ell,[a,b])$-bounded sliding window-constrained encoder are defined similarly.
For each constraint, our design objectives include low redundancy (equivalently, high information rate) and low complexity of the encoding/decoding algorithms. In Section II and Section III, for certain code parameters, the rate of our encoders approaches the channel capacity.
Definition 4.
For $n, \ell > 0$ with $n = m\ell$, a sequence $\mathbf{x} = x_1 x_2 \ldots x_n \in \{0,1\}^n$ is divided into $m$ subblocks of size $\ell$, where the $i$th subblock is denoted by $B_{(i,\ell)}(\mathbf{x})$, and $B_{(i,\ell)}(\mathbf{x}) = x_{(i-1)\ell+1} \ldots x_{i\ell}$ for $1 \le i \le m$. On the other hand, the $i$th window of size $\ell$ of $\mathbf{x}$, denoted by $w_{(i,\ell)}(\mathbf{x})$, is defined by $w_{(i,\ell)}(\mathbf{x}) = x_i \ldots x_{i+\ell-1}$ for $1 \le i \le n-\ell+1$.
Example 1.
Let $n = 18$, $\ell = 6$, $m = 3$, $a = 2$, $b = 5$. Consider the sequence $\mathbf{x} = 001111110000011001$. The subblocks of $\mathbf{x}$ are defined as follows:
\[ \mathbf{x} = \underbrace{001111}_{B_{(1,6)}}\,\underbrace{110000}_{B_{(2,6)}}\,\underbrace{011001}_{B_{(3,6)}}. \]
We verify that the weight of each subblock is within $[2,5]$, and hence $\mathbf{x} \in \mathcal{S}(18,6,[2,5])$. However, $\mathbf{x} \notin \mathcal{W}(18,6,[2,5])$, since there are windows of size six, for example $w_{(3,6)}$ and $w_{(9,6)}$, that violate the weight constraint:
\[ \mathbf{x} = 00\,\underbrace{111111}_{w_{(3,6)}}\,\underbrace{000001}_{w_{(9,6)}}\,1001. \]
B. Our Contributions
In this work, we design efficient methods of mapping arbitrary users’ data to codewords in SECCs, SWCCs, bounded SECCs, and bounded SWCCs. Formally, for $0 \le a < b \le \ell$,
(i) In Section II, we propose an efficient encoder for bounded SECCs $\mathcal{S}(n,\ell,[a,b])$ using Knuth’s balancing technique. Note that $\mathcal{S}(n,\ell,a) \equiv \mathcal{S}(n,\ell,[a,\ell])$, and hence the method can be applied to construct efficient encoders for SECCs $\mathcal{S}(n,\ell,a)$ as well. Particularly, we extend Knuth’s balancing technique for balanced codes, i.e. $a = b = \ell/2$, to construct $\mathcal{S}(n,\ell,[a,b])$ for the special case when $a = p_1\ell$, $b = p_2\ell$, $0 \le p_1 < 1/2 < p_2 \le 1$, and then generalize this technique to arbitrary $0 \le a < b \le \ell$.
(ii) In Section III, we first study the size of SWCCs $\mathcal{W}(n,\ell,a)$ and bounded SWCCs $\mathcal{W}(n,\ell,[a,b])$. When $a = p_1\ell$, $b = p_2\ell$ for some $0 \le p_1 < 1/2 < p_2 \le 1$, we show that when the window size satisfies certain constraints, the code size is at least $2^{n-1}$. We then propose efficient encoders for SWCCs $\mathcal{W}(n,\ell,a)$ and bounded SWCCs $\mathcal{W}(n,\ell,[a,b])$ by using the sequence replacement technique. For certain values of $a, b, \ell$, our method incurs only one redundant bit.
(iii) In Section IV, we study these codes with given error correction capability. Particularly, we construct codes that can correct multiple errors under the assumption that the distance between any two errors is at least $\ell$. The intuition behind this assumption is that when the energy constraint is enforced over every window of size $\ell$, the probability of having an error is minimized over every window. We consider the worst case scenario when there is at most one error over every window of size $\ell$.
[Fig. 1: SECCs, SWCCs, bounded SECCs, and bounded SWCCs.]
II. SECCS AND BOUNDED SECCS
In this section, we propose a simple coding scheme to construct $\mathcal{S}(n,\ell,a)$ and $\mathcal{S}(n,\ell,[a,b])$. We are interested in the case where the number of subblocks is constant, i.e. $m = \Theta(1)$, $\ell = \Theta(n)$. Particularly, we first modify Knuth’s balancing technique to construct $\mathcal{S}(n,\ell,[a,b])$ when there exist two constants $p_1, p_2$, $0 \le p_1 < 1/2 < p_2 \le 1$, such that $a \le p_1\ell$, $b \ge p_2\ell$. We then extend this method to construct $\mathcal{S}(n,\ell,[a,b])$ and $\mathcal{S}(n,\ell,a)$ for arbitrary $a, b$.
A. Maximum Information Rate
The following result is immediate.
Proposition 2.
For $n = m\ell$ and $0 \le a < b \le \ell$, we have
\[ |\mathcal{S}(n,\ell,[a,b])| = \left(\sum_{i=a}^{b} \binom{\ell}{i}\right)^{m} \quad\text{and}\quad |\mathcal{S}(n,\ell,a)| = \left(\sum_{i=a}^{\ell} \binom{\ell}{i}\right)^{m}. \]
In fact, we are able to show that the sizes of $\mathcal{S}(n,\ell,a)$ and $\mathcal{S}(n,\ell,[a,b])$, under certain conditions on $a, b, \ell$, are at least $2^{n-1}$, and therefore the channel capacity in such cases is 1.
Theorem 1.
For all $0 \le p_1 < 1/2 < p_2 \le 1$ and $a \le p_1\ell$, $b \ge p_2\ell$, let $c = \min\{1/2 - p_1,\, p_2 - 1/2\}$. For $n, \ell$ such that $(1/c^2)\log_e n \le \ell \le n$, we have
\[ |\mathcal{S}(n,\ell,[a,b])| \ge 2^{n-1}. \]
We defer the proof of Theorem 1 to Section III, Theorem 4. In fact, Theorem 4 presents a stronger result: under the assumptions on $n, \ell, a, b$ in Theorem 1, we have $|\mathcal{W}(n,\ell,[a,b])| \ge 2^{n-1}$. Since $\mathcal{W}(n,\ell,[a,b]) \subset \mathcal{S}(n,\ell,[a,b])$, Theorem 1 then follows.
B. Efficient Construction of $\mathcal{S}(n,\ell,[a,b])$
In this section, we modify Knuth’s balancing technique to construct $\mathcal{S}(n,\ell,[a,b])$. Knuth’s balancing technique is a linear-time algorithm that maps a binary message $\mathbf{x}$ to a balanced binary word $\mathbf{y}$ of the same length by flipping the first $t$ bits of $\mathbf{x}$ [17]. The crucial observation demonstrated by Knuth is that such an index $t$ always exists; $t$ is commonly referred to as a balancing index. To represent such a balancing index, Knuth appends to $\mathbf{y}$ a short balanced suffix $\mathbf{p}$ of length $\log n$, and hence a lookup table of size $n$ is required. Modifications of the generic scheme are discussed in [18]–[21].
Definition 5.
For a binary sequence $\mathbf{x} \in \{0,1\}^n$ and $0 \le t \le n$, let $f_t(\mathbf{x})$ denote the binary sequence obtained by flipping the first $t$ bits of $\mathbf{x}$.
Example 2.
Let $\mathbf{x} = 001111 \in \{0,1\}^6$. We have $f_1(\mathbf{x}) = 101111$, $f_2(\mathbf{x}) = 111111$, $f_3(\mathbf{x}) = 110111$, $f_4(\mathbf{x}) = 110011$, $f_5(\mathbf{x}) = 110001$, and $f_6(\mathbf{x}) = 110000$. Hence, $t = 5$ is the unique balancing index of $\mathbf{x}$. In general, the balancing index may not be unique. For example, consider $\mathbf{y} = 001100$. We observe that both $f_1(\mathbf{y}) = 101100$ and $f_5(\mathbf{y}) = 110010$ are balanced; therefore, both $t = 1$ and $t = 5$ are balancing indices of $\mathbf{y}$.
We now extend Knuth’s method to construct $\mathcal{S}(n,\ell,[a,b])$ when $a \le p_1\ell$, $b \ge p_2\ell$ for some constants $p_1, p_2$, $0 \le p_1 < 1/2 < p_2 \le 1$.
Definition 6.
Let $n$ be even and set $[n] = \{0, 1, 2, \ldots, n\}$. For arbitrary $0 < k \le n$, a walk of size $k$ in $[n]$ is the set of indices $S_{(k,n)} \triangleq \{0, n\} \cup \{ik : i \ge 1 \text{ and } ik < n\}$.
Theorem 2.
Given $\ell$ even and $0 \le p_1 < 1/2 < p_2 \le 1$, let $k = (p_2 - p_1)\ell$. For an arbitrary binary sequence $\mathbf{x} \in \{0,1\}^\ell$, there exists an index $t$ in the set $S_{(k,\ell)}$ such that the weight of $f_t(\mathbf{x})$ is within $[p_1\ell, p_2\ell]$.
Proof. In the trivial case, when the weight of $\mathbf{x}$ satisfies the constraint, i.e. $\mathrm{wt}(\mathbf{x}) \in [p_1\ell, p_2\ell]$, we can select $t = 0 \in S_{(k,\ell)}$. Otherwise, assume that $\mathrm{wt}(\mathbf{x}) \notin [p_1\ell, p_2\ell]$, and without loss of generality, assume that $\mathrm{wt}(\mathbf{x}) < p_1\ell \le \ell/2$. Since $\mathrm{wt}(\mathbf{x}) < \ell/2$, we have $\mathrm{wt}(f_\ell(\mathbf{x})) > \ell/2$. Now, for $k = (p_2 - p_1)\ell$, consider the list of indices $t_1 = k$, $t_2 = 2k$, and in general $t_i = ik \in S_{(k,\ell)}$. Since $f_{t_i}(\mathbf{x})$ and $f_{t_{i+1}}(\mathbf{x})$ differ in at most $k$ positions, the weight changes by at most $k = p_2\ell - p_1\ell$ between consecutive indices and therefore cannot jump over the interval $[p_1\ell, p_2\ell]$. Together with $\mathrm{wt}(\mathbf{x}) < \ell/2$ and $\mathrm{wt}(f_\ell(\mathbf{x})) > \ell/2$, there must be an index $t \in S_{(k,\ell)}$ such that $p_1\ell \le \mathrm{wt}(f_t(\mathbf{x})) \le p_2\ell$. $\blacksquare$
Example 3.
Let $\mathbf{x} = 110000000000 \in \{0,1\}^{12}$, so $\mathrm{wt}(\mathbf{x}) = 2$. Let $p_1 = 1/3$ and $p_2 = 2/3$, i.e. we want a codeword that has weight in $[4, 8]$. We compute $k = (p_2 - p_1)\ell = 4$. The set $S_{(k,\ell)} = \{0, 4, 8, 12\}$. We can verify that $f_4(\mathbf{x}) = 001100000000$, $f_8(\mathbf{x}) = 001111110000$, $f_{12}(\mathbf{x}) = 001111111111$. Hence, for $t = 8 \in S_{(k,\ell)}$, we get $\mathrm{wt}(f_t(\mathbf{x})) \in [p_1\ell, p_2\ell]$.
Lemma 1.
Given $n > 0$ and $0 \le p_1 < 1/2 < p_2 \le 1$, let $\mathbf{x} \in \{0,1\}^n$ be such that $p_1 n \le \mathrm{wt}(\mathbf{x}) \le p_2 n$. For any balanced binary word $\mathbf{y} \in \{0,1\}^m$, we have $p_1(n+m) \le \mathrm{wt}(\mathbf{x}\mathbf{y}) \le p_2(n+m)$.
Proof. We have $\mathrm{wt}(\mathbf{x}\mathbf{y}) = \mathrm{wt}(\mathbf{x}) + m/2$. Since $0 \le p_1 < 1/2 < p_2 \le 1$, it follows that $p_1 m < m/2 < p_2 m$. As $p_1 n \le \mathrm{wt}(\mathbf{x}) \le p_2 n$, we conclude that $p_1(n+m) \le \mathrm{wt}(\mathbf{x}\mathbf{y}) \le p_2(n+m)$. $\blacksquare$
For constant $0 \le p_1 < 1/2 < p_2 \le 1$ and $k = (p_2 - p_1)\ell$, the size of $S_{(k,\ell)}$ is at most $\lfloor 1/(p_2 - p_1) \rfloor + 1$, which is independent of $\ell$. Let $r = \lceil \log(\lfloor 1/(p_2 - p_1) \rfloor + 1) \rceil$. To encode an arbitrary binary sequence $\mathbf{x}$ to a codeword in $\mathcal{S}(n,\ell,[a,b])$, where $a \le p_1\ell$, $b \ge p_2\ell$, we divide $\mathbf{x}$ into subblocks of length $N = \ell - r$. We then encode each subblock and concatenate the outputs. For each subblock, we simply find the smallest index $t$ in $S_{(k,\ell-r)}$ such that $\mathbf{y} = f_t(\mathbf{x})$ satisfies the weight constraint. According to Theorem 2, such an index always exists. To represent such an index, we also append a short balanced suffix, and so a lookup table of size $\log|S_{(k,\ell-r)}| = r$ is required.
For completeness, we describe the formal encoder/decoder of $\mathcal{S}(n,\ell,[a,b])$ as follows.
Preparation phase.
Given $n = m\ell$ and $0 \le p_1 < 1/2 < p_2 \le 1$, set $k = (p_2 - p_1)\ell$ and $r = \lceil \log(\lfloor 1/(p_2 - p_1) \rfloor + 1) \rceil$. Let $S_{(k,\ell-r)}$ be the set of indices as defined in Definition 6. We construct a one-to-one correspondence between the indices in $S_{(k,\ell-r)}$ and the $r$-bit balanced sequences.
Encoder S
INPUT: $\mathbf{x} \in \{0,1\}^{m(\ell-r)}$
OUTPUT: $\mathbf{y} = \mathrm{ENC}_S(\mathbf{x}) \in \mathcal{S}(n,\ell,[a,b])$ where $a \le p_1\ell$, $b \ge p_2\ell$
(I) For $1 \le i \le m$ Do:
• Set $\mathbf{z}_i = B_{(i,\ell-r)}(\mathbf{x})$
• Search for the first index $t$ in $S_{(k,\ell-r)}$ such that $\mathrm{wt}(f_t(\mathbf{z}_i)) \in [p_1(\ell-r), p_2(\ell-r)]$
• Let $\mathbf{p}_i$ be the $r$-bit balanced sequence representing $t$
• Set $\mathbf{y}_i = f_t(\mathbf{z}_i)\mathbf{p}_i$
(II) Finally, we output $\mathbf{y} = \mathbf{y}_1 \mathbf{y}_2 \ldots \mathbf{y}_m$
Theorem 3. The Encoder S is correct. In other words, $\mathrm{ENC}_S(\mathbf{x}) \in \mathcal{S}(n,\ell,[a,b])$ for all $\mathbf{x} \in \{0,1\}^{m(\ell-r)}$.
Proof. To show $\mathbf{y} = \mathrm{ENC}_S(\mathbf{x}) \in \mathcal{S}(n,\ell,[a,b])$, we need to verify that the weight of every subblock of $\mathbf{y}$ is in $[a,b]$. From Encoder S, the $i$th subblock is $\mathbf{y}_i = f_t(\mathbf{z}_i)\mathbf{p}_i$. Since $\mathrm{wt}(f_t(\mathbf{z}_i)) \in [p_1(\ell-r), p_2(\ell-r)]$ and $\mathbf{p}_i$ is a balanced word of length $r$, according to Lemma 1, $\mathrm{wt}(\mathbf{y}_i) \in [p_1\ell, p_2\ell] \subseteq [a,b]$. $\blacksquare$
Decoder S
INPUT: $\mathbf{y} \in \mathcal{S}(n,\ell,[a,b])$ where $a \le p_1\ell$, $b \ge p_2\ell$
OUTPUT: $\mathbf{x} = \mathrm{DEC}_S(\mathbf{y}) \in \{0,1\}^{m(\ell-r)}$
(I) For $1 \le i \le m$ Do:
• Set $\mathbf{z}_i = B_{(i,\ell)}(\mathbf{y})$
• Let $\mathbf{p}_i$ be the suffix of length $r$ of $\mathbf{z}_i$, which corresponds to an index $t \in S_{(k,\ell-r)}$
• Obtain $\mathbf{z}'_i$ by removing $\mathbf{p}_i$ from $\mathbf{z}_i$
• Set $\mathbf{x}_i = f_t(\mathbf{z}'_i)$
(II) Finally, we output $\mathbf{x} = \mathbf{x}_1 \mathbf{x}_2 \ldots \mathbf{x}_m$
Alternatively, for each subblock, the index can be encoded/decoded in linear time without the look-up table for $S_{(k,\ell-r)}$.
However, the redundancy increases from $r$ to $2r$ and the set of indices is $S_{(k,\ell-2r)}$. Recall that $|S_{(k,\ell-2r)}| = |S_{(k,\ell-r)}| \le 2^r$. The modified Encoder S’ can be constructed as follows. We skip the details of the corresponding Decoder S’.
Encoder S’
INPUT: $\mathbf{x} \in \{0,1\}^{m(\ell-2r)}$
OUTPUT: $\mathbf{y} = \mathrm{ENC}_{S'}(\mathbf{x}) \in \mathcal{S}(n,\ell,[a,b])$ where $a \le p_1\ell$, $b \ge p_2\ell$
(I) For $1 \le i \le m$ Do:
• Set $\mathbf{z}_i = B_{(i,\ell-2r)}(\mathbf{x})$
• Search for the first index $t$ in $S_{(k,\ell-2r)}$ such that $\mathrm{wt}(f_t(\mathbf{z}_i)) \in [p_1(\ell-2r), p_2(\ell-2r)]$
• Let $\Gamma = \tau_1 \tau_2 \ldots \tau_r$ be the binary representation of the rank of the index $t$ in $S_{(k,\ell-2r)}$
• Set $\mathbf{p}_i = \Gamma\overline{\Gamma}$ of length $2r$, where $\overline{\Gamma}$ is the complement of $\Gamma$, and set $\mathbf{y}_i = f_t(\mathbf{z}_i)\mathbf{p}_i$
(II) Finally, we output $\mathbf{y} = \mathbf{y}_1 \mathbf{y}_2 \ldots \mathbf{y}_m$
Analysis.
The redundancy for encoding each subblock in Encoder S (or Encoder S’) is $r = \lceil \log(\lfloor 1/(p_2 - p_1) \rfloor + 1) \rceil$ (or $2r$), which is independent of $\ell$. In other words, for constant $p_1, p_2$, we have $r = \Theta(1)$. Consequently, the total redundancy for codewords of length $n = m\ell$ is $mr = \Theta(m)$. Therefore, this encoding method is efficient when $\ell$ is large and the number of subblocks is small compared to the codeword length, i.e. $m = \Theta(1)$, $\ell = \Theta(n)$. In such cases, the rate of Encoder S is $(n - mr)/n = 1 - mr/n \to 1$, and similarly, the rate of Encoder S’ is $1 - 2mr/n \to 1$, both approaching the channel capacity. Indeed, the same argument applies when $m = o(n)$. It is easy to verify that the complexity of Encoder/Decoder S (or S’) is linear in the codeword length.
C. Extension to $\mathcal{S}(n,\ell,a)$
We can modify Encoder S to construct $\mathcal{S}(n,\ell,a)$ or $\mathcal{S}(n,\ell,[a,b])$ for arbitrary $0 \le a < \ell/2 < b \le \ell$.
For $\mathcal{S}(n,\ell,[a,b])$, we let $k = b - a$, and the set $S_{(k,\ell)}$ is of size at most $\lfloor \ell/(b-a) \rfloor + 1$. The redundancy to encode each subblock of size $\ell$ is then $\lceil \log(\lfloor \ell/(b-a) \rfloor + 1) \rceil$. The efficiency of the encoder is high when $a = o(\ell)$ or $b = \Theta(\ell)$. In such cases, since $b - a = \Theta(\ell)$, we have $\lceil \log(\lfloor \ell/(b-a) \rfloor + 1) \rceil = \Theta(1)$.
Particularly, for SECCs $\mathcal{S}(n,\ell,a)$ when $a < \ell/2$, Encoder S incurs only one redundant bit for each subblock. The simple idea is as follows. If the $i$th subblock has weight $w < a < \ell/2$, the encoder simply flips the whole subblock (or equivalently takes its complement); the weight of the complement is then $w' > \ell/2 > a$. The encoder appends one bit $p = 1$ (or $0$) if the flipping action is needed (or not needed). This classic code is known as the polarity bit code [2].
Example 4.
Let $n = 21$, $\ell = 7$, $m = 3$ and $a = 3$. Suppose the source data is $\mathbf{x} = 110000011001111100 \in \{0,1\}^{18}$. The encoder checks every subblock of length 6 and outputs $\mathbf{c} \in \{0,1\}^{21}$, where
\[ \mathbf{c} = \underbrace{0011111}_{B_{(1,7)}}\,\underbrace{0110010}_{B_{(2,7)}}\,\underbrace{1111000}_{B_{(3,7)}}. \]
To decode $\mathbf{c}$, the decoder also checks every subblock of length 7; if the last bit is 1, it flips the prefix. We then obtain the source data $\mathbf{x}$,
\[ \mathbf{x} = \underbrace{110000}_{B_{(1,6)}}\,\underbrace{011001}_{B_{(2,6)}}\,\underbrace{111100}_{B_{(3,6)}}. \]
III. SWCCS AND BOUNDED SWCCS
In this section, we propose a simple coding scheme to construct $\mathcal{W}(n,\ell,[a,b])$ and $\mathcal{W}(n,\ell,a)$ by using the sequence replacement technique. Particularly, to construct $\mathcal{W}(n,\ell,[a,b])$ when $a \le p_1\ell$, $b \ge p_2\ell$ for some constants $0 \le p_1 < 1/2 < p_2 \le 1$, our method incurs only one redundant bit. Since $\mathcal{W}(n,\ell,[a,b]) \subset \mathcal{S}(n,\ell,[a,b])$ for $n = m\ell$, this method also provides an efficient encoder for $\mathcal{S}(n,\ell,[a,b])$ with only one redundant bit. This yields a significant improvement in coding redundancy with respect to Knuth’s balancing technique described in Section II. Note that the efficiency of Encoder S is high when the number of subblocks is constant, i.e. $m = \Theta(1)$, $\ell = \Theta(n)$, since the redundancy grows linearly with $m$. In this section, we show that there exists an efficient encoder when $m$ is a function of $n$, as long as the size of subblocks (or windows) satisfies certain constraints.
A. Maximum Information Rate
The following result implies that there exist such codes with size at least $2^{n-1}$ and hence with rates approaching the channel capacity.
Theorem 4.
For $0 \le p_1 < 1/2 < p_2 \le 1$, let $c = \min\{1/2 - p_1,\, p_2 - 1/2\}$. For $a \le p_1\ell$, $b \ge p_2\ell$, and $(1/c^2)\log_e n \le \ell \le n$, we have
\[ |\mathcal{W}(n,\ell,[a,b])| \ge 2^{n-1}. \]
To prove Theorem 4, we require
Hoeffding’s inequality [22].
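As a small numerical sanity check of the bound claimed in Theorem 4, the following Python sketch counts by exhaustive enumeration the words whose every window has weight in $[p_1\ell, p_2\ell]$. The parameters $n = 16$, $\ell = 14$, $p_1 = 0.05$, $p_2 = 0.95$ are assumptions chosen only so that the condition $(1/c^2)\log_e n \le \ell \le n$ holds while the search space stays small; the helper name `count_sliding_window_words` is hypothetical. This illustrates the statement and does not replace the proof.

```python
import math


def count_sliding_window_words(n: int, ell: int, lo: float, hi: float) -> int:
    """Count x in {0,1}^n whose every length-ell window has weight in [lo, hi]."""
    mask = (1 << ell) - 1
    count = 0
    for x in range(1 << n):          # x encodes a binary word of length n
        ok = True
        for shift in range(n - ell + 1):
            w = bin((x >> shift) & mask).count("1")  # weight of one window
            if not (lo <= w <= hi):
                ok = False
                break
        count += ok
    return count


# Assumed toy parameters satisfying (1/c^2) * ln(n) <= ell <= n.
n, ell = 16, 14
p1, p2 = 0.05, 0.95
c = min(0.5 - p1, p2 - 0.5)
assert math.log(n) / c**2 <= ell <= n

size = count_sliding_window_words(n, ell, p1 * ell, p2 * ell)
print(size, size >= 2 ** (n - 1))
```

Exhaustive enumeration is exponential in $n$ and is only practical for toy parameters such as these.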
Theorem 5 (Hoeffding’s Inequality). Let $Z_1, Z_2, \ldots, Z_n$ be independent bounded random variables such that $a_i \le Z_i \le b_i$ for all $i$. Let $S_n = \sum_{i=1}^{n} Z_i$. For any $t > 0$, we have
\[ P(S_n - E[S_n] \ge t) \le \exp\!\left(-\frac{2t^2}{\sum_{i=1}^{n}(b_i - a_i)^2}\right). \]
Proof of Theorem 4.
Let $\mathbf{x}$ be an element selected uniformly at random from $\{0,1\}^n$. A window $w_{(i,\ell)}(\mathbf{x})$ of length $\ell$ of $\mathbf{x}$ is said to be a forbidden window if its weight does not satisfy the constraint, i.e. $\mathrm{wt}(w_{(i,\ell)}(\mathbf{x})) \notin [a,b]$. We evaluate the probability that the first window $w_{(1,\ell)}(\mathbf{x})$ is a forbidden window. Note that $[p_1\ell, p_2\ell] \subseteq [a,b]$. Applying Hoeffding’s inequality with $t = c\ell$ and $\sum_i (b_i - a_i)^2 = \ell$, we obtain
\[ P\big(\mathrm{wt}(w_{(1,\ell)}) \notin [a,b]\big) \le P\big(\mathrm{wt}(w_{(1,\ell)}) \notin [p_1\ell, p_2\ell]\big) \le P\big(|\mathrm{wt}(w_{(1,\ell)}) - \ell/2| \ge c\ell\big) = 2P\big(\mathrm{wt}(w_{(1,\ell)}) - \ell/2 \ge c\ell\big) \le 2e^{-2c^2\ell^2/\ell} = 2e^{-2c^2\ell}. \]
The function $f(\ell) = 2e^{-2c^2\ell}$ is decreasing in $\ell$. Since there are $(n - \ell + 1) \le n$ windows, applying the union bound, we get
\[ P\big(\mathbf{x} \notin \mathcal{W}(n,\ell,[a,b])\big) \le n \cdot 2e^{-2c^2\ell} \le 2n e^{-2\log_e n} = 2/n. \]
Therefore, $|\mathcal{W}(n,\ell,[a,b])| \ge 2^n(1 - 2/n)$. For $n \ge 4$, we have $1 - 2/n \ge 1/2$. Therefore, $|\mathcal{W}(n,\ell,[a,b])| \ge 2^{n-1}$. Note that, since $\ell \le n$, we also require $n$ to be large enough such that $n \ge (1/c^2)\log_e n$. $\blacksquare$
Before we present the efficient encoder/decoder for $\mathcal{W}(n,\ell,[a,b])$, we state the following corollary, which is crucial to show the correctness of our algorithms. When $m = 1$, by replacing $\ell$ with $(\ell - 2)$ in Theorem 4, we obtain the following result.
Corollary 1.
For $\ell \ge 6$ and $\ell - 2 \ge (1/c^2)\log_e(\ell - 2)$, we have
\[ \big|\mathcal{W}(\ell-2, \ell-2, [p_1(\ell-2), p_2(\ell-2)])\big| \ge 2^{\ell-3}. \]
B. Sequence Replacement Technique
We first present an efficient encoder for $\mathcal{W}(n,\ell,[a,b])$ when there exist constants $p_1, p_2$, $0 \le p_1 < 1/2 < p_2 \le 1$, such that $a \le p_1\ell$, $b \ge p_2\ell$. For simplicity, we construct an efficient map that translates arbitrary messages into codewords in $\mathcal{W}(n,\ell,[p_1\ell, p_2\ell]) \subseteq \mathcal{W}(n,\ell,[a,b])$. A similar class of SWCCs has been introduced in [14], [15]. Formally, such codes impose the weight constraint over every window of size at least $\ell$, and here we refer to such codes as strictly constrained SWCCs. Some lower bounds on the size of such codes are provided for specific values of $p_1$ and $p_2$ (see, for example, [14]).
Our method is based on the sequence replacement technique, which has been widely used in the literature [23]–[26]. It is an efficient method for removing forbidden windows from a source word. In general, the encoder removes the forbidden windows and subsequently inserts their representation (which also includes the position of the windows) at predefined positions in the sequence. Crucial to the replacement step is an estimate of the total number of forbidden windows.
In the rest of this section, for a binary sequence $\mathbf{x}$, a window of size $\ell$ of $\mathbf{x}$ is said to be an $\ell$-forbidden window if the weight of this window does not belong to $[p_1\ell, p_2\ell]$. Let $F(\ell,[p_1\ell, p_2\ell])$ denote the set of all $\ell$-forbidden windows of size $\ell$. The following theorem provides an upper bound on the size of $F(\ell,[p_1\ell, p_2\ell])$.
Theorem 6.
For $0 \le p_1 < 1/2 < p_2 \le 1$, let $c = \min\{1/2 - p_1,\, p_2 - 1/2\}$. For $n \ge 16$ and $\ell \le n$ such that $(1/c^2)\log_e n \le \ell$, let $k = \ell - 3 - \log n$. Then there exists a one-to-one map $\Phi : F(\ell,[p_1\ell, p_2\ell]) \to \{0,1\}^k$.
Proof. We first show that $k > 0$. Since $c \le 1/2$, we have $\ell \ge (1/c^2)\log_e n \ge 4\log_e n > 2.77\log n > 3 + \log n$ for $n \ge 16$. For an arbitrary $\mathbf{x} \in \{0,1\}^\ell$, from the proof of Theorem 4, we have
\[ P\big(\mathbf{x} \in F(\ell,[p_1\ell, p_2\ell])\big) \le 2e^{-2c^2\ell} \le 2/n^2. \]
Therefore, the size of $F(\ell,[p_1\ell, p_2\ell])$ is at most
\[ |F(\ell,[p_1\ell, p_2\ell])| \le (2/n^2)\,2^\ell = 2^{\ell+1}/n^2. \]
Thus, to represent all forbidden windows in $F(\ell,[p_1\ell, p_2\ell])$, we need all binary sequences of length at most $k' = \log(2^{\ell+1}/n^2) = \ell + 1 - 2\log n \le \ell - 3 - \log n = k$ for all $n \ge 16$. Therefore, there exists a one-to-one map $\Phi : F(\ell,[p_1\ell, p_2\ell]) \to \{0,1\}^k$. $\blacksquare$
The key idea in the sequence replacement technique is to ensure that the replacement procedure is guaranteed to terminate. The general idea is to replace each forbidden window of length $\ell$ (if there is one) with a subsequence of length shorter than $\ell$. Consequently, after each replacement step the length of the codeword is reduced, and the replacement procedure is guaranteed to terminate. In our problem, in the worst case, the final replacement step occurs when the length of the current word is $\ell + 1$, since after another replacement (if needed), the length of the current word becomes at most $\ell$ and we cannot proceed further. This final step is crucial to ensure that the final output codeword satisfies the weight constraint. The following result provides an upper bound on the number of sequences of length $\ell + 1$ that include at least one forbidden window.
Let $G(\ell+1,[p_1\ell, p_2\ell])$ denote the set of all binary sequences of length $(\ell+1)$ that contain at least one forbidden window.
Theorem 7.
For $0 \le p_1 < 1/2 < p_2 \le 1$, let $c = \min\{1/2 - p_1,\, p_2 - 1/2\}$. For $\ell \ge 7$ and $\ell \ge (1/c^2)\log_e(\ell+1)$, we have $|G(\ell+1,[p_1\ell, p_2\ell])| \le 2^{\ell-3}$. In addition, there exists a one-to-one map $\Psi : G(\ell+1,[p_1\ell, p_2\ell]) \to \mathcal{W}(\ell-2, \ell-2, [p_1(\ell-2), p_2(\ell-2)])$.
Proof. Since there are only two windows, similarly to the proofs of Theorem 4 and Theorem 6, by using the union bound, for an arbitrary $\mathbf{x} \in \{0,1\}^{\ell+1}$ we have
\[ P\big(\mathbf{x} \in G(\ell+1,[p_1\ell, p_2\ell])\big) \le 2 \times 2 \times e^{-2c^2\ell} \le 4/(\ell+1)^2. \]
Therefore, the size of $G(\ell+1,[p_1\ell, p_2\ell])$ is at most
\[ |G(\ell+1,[p_1\ell, p_2\ell])| \le \big(4/(\ell+1)^2\big)\,2^{\ell+1} = 2^{\ell+3}/(\ell+1)^2. \]
For all $\ell \ge 7$, we have $2^{\ell+3}/(\ell+1)^2 \le 2^{\ell-3}$. According to Corollary 1, $|\mathcal{W}(\ell-2, \ell-2, [p_1(\ell-2), p_2(\ell-2)])| \ge 2^{\ell-3}$, which implies that there exists a one-to-one map $\Psi : G(\ell+1,[p_1\ell, p_2\ell]) \to \mathcal{W}(\ell-2, \ell-2, [p_1(\ell-2), p_2(\ell-2)])$. $\blacksquare$
C. Efficient Encoder/Decoder for $\mathcal{W}(n,\ell,[a,b])$
We now present a linear-time algorithm to encode $\mathcal{W}(n,\ell,[a,b])$. For simplicity, we assume $\log n$ is an integer.
Encoding algorithm.
The algorithm contains three phases: initial phase, replacement phase and extension phase. In particular, the replacement phase includes regular replacement and special replacement.

Initial phase. The source sequence $x \in \{0,1\}^{n-1}$ is prepended with 0 to obtain $y = 0x \in \{0,1\}^{n}$. The encoder scans $y$ and, if there is no forbidden window, it outputs $y$. Otherwise, it proceeds to the replacement phase.

Replacement phase. The aim of this procedure is that, at the end of the replacement phase, all forbidden windows of size $\ell$ are removed and the length of the current word is at least $\ell$. If the length of the current word is larger than $\ell + 1$, the encoder proceeds to the regular replacement. On the other hand, if the length of the current word $y$ is $(\ell+1)$, the encoder proceeds to the special replacement.

• Regular replacement. Let $w_{(i,\ell)}$ be the first forbidden window in $y$, for some $1 \le i \le n - \ell + 1 < n$. According to Theorem 6, the total number of forbidden windows of size $\ell$ is at most $2^{k}$, where $k = \ell - 3 - \log n$. Let $q_1$ be the binary representation of length $\log n$ of $i$, and $q_2 = \Phi(w_{(i,\ell)})$ of length $k$. The encoder sets $q_{\text{regular}} = 11\,q_1 q_2$, removes the forbidden window $w_{(i,\ell)}$ from $y$, and then prepends $q_{\text{regular}}$ to $y$. If, after this replacement, $y$ contains no forbidden window, the encoder proceeds to the extension phase. Otherwise, the encoder repeats the replacement phase. Note that such an operation reduces the length of the sequence by one, since we remove $\ell$ bits and replace them by $2 + \log n + k = 2 + \log n + (\ell - 3 - \log n) = \ell - 1$ bits. Therefore, this procedure is guaranteed to terminate.

• Special replacement. According to Theorem 7, the number of such words is at most $2^{\ell-3}$. The encoder sets $q_{\text{special}} = \Psi(y) \in W(\ell-2,\ell-2,[p_1(\ell-2),p_2(\ell-2)])$, i.e. $\mathrm{wt}(q_{\text{special}}) \in [p_1(\ell-2),p_2(\ell-2)]$, and then replaces all $(\ell+1)$ bits with $10\,q_{\text{special}}$. After this replacement, the current word is of length $\ell$ and it does not contain any forbidden window. This is because the prefix 10 is balanced and the suffix $q_{\text{special}}$ satisfies $\mathrm{wt}(q_{\text{special}}) \in [p_1(\ell-2),p_2(\ell-2)]$; therefore, according to Lemma 1, $\mathrm{wt}(10\,q_{\text{special}}) \in [p_1\ell,p_2\ell]$. The encoder then proceeds to the extension phase.

Extension phase. If the length of the current sequence $y$ is $n_0$ where $n_0 < n$, the encoder appends a suffix of length $n_1 = n - n_0$ to obtain a sequence of length $n$. Note that at the end of the replacement phase, the length of the current word is at least $\ell$. Let $z$ be the last window of size $\ell$ in $y$. Suppose that $z = z_1 z_2 \ldots z_\ell$ and $\mathrm{wt}(z) \in [p_1\ell,p_2\ell]$. A simple way to create a suffix is to append $z$ repeatedly until the length exceeds $n$. Let $j$ be the smallest integer such that $c = y z^{j}$ is of length greater than $n$. The encoder outputs the prefix of length $n$ of $c$. We now show that $c \in W(n,\ell,[p_1\ell,p_2\ell])$. Since $y$ does not contain any forbidden window, it remains to show that no forbidden window is created by the suffix $z^{j}$. Because $y$ ends with $z$, every window of size $\ell$ that overlaps the suffix lies in a repetition of $z$ and is therefore a cyclic shift of $z$. Since $\mathrm{wt}(z) \in [p_1\ell,p_2\ell]$, there is no forbidden window.

We now present an efficient algorithm to decode the source data uniquely. The decoding procedure is relatively simple, as follows.

Decoding algorithm. The decoder scans from left to right. If the first bit is 0, the decoder simply removes the 0 and identifies the last $(n-1)$ bits as the source data. On the other hand, if the word starts with 11, the decoder takes the prefix of length $(\ell-1)$ and concludes that this prefix is obtained by a regular replacement. In other words, the prefix is of the form $q_{\text{regular}} = 11\,q_1 q_2$, where $q_1$ is of length $\log n$ and $q_2$ is of length $k$. The decoder removes this prefix and inserts the forbidden window $w = \Phi^{-1}(q_2)$ at position $i$, where $i$ is the index whose binary representation is $q_1$. However, if the word starts with 10, the decoder takes the prefix of length $\ell$ and concludes that this prefix is obtained by a special replacement. In other words, the prefix of length $\ell$ is of the form $10\,q_{\text{special}}$. The decoder replaces the prefix of length $\ell$ with the window of length $\ell+1$, $w = \Psi^{-1}(q_{\text{special}})$, and then proceeds to decode from $w$. It terminates when the first bit is 0, and the decoder simply takes the following $(n-1)$ bits as the source data.

We summarize the details of our proposed encoder/decoder for $W(n,\ell,[p_1\ell,p_2\ell])$ as follows.

Preparation.
Given $0 \le p_1 < 1/2 < p_2 \le 1$, $c = \min\{1/2 - p_1, p_2 - 1/2\}$, and $n \ge \ell$, where $n$ and $\ell$ are large enough that the conditions of Theorem 6 and Theorem 7 hold. Let $k = \ell - 3 - \log n$. We construct two one-to-one maps:
$$\Phi : F(\ell,[p_1\ell,p_2\ell]) \to \{0,1\}^{k}, \quad \text{and} \quad \Psi : G(\ell+1,[p_1\ell,p_2\ell]) \to W(\ell-2,\ell-2,[p_1(\ell-2),p_2(\ell-2)]).$$
In other words, every forbidden window of size $\ell$ in $F(\ell,[p_1\ell,p_2\ell])$ is represented by a $k$-bit sequence, and every word of length $\ell+1$ in $G(\ell+1,[p_1\ell,p_2\ell])$ is represented by an $(\ell-2)$-bit sequence in $W(\ell-2,\ell-2,[p_1(\ell-2),p_2(\ell-2)])$.

Encoder W.
INPUT: $x \in \{0,1\}^{n-1}$
OUTPUT: $c = \mathrm{ENC}_W(x) \in W(n,\ell,[p_1\ell,p_2\ell])$
(I) Initial Phase. Set $y \leftarrow 0x$
(II) Replacement Phase.
While (there is a forbidden window in $y$) and (the length of $y$ is greater than $\ell+1$) Do:
• Let $i$ be the smallest index such that $w_{(i,\ell)}$ is forbidden, $1 \le i \le n-\ell+1$
• Let $q_1$ be the binary representation of length $\log n$ of $i$ and let $q_2 = \Phi(w_{(i,\ell)}) \in \{0,1\}^{k}$
• Set $q_{\text{regular}} = 11\,q_1 q_2$
• Set $y \leftarrow y$ with $w_{(i,\ell)}$ removed
• Set $y \leftarrow q_{\text{regular}}\,y$
If (the length of $y$ is $(\ell+1)$) and (there is a forbidden window in $y$) then:
• Set $q_{\text{special}} = \Psi(y)$
• Set $y \leftarrow 10\,q_{\text{special}}$
(III) Extension Phase.
• Set $z$ to be the last window of size $\ell$ in $y$
• Let $j$ be the smallest integer where $c = y z^{j}$ is of length greater than $n$
(IV) Output the prefix of length $n$ of $c$

Decoder W.
INPUT: $c \in W(n,\ell,[p_1\ell,p_2\ell])$
OUTPUT: $x = \mathrm{DEC}_W(c) \in \{0,1\}^{n-1}$
(I) While (the first bit is not 0) Do:
• If (the first two bits are 11) then:
  (i) Let $11\,q_1 q_2$ be the prefix of length $\ell-1$ of $c$, where $q_1$ is of length $\log n$ and $q_2$ is of length $k$
  (ii) Set $c \leftarrow c$ with this prefix removed
  (iii) Let $i$ be the index whose binary representation is $q_1$
  (iv) Let $w$ be the forbidden window of size $\ell$ in $F(\ell,[p_1\ell,p_2\ell])$, $w = \Phi^{-1}(q_2)$
  (v) Update $c$ by inserting $w$ into $c$ at index $i$
• If (the first two bits are 10) then:
  (i) Let $10\,q_{\text{special}}$ be the prefix of length $\ell$ of $c$
  (ii) Let $w \in G(\ell+1,[p_1\ell,p_2\ell])$ such that $w = \Psi^{-1}(q_{\text{special}})$
  (iii) Set $c \leftarrow w$
(II) If (the first bit is 0) then:
• Remove the 0
• Set $c$ to be the prefix of length $n-1$
(III) Output $c$

Complexity Analysis.
For codewords of length $n$, it is easy to verify that Encoder W and Decoder W have linear-time complexity. In particular, in Encoder W, the initial phase takes $\Theta(1)$ time. The total number of replacements in the replacement phase is $\Theta(n)$, and hence the running time of the replacement phase is $\Theta(n)$. The extension phase takes $\Theta(n)$ time. Therefore, the running time of Encoder W is $\Theta(n)$. Decoder W performs the reverse procedure of Encoder W, and therefore its running time is also $\Theta(n)$. Even though Encoder W offers lower redundancy than Encoder S, it suffers from more severe error propagation, i.e. during the decoding procedure, a small number of corrupted bits at the channel output might corrupt a large number of the decoded bits. On the other hand, Encoder S (Decoder S) encodes (decodes) subblocks separately and concatenates the outputs, and hence has limited error propagation.

D. Extension to $W(n,\ell,a)$, $S(n,\ell,[a,b])$, and $S(n,\ell,a)$

Encoder W can be used to construct SWCCs $W(n,\ell,a)$ for $a < \ell/2$ with high efficiency. In particular, when $a \ll \ell$, i.e. there exists a constant $p_1 < 1/2$ such that $a < p_1\ell$, we can set $p_2 = 1$ and use Encoder W to construct $W(n,\ell,[a,\ell])$.

Since $W(n,\ell,[a,b]) \subset S(n,\ell,[a,b])$ for $n = m\ell$, this method also provides an efficient encoder for $S(n,\ell,[a,b])$ with only one redundant bit. This yields a significant improvement in coding redundancy with respect to Knuth's balancing technique described in Section II. Recall that, for codewords of length $n = m\ell$, the redundancy of Encoder S is $\Theta(m)$. In contrast, the redundancy of Encoder W remains one bit for large values of $m$ as long as $\ell$ is sufficiently large (refer to the preparation step in Encoder W).
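The containment $W(n,\ell,[a,b]) \subset S(n,\ell,[a,b])$ for $n = m\ell$, which is what lets Encoder W serve as an encoder for bounded SECCs, can be checked exhaustively for small parameters. The following Python sketch is only an illustration: the function names `in_W` and `in_S` and the toy parameters are assumptions made here and are not part of the construction.

```python
from itertools import product

def in_W(c, ell, a, b):
    """Sliding-window constraint: every window of ell consecutive bits has weight in [a, b]."""
    return all(a <= sum(c[i:i + ell]) <= b for i in range(len(c) - ell + 1))

def in_S(c, ell, a, b):
    """Subblock constraint: every non-overlapping subblock of length ell has weight in [a, b]."""
    return all(a <= sum(c[i:i + ell]) <= b for i in range(0, len(c), ell))

# Toy parameters (assumed for illustration): n = m * ell with m = 2.
n, ell, a, b = 8, 4, 1, 3
words = list(product((0, 1), repeat=n))
W = [c for c in words if in_W(c, ell, a, b)]
S = [c for c in words if in_S(c, ell, a, b)]

# Every sliding-window codeword is also a subblock codeword.
contained = all(in_S(c, ell, a, b) for c in W)

# The containment is strict: 00111100 satisfies the subblock constraint,
# but its middle window 1111 has weight 4 > b.
witness = (0, 0, 1, 1, 1, 1, 0, 0)
print(contained, len(W) < len(S), in_S(witness, ell, a, b), in_W(witness, ell, a, b))
```

The subblock windows are a subset of the sliding windows, so any codeword accepted by `in_W` is accepted by `in_S`; the witness shows that the converse fails.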
For example, if one sets $\ell = \Theta(\log n)$ and $m = \Theta(n/\log n)$, Encoder W incurs only 1 redundant bit.

In addition, recall that $S(n,\ell,a) \equiv S(n,\ell,[a,\ell])$. Therefore, for $a < \ell/2$, Encoder W can also be used to construct SECCs $S(n,\ell,a)$ by setting $p_2 = 1$. Similarly, Encoder W can be easily modified to handle the case $a = \ell/2$.

IV. ERROR-CORRECTION CODES
In this section, we combine the previous constructions of Encoder S and Encoder W with error-correction constraints. The output codewords satisfy the weight constraint and are capable of correcting multiple substitution errors. In this work, we assume that the distance between two errors is at least $\ell$. The intuition behind this assumption is that, since the energy constraint (or weight constraint) is guaranteed over every subblock (or window) of length $\ell$, the probability of having multiple errors in a subblock or window is small. A similar model correcting a single deletion or single insertion over subblocks has been studied by Abroshan et al. [28]. In this work, we impose the Hamming distance constraint and the codebooks are capable of correcting substitution errors.

We first introduce the Varshamov-Tenengolts (VT) codes defined by Levenshtein [27] to correct a single substitution.
Definition 7. The binary VT syndrome of a binary sequence $x \in \{0,1\}^{n}$ is defined to be $\mathrm{Syn}(x) = \sum_{i=1}^{n} i x_i$.

For $a \in \mathbb{Z}_{2n}$, let
$$L_a(n) = \{x \in \{0,1\}^{n} : \mathrm{Syn}(x) = a \pmod{2n}\}.$$

Theorem 8 (Levenshtein [27]). For $a \in \mathbb{Z}_{2n}$, the code $L_a(n)$ can correct a single substitution in linear time. There exists a linear-time decoding algorithm $\mathrm{DEC}^{L}_{a} : \{0,1\}^{n} \to L_a(n)$ such that the following holds. If $c \in L_a(n)$ and $y$ is the received vector with at most one substitution, then $\mathrm{DEC}^{L}_{a}(y) = c$.

In fact, Levenshtein [27] showed that $L_a(n)$ can also correct a single deletion or single insertion.

A. Construction of SECCs with Error-Correction Capability
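As a concrete illustration of the VT syndrome of Definition 7 and the single-substitution decoder of Theorem 8, which the construction in this subsection applies to every subblock, consider the following Python sketch. The function names are hypothetical, and the decoding rule is the standard syndrome-difference argument written out under our own assumptions; it is not claimed to be the exact procedure of [27].

```python
from itertools import product

def syn(x):
    """VT syndrome Syn(x) = sum_{i=1}^{n} i * x_i (positions are 1-based)."""
    return sum(i * b for i, b in enumerate(x, start=1))

def vt_decode(y, a):
    """Recover c in L_a(n) from y, assuming y differs from c in at most one position."""
    n = len(y)
    d = (syn(y) - a) % (2 * n)
    c = list(y)
    if d == 0:
        return c                      # syndrome matches: no substitution occurred
    # A 0->1 flip at position j gives d = j; a 1->0 flip gives d = 2n - j.
    j = d if d <= n else 2 * n - d
    c[j - 1] ^= 1
    return c

def check_all(n):
    """Flip every single bit of every word of length n and confirm it is corrected."""
    for c in product((0, 1), repeat=n):
        a = syn(c) % (2 * n)          # c belongs to the code L_a(n)
        if vt_decode(list(c), a) != list(c):
            return False
        for pos in range(n):
            y = list(c)
            y[pos] ^= 1
            if vt_decode(y, a) != list(c):
                return False
    return True

print(check_all(8))
```

The only ambiguous residue is $d = n$, which arises from either flip direction at position $n$; in both cases flipping position $n$ back restores the codeword.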
In SECC $S(n,\ell,[a,b])$ or $S(n,\ell,a)$, each codeword contains $m = n/\ell$ subblocks of length $\ell$. We simply append the information of the syndrome of each subblock to the end of each subblock. Note that the redundant part must also satisfy the weight constraint. To do so, we propose a simple method to ensure the redundant part is balanced. The extra redundancy for each subblock is $2\log 2\ell$, and hence the total redundancy for error correction is $2m\log 2\ell$. For simplicity, assume that $t = \log 2\ell$ is an integer. In the following, we present an efficient encoder for $S(n,\ell,[a,b])$ that can correct $m$ substitution errors. For simplicity, we first present the case where $a \le p_1\ell$, $b \ge p_2\ell$ for some constants $0 \le p_1 < 1/2 < p_2 \le 1$. This construction can be easily modified to handle other classes of SECCs (for arbitrary parameters $a, b$ or $S(n,\ell,a)$ where $a \le \ell/2$, refer to Subsection II-C).

Preparation phase. Given $n = m\ell$, $\ell_1 = \ell - 2\log 2\ell - r$, $k = (p_2 - p_1)\ell$, $r = \lceil \log(\lfloor 1/(p_2 - p_1) \rfloor + 1) \rceil$, let $S_{(k,\ell)}$ be the set of indices as defined in Definition 6. We construct a one-to-one correspondence between the indices in $S_{(k,\ell)}$ and the $r$-bit balanced sequences. We require $\ell$ to be large enough so that $\ell_1 = \ell - 2\log 2\ell - r > 0$. Note that $r = O(\log \ell)$.

Encoder S_ECC.
INPUT: $x \in \{0,1\}^{m\ell_1}$
OUTPUT: $c \triangleq \mathrm{ENC}^{\mathrm{ECC}}_{S}(x) \in S(n,\ell,[a,b])$, where $a \le p_1\ell$, $b \ge p_2\ell$ and $n = m\ell$
(I) Set $\ell_2 = \ell - 2\log 2\ell$. Use Encoder S to obtain $y = \mathrm{ENC}_S(x) \in S(m\ell_2,\ell_2,[p_1\ell_2,p_2\ell_2])$. In other words, each subblock of length $\ell_1 = \ell - 2\log 2\ell - r$ in $x$ is encoded to a subblock of length $\ell_2 = \ell - 2\log 2\ell$ in $y$
(II) For $1 \le i \le m$ Do:
• Set $z_i = B_{(i,\ell_2)}(y)$
• Compute $a = \mathrm{Syn}(z_i) \pmod{2\ell}$
• Set $p$ to be the binary representation of $a$ of length $\log 2\ell$
• Set $q$ to be the complement of $p$, i.e. $q = \overline{p}$
• Set $c_i = z_i\,p\,q$ of length $\ell$
(III) Output $c = c_1 c_2 \ldots c_m$

Theorem 9.
The Encoder S_ECC is correct. In other words, $\mathrm{ENC}^{\mathrm{ECC}}_{S}(x) \in S(n,\ell,[a,b])$ and is capable of correcting at most $m$ substitution errors for all $x$, with the assumption that the distance between any two errors is at least $\ell$.

Proof. Let $c = \mathrm{ENC}^{\mathrm{ECC}}_{S}(x)$ and write $\ell_2 = \ell - 2\log 2\ell$. We first show that $c \in S(n,\ell,[a,b])$. Since $z_i = B_{(i,\ell_2)}(y)$ where $y \in S(m\ell_2,\ell_2,[p_1\ell_2,p_2\ell_2])$, we have $\mathrm{wt}(z_i) \in [p_1\ell_2,p_2\ell_2]$. On the other hand, $pq$ is balanced since $q$ is the complement of $p$. According to Lemma 1, the $i$th subblock $c_i = z_i\,p\,q$ satisfies the weight constraint, i.e. $\mathrm{wt}(c_i) \in [p_1\ell,p_2\ell] \subseteq [a,b]$.

It remains to show that each subblock of $c$ can correct a substitution error. To do so, we provide an efficient decoding algorithm. Suppose that we receive a sequence $y = y_1 y_2 \ldots y_m$ where each subblock $y_i$ is of length $\ell$. For $1 \le i \le m$, we decode the $i$th subblock as follows. Let $z_i$ be the prefix of length $\ell_2 = \ell - 2\log 2\ell$ of $y_i$, $p$ be the following $\log 2\ell$ bits, and $q$ be the suffix of length $\log 2\ell$.
• If $q \ne \overline{p}$, then we conclude that there is an error in the suffix $pq$, and consequently there is no error in $z_i$. The decoder uses Decoder S to decode $z_i$.
• If $q = \overline{p}$, then we conclude that there is no error in the suffix $pq$, and consequently there is at most one error in $z_i$. We then use $\mathrm{DEC}^{L}_{a}(z_i)$ to correct $z_i$, where $a$ is the integer in $\mathbb{Z}_{2\ell}$ whose binary representation is $p$.

In conclusion, $\mathrm{ENC}^{\mathrm{ECC}}_{S}(x) \in S(n,\ell,[a,b])$ and is capable of correcting at most $m$ substitution errors, with the assumption that the distance between any two errors is at least $\ell$, for all $x \in \{0,1\}^{m(\ell - 2\log 2\ell - r)}$. ∎

For completeness, the corresponding decoder is described below.
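The subblock-level step of Encoder S_ECC and the two-case decoding rule used in the proof of Theorem 9 can be sketched in Python as follows. This is a minimal sketch under stated assumptions: `z` stands for a subblock that Encoder S has already produced (Encoder S itself and the weight constraint are not modeled), $2\ell$ is assumed to be a power of two so that $t = \log 2\ell$ is an integer, and the helper names `protect` and `recover` are hypothetical.

```python
from itertools import product

def syn(x):
    """VT syndrome with 1-based positions."""
    return sum(i * b for i, b in enumerate(x, start=1))

def to_bits(value, width):
    """Big-endian binary representation of value on exactly width bits."""
    return [(value >> (width - 1 - k)) & 1 for k in range(width)]

def from_bits(bits):
    value = 0
    for b in bits:
        value = 2 * value + b
    return value

def protect(z, ell):
    """Append p (the syndrome of z mod 2*ell) and its complement q to z."""
    t = (2 * ell).bit_length() - 1          # t = log2(2*ell), assuming a power of two
    assert len(z) == ell - 2 * t
    p = to_bits(syn(z) % (2 * ell), t)
    q = [1 - b for b in p]
    return list(z) + p + q                  # subblock c_i = z p q of length ell

def recover(y, ell):
    """Return z from a received subblock y with at most one substituted bit."""
    t = (2 * ell).bit_length() - 1
    z, p, q = list(y[:ell - 2 * t]), list(y[ell - 2 * t:ell - t]), list(y[ell - t:])
    if q != [1 - b for b in p]:
        return z                            # the error hit p or q, so z is intact
    d = (syn(z) - from_bits(p)) % (2 * ell)
    if d:                                   # single substitution inside z
        j = d if d <= len(z) else 2 * ell - d
        z[j - 1] ^= 1
    return z

ell = 16                                     # toy value: t = 5, so z has 6 bits
ok = True
for z in product((0, 1), repeat=ell - 10):
    c = protect(z, ell)
    ok = ok and recover(c, ell) == list(z)
    for pos in range(ell):
        y = list(c)
        y[pos] ^= 1
        ok = ok and recover(y, ell) == list(z)
print(ok)
```

Because $q$ is stored as the complement of $p$, any single substitution inside $pq$ breaks the complement relation, which is how the decoder tells the two cases apart.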
Decoder S_ECC.
INPUT: $y \in \{0,1\}^{m\ell}$
OUTPUT: $x \triangleq \mathrm{DEC}^{\mathrm{ECC}}_{S}(y) \in \{0,1\}^{m(\ell - 2\log 2\ell - r)}$
(I) For $1 \le i \le m$ Do:
• Set $y_i = B_{(i,\ell)}(y)$
• Set $z_i$ to be the prefix of length $\ell - 2\log 2\ell$ of $y_i$, $p$ to be the following $\log 2\ell$ bits and $q$ to be the suffix of length $\log 2\ell$, i.e. $y_i = z_i\,p\,q$
• If ($q = \overline{p}$) Do:
  (i) Let $a \in \mathbb{Z}_{2\ell}$ be the integer whose binary representation is $p$
  (ii) Let $c_i = \mathrm{DEC}^{L}_{a}(z_i)$ of length $\ell - 2\log 2\ell$
  (iii) Use Decoder S to obtain $x_i = \mathrm{DEC}_S(c_i)$ of length $\ell - 2\log 2\ell - r$
• If ($q \ne \overline{p}$) Do:
  (i) Let $c_i \equiv z_i$
  (ii) Use Decoder S to obtain $x_i = \mathrm{DEC}_S(c_i)$ of length $\ell - 2\log 2\ell - r$
(II) Output $x = x_1 x_2 \ldots x_m \in \{0,1\}^{m(\ell - 2\log 2\ell - r)}$

Analysis.
Since Encoder S/Decoder S has linear-time encoding/decoding complexity and the error-correction decoder $\mathrm{DEC}^{L}_{a}(z_i)$ for each subblock also has linear-time complexity, both Encoder S_ECC and Decoder S_ECC have linear-time complexity. The redundancy for error correction in each subblock is $2\log 2\ell$. Consequently, the total redundancy for codewords of length $n = m\ell$ is $m(r + 2\log 2\ell)$. Recall that $r = \Theta(1)$. Therefore, this encoding method is efficient when the number of subblocks is small compared to the length of the codeword, i.e. $m = \Theta(1)$, $\ell = \Theta(n)$ or $m = o(n)$. In such cases, the rate of Encoder S_ECC approaches the channel capacity for sufficiently large $\ell, n$:
$$\lim_{n\to\infty} \frac{m(\ell - 2\log 2\ell - r)}{m\ell} = \lim_{\ell\to\infty} \frac{\ell - 2\log 2\ell - r}{\ell} = \lim_{\ell\to\infty} \left(1 - \frac{2\log 2\ell + r}{\ell}\right) = 1.$$

B. Construction of SWCCs with Error-Correction Capability

In order to combine Encoder W/Decoder W with error-correction capability, we need to make sure that, after appending the syndrome to the end of the information data, any overlapping window of size $\ell$ between two parts does not violate the weight constraint. Specifically, suppose that $x = x_1 x_2 \ldots x_m \in W(n,\ell,[p_1\ell,p_2\ell])$, where $x_i$ is of length $\ell$, and we append the balanced suffix $y_i$ (representing the syndrome of $x_i$) to the end of $x_i$. Then any window of size $\ell$ in $x_i y_i$ and $y_i x_{i+1}$ must not be a forbidden window. The following result is crucial to the method of appending the syndrome in such a way that the weight constraint is preserved.

For constants $p_1, p_2$ where $0 \le p_1 < 1/2 < p_2 \le 1$, let $p_1' = \tfrac{1}{2}p_1 + \tfrac{1}{4}$ and $p_2' = \tfrac{1}{2}p_2 + \tfrac{1}{4}$, and let $\ell$ be sufficiently large that $\ell(1/2 - p_1) \ge 2\log 2\ell + 1$ and $\ell(p_2 - 1/2) \ge 2\log 2\ell + 1$.

Definition 8.
Given two binary sequences of the same length, $x = x_1 x_2 \ldots x_n$ and $y = y_1 y_2 \ldots y_n$, the interleaved sequence of $x$ and $y$ is defined by $x\|y \triangleq x_1 y_1 x_2 y_2 \ldots x_n y_n$.

For a binary sequence $x \in \{0,1\}^{n}$, recall that $\overline{x}$ denotes the complement of $x$. Clearly, $x\|\overline{x}$ is balanced.

Lemma 2.
Given $x \in \{0,1\}^{\ell}$ such that $\mathrm{wt}(x) \in [p_1'\ell,p_2'\ell]$, and $y \in \{0,1\}^{m}$ where $m \le \log 2\ell$. Set $z = y\|\overline{y}$. For $1 \le i \le 2m \le 2\log 2\ell$, let $u_i$ be the suffix of length $(\ell - i)$ of $x$ and $v_i$ be the prefix of length $i$ of $z$. We then have $\mathrm{wt}(u_i v_i) \in [p_1\ell,p_2\ell]$ for all such $i$.

Proof. We first show that $\mathrm{wt}(u_i v_i) \ge p_1\ell$. Since $\mathrm{wt}(x) \in [p_1'\ell,p_2'\ell]$ and $u_i$ is the suffix of length $(\ell - i)$ of $x$, we get $\mathrm{wt}(u_i) \ge p_1'\ell - i$. On the other hand, since $v_i$ is a prefix of the interleaved sequence $y\|\overline{y}$, we observe that $\mathrm{wt}(v_i) \ge (i-1)/2$. Hence,
$$\mathrm{wt}(u_i v_i) \ge (p_1'\ell - i) + \frac{i-1}{2} = p_1'\ell - \frac{i+1}{2} = \left(\tfrac{1}{2}p_1 + \tfrac{1}{4}\right)\ell - \frac{i+1}{2} = \frac{1}{2}\underbrace{\left[\left(\tfrac{1}{2} - p_1\right)\ell - (i+1)\right]}_{\ge 0} + p_1\ell \ge p_1\ell.$$
Similarly, we have $\mathrm{wt}(u_i) \le p_2'\ell$ and $\mathrm{wt}(v_i) \le (i+1)/2$, and hence
$$\mathrm{wt}(u_i v_i) \le p_2'\ell + \frac{i+1}{2} = \left(\tfrac{1}{2}p_2 + \tfrac{1}{4}\right)\ell + \frac{i+1}{2} = \frac{1}{2}\underbrace{\left[\left(\tfrac{1}{2} - p_2\right)\ell + (i+1)\right]}_{\le 0} + p_2\ell \le p_2\ell.$$
Both bracketed terms have the indicated sign because $i + 1 \le 2\log 2\ell + 1$, $\ell(1/2 - p_1) \ge 2\log 2\ell + 1$ and $\ell(p_2 - 1/2) \ge 2\log 2\ell + 1$. In conclusion, $\mathrm{wt}(u_i v_i) \in [p_1\ell,p_2\ell]$ for $1 \le i \le 2m$. ∎

Corollary 2.
Given $x \in \{0,1\}^{\ell}$ such that $\mathrm{wt}(x) \in [p_1'\ell,p_2'\ell]$, and $y \in \{0,1\}^{m}$ where $m \le \log 2\ell$. Set $z = y\|\overline{y}$. Let $v$ be any substring of length $i$ of $z$. Let $x' = x_1 x_2$ be a substring of length $\ell - i$ of $x$ and let $u = x_1 v x_2$. We then have $\mathrm{wt}(u) \in [p_1\ell,p_2\ell]$.

Proof. Similar to the proof of Lemma 2, we can show that $\mathrm{wt}(u) \ge (p_1'\ell - i) + (i-1)/2 \ge p_1\ell$ and $\mathrm{wt}(u) \le p_2'\ell + i \le p_2\ell$. Therefore, $\mathrm{wt}(u) \in [p_1\ell,p_2\ell]$. ∎

Corollary 3.
Let $x = x_1 x_2 \in W(2\ell,\ell,[p_1'\ell,p_2'\ell])$, where $x_1$ and $x_2$ are of length $\ell$. Let $a = \mathrm{Syn}(x_1) \pmod{2\ell}$ and let $p$ be the binary representation of $a$ of length $\log 2\ell$. Let $y = x_1 (p\|\overline{p}) x_2$. There is no forbidden window in $y$; in other words, $y \in W(2\ell + 2\log 2\ell,\ell,[p_1\ell,p_2\ell])$.

Proof. Consider a window of size $\ell$ of $y$. We have the following three cases.
• Case 1. The window includes the suffix of length $(\ell - i)$ of $x_1$ and a prefix of length $i$ of $p\|\overline{p}$, where $i \le 2\log 2\ell$. It is not a forbidden window, according to Lemma 2.
• Case 2. The window is of the form $uvw$ where $u$ is the suffix of length $i$ of $x_1$, $v \equiv p\|\overline{p}$ and $w$ is the prefix of length $\ell - i - 2\log 2\ell$ of $x_2$. It is not a forbidden window, according to Corollary 2.
• Case 3. The window includes the suffix of length $i$ of $p\|\overline{p}$ and a prefix of length $(\ell - i)$ of $x_2$. Similar to Case 1, it is not a forbidden window.
In conclusion, we have $y \in W(2\ell + 2\log 2\ell,\ell,[p_1\ell,p_2\ell])$. ∎

In the following, we present an efficient encoder/decoder for SWCCs with error-correction capability. For simplicity, we assume that $n = m\ell$. Recall that $p_1' = \tfrac{1}{2}p_1 + \tfrac{1}{4}$ and $p_2' = \tfrac{1}{2}p_2 + \tfrac{1}{4}$, and that $\ell(1/2 - p_1) \ge 2\log 2\ell + 1$ and $\ell(p_2 - 1/2) \ge 2\log 2\ell + 1$.

Encoder W_ECC.
INPUT: $x \in \{0,1\}^{n-1}$
OUTPUT: $c \triangleq \mathrm{ENC}^{\mathrm{ECC}}_{W}(x) \in W(n + 2m\log 2\ell,\ell,[p_1\ell,p_2\ell])$
(I) Use Encoder W to obtain $y = \mathrm{ENC}_W(x) \in W(n,\ell,[p_1'\ell,p_2'\ell])$. In other words, Encoder W is constructed based on the values of $p_1', p_2'$. Suppose that $y = y_1 y_2 \ldots y_m$ where $y_i \in \{0,1\}^{\ell} \cap W(\ell,\ell,[p_1'\ell,p_2'\ell])$ for $1 \le i \le m$.
(II) For $1 \le i \le m$ Do:
• Compute $a_i = \mathrm{Syn}(y_i) \pmod{2\ell}$
• Set $p_i$ to be the binary representation of $a_i$ of length $\log 2\ell$
• Set $q_i = p_i\|\overline{p_i}$ of length $2\log 2\ell$; $q_i$ is balanced
• Set $c_i = y_i q_i$ of length $\ell + 2\log 2\ell$
(III) Output $c = c_1 c_2 \ldots c_m$

Theorem 10.
The Encoder W_ECC is correct. In other words, $\mathrm{ENC}^{\mathrm{ECC}}_{W}(x) \in W(n + 2m\log 2\ell,\ell,[p_1\ell,p_2\ell])$, which can correct up to $m$ substitution errors with the assumption that the distance between any two errors is at least $\ell$, for all $x \in \{0,1\}^{n-1}$. The redundancy of Encoder W_ECC is $2m\log 2\ell + 1$ bits.

Proof. Let $c = \mathrm{ENC}^{\mathrm{ECC}}_{W}(x)$. We first show that $c \in W(n + 2m\log 2\ell,\ell,[p_1\ell,p_2\ell])$; in other words, there is no forbidden window in $c$. Since $y_i \in W(\ell,\ell,[p_1'\ell,p_2'\ell]) \subset W(\ell,\ell,[p_1\ell,p_2\ell])$, we only need to show that any window of size $\ell$ in $y_i q_i y_{i+1}$ is not a forbidden window for $1 \le i \le m-1$. This follows directly from Corollary 3.

It remains to show that $c$ can correct $m$ substitution errors. To do so, we provide an efficient decoding algorithm, which is similar to the case of Encoder/Decoder S_ECC discussed in the earlier section. Suppose that we receive a sequence $c' = c'_1 c'_2 \ldots c'_m$ where each subblock $c'_i$ is of length $\ell + 2\log 2\ell$. For $1 \le i \le m$, we decode the $i$th subblock as follows. Let $z_i$ be the prefix of length $\ell$ of $c'_i$ and $q_i$ be the suffix of length $2\log 2\ell$, and write $q_i = p_i\|p'_i$.
• If $p'_i \ne \overline{p_i}$, then we conclude that there is an error in $q_i$, and consequently there is no error in $z_i$. The decoder keeps $z_i$ unchanged.
• If $p'_i = \overline{p_i}$, then we conclude that there is no error in the suffix $q_i$, and consequently there is at most one error in $z_i$. We then use $\mathrm{DEC}^{L}_{a}(z_i)$ to correct $z_i$, where $a$ is the integer in $\mathbb{Z}_{2\ell}$ whose binary representation is $p_i$.
After all subblocks are processed, Decoder W is applied to the concatenation of the corrected prefixes.

In conclusion, $\mathrm{ENC}^{\mathrm{ECC}}_{W}(x) \in W(n + 2m\log 2\ell,\ell,[p_1\ell,p_2\ell])$ and is capable of correcting at most $m$ substitution errors, with the assumption that the distance between any two errors is at least $\ell$, for all $x \in \{0,1\}^{n-1}$. ∎

For completeness, the corresponding decoder is described below.
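The proof above relies on two properties of the interleaved suffix $q_i = p_i\|\overline{p_i}$: it stays nearly balanced over every prefix and suffix, and a single substitution inside it is always detected when it is split back into $p_i$ and $p'_i$. The following Python sketch checks both properties exhaustively for a small length. The function names and the toy length are assumptions made for illustration only.

```python
from itertools import product

def complement(x):
    return [1 - b for b in x]

def interleave(x, y):
    """x || y = x_1 y_1 x_2 y_2 ... x_n y_n for sequences of equal length."""
    out = []
    for a, b in zip(x, y):
        out += [a, b]
    return out

def split(q):
    """Undo the interleaving: return (p, p') with q = p || p'."""
    return q[0::2], q[1::2]

def suffix_is_clean(q):
    """Decoder test: True when p' is exactly the complement of p."""
    p, p_prime = split(q)
    return p_prime == complement(p)

def check(m):
    for y in product((0, 1), repeat=m):
        q = interleave(list(y), complement(y))
        if sum(q) != m or not suffix_is_clean(q):
            return False                  # q must be balanced and pass the test
        for i in range(1, 2 * m + 1):
            for part in (q[:i], q[-i:]):  # every prefix and every suffix of q
                if not (i // 2 <= sum(part) <= (i + 1) // 2):
                    return False
        for pos in range(2 * m):          # any single substitution is detected
            bad = list(q)
            bad[pos] ^= 1
            if suffix_is_clean(bad):
                return False
    return True

print(check(5))
```

Each aligned pair $y_j \overline{y_j}$ contributes weight exactly one, which is why a prefix or suffix of length $i$ has weight between $\lfloor i/2 \rfloor$ and $\lceil i/2 \rceil$.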
Decoder W_ECC.
INPUT: $y \in \{0,1\}^{n + 2m\log 2\ell}$
OUTPUT: $x \triangleq \mathrm{DEC}^{\mathrm{ECC}}_{W}(y) \in \{0,1\}^{n-1}$
(I) For $1 \le i \le m$ Do:
• Set $y_i = B_{(i,\ell + 2\log 2\ell)}(y)$
• Set $z_i$ to be the prefix of length $\ell$ of $y_i$ and $q_i$ to be the following $2\log 2\ell$ bits, with $q_i = p_i\|p'_i$
• If ($p'_i = \overline{p_i}$) Do:
  (i) Let $a \in \mathbb{Z}_{2\ell}$ be the integer whose binary representation is $p_i$
  (ii) Let $c_i = \mathrm{DEC}^{L}_{a}(z_i)$ of length $\ell$
• If ($p'_i \ne \overline{p_i}$) Do:
  (i) Let $c_i \equiv z_i$
(II) Let $c = c_1 c_2 \ldots c_m \in \{0,1\}^{n} \cap W(n,\ell,[p_1'\ell,p_2'\ell])$
(III) Use Decoder W to obtain $x = \mathrm{DEC}_W(c)$ of length $n-1$
(IV) Output $x$

Analysis.
Since Encoder W/Decoder W has linear-time encoding/decoding complexity and the error-correction decoder for each subblock also has linear-time complexity, both Encoder W_ECC and Decoder W_ECC have linear-time complexity. The total redundancy of Encoder W_ECC is $2m\log 2\ell + 1$, which is slightly less than the redundancy of Encoder S_ECC. This encoding method is efficient when the number of subblocks is small compared to the length of the codeword, i.e. $m = o(n)$. In such cases, the rate of Encoder W_ECC approaches the channel capacity for sufficiently large $\ell, n$:
$$\lim_{n\to\infty} \frac{n-1}{n + 2m\log 2\ell} = 1.$$

V. CONCLUSION

We have presented novel and efficient encoders that translate source binary data to codewords in SECCs, SWCCs, bounded SECCs, and bounded SWCCs. Our coding methods, based on Knuth's balancing technique and the sequence replacement technique, incur low redundancy and have linear-time complexity. For certain code parameters, our methods incur only one redundant bit. We also imposed a minimum distance constraint on the designed codewords for error-correction capability.
REFERENCES

[1] K. A. S. Immink, "Runlength-limited sequences," Proc. IEEE, vol. 78, no. 11, pp. 1745-1759, Nov. 1990.
[2] K. A. S. Immink, Codes for Mass Data Storage Systems, Second Edition, ISBN 90-74249-27-2, Shannon Foundation Publishers, Eindhoven, Netherlands, 2004.
[3] L. R. Varshney, "Transporting information and energy simultaneously," Proc. 2008 IEEE Int. Symp. Inf. Theory, Jul. 2008, pp. 1612-1616.
[4] A. Tandon, M. Motani, and L. R. Varshney, "Subblock-constrained codes for real-time simultaneous energy and information transfer," IEEE Transactions on Information Theory, vol. 62, no. 7, pp. 4212-4227, Jul. 2016.
[5] T. Y. Wu, A. Tandon, L. R. Varshney, and M. Motani, "Skip-Sliding Window Codes," Proc. 2018 IEEE Int. Symp. Inf. Theory.
[6] S. Zhao, "A serial concatenation-based coding scheme for dimmable visible light communication systems," IEEE Commun. Lett., vol. 20, no. 10, pp. 1951-1954, Oct. 2016.
[7] Y. M. Chee, Z. Cherif, J. L. Danger, S. Guilley, H. M. Kiah, J. L. Kim, P. Sole, and X. Zhang, "Multiply constant-weight codes and the reliability of loop physically unclonable functions," IEEE Transactions on Information Theory, vol. 60, no. 11, pp. 7026-7034, Nov. 2014.
[8] A. Tandon, H. M. Kiah, and M. Motani, "Binary subblock energy-constrained codes: bounds on code size and asymptotic rate," Proc. 2017 IEEE Int. Symp. Inf. Theory, 2017.
[9] H. M. Kiah, A. Tandon, and M. Motani, "Generalized Sphere-Packing Bound for Subblock-Constrained Codes," Proc. 2019 IEEE Int. Symp. Inf. Theory, Jul. 2019.
[10] A. Tandon, H. M. Kiah, and M. Motani, "Bounds on the Size and Asymptotic Rate of Subblock-Constrained Codes," IEEE Transactions on Information Theory, vol. 64, no. 10, Oct. 2018.
[11] Á. I. Barbero, E. Rosnes, G. Yang, and Ø. Ytrehus, "Constrained codes for passive RFID communication," in Proc. 2011 Inf. Theory Appl. Workshop, Feb. 2011.
[12] A. M. Fouladgar, O. Simeone, and E. Erkip, "Constrained codes for joint energy and information transfer," IEEE Trans. Commun., vol. 62, no. 6, pp. 2121-2131, Jun. 2014.
[13] K. A. S. Immink and K. Cai, "Properties and constructions of energy-harvesting sliding-window constrained codes," IEEE Communications Letters, May 2020.
[14] R. Gabrys, E. Yaakobi, and O. Milenkovic, "Codes in the Damerau Distance for Deletion and Adjacent Transposition Correction," IEEE Trans. Inform. Theory, vol. 64, no. 4, Apr. 2018.
[15] K. Cai, Y. M. Chee, R. Gabrys, H. M. Kiah, and T. T. Nguyen, "Optimal Codes Correcting a Single Indel / Edit for DNA-Based Data Storage," arXiv:1910.06501, Oct. 2019.
[16] R. Gabrys, H. M. Kiah, A. Vardy, E. Yaakobi, and Y. Zhang, "Locally Balanced Constraints," Proc. IEEE Int. Symp. Inf. Theory (ISIT 2020), pp. 664-669, Jun. 2020.
[17] D. E. Knuth, "Efficient Balanced Codes," IEEE Trans. Inform. Theory, vol. IT-32, no. 1, pp. 51-53, Jan. 1986.
[18] N. Alon, E. E. Bergmann, D. Coppersmith, and A. M. Odlyzko, "Balancing sets of vectors," IEEE Trans. Inf. Theory, vol. IT-34, no. 1, pp. 128-130, Jan. 1988.
[19] V. Skachek and K. A. S. Immink, "Constant Weight Codes: An Approach Based on Knuth's Balancing Method," IEEE Journal on Selected Areas in Communications, vol. 32, no. 5, May 2014.
[20] L. G. Tallini, R. M. Capocelli, and B. Bose, "Design of some new balanced codes," IEEE Trans. Inf. Theory, vol. IT-42, pp. 790-802, May 1996.
[21] K. A. S. Immink and K. Cai, "Properties and Constructions of Constrained Codes for DNA-Based Data Storage," IEEE Access, vol. 8, pp. 49523-49531, Mar. 2020.
[22] W. Hoeffding, "Probability inequalities for sums of bounded random variables," Journal of the American Statistical Association, vol. 58, no. 301, pp. 13-30.
[23] A. J. de Lind van Wijngaarden and K. A. S. Immink, "Construction of Maximum Run-Length Limited Codes Using Sequence Replacement Techniques," IEEE Journal on Selected Areas of Communications, vol. 28, pp. 200-207, 2010.
[24] O. Elishco, R. Gabrys, M. Medard, and E. Yaakobi, "Repeated-Free Codes," Proc. IEEE Int. Symp. Inf. Theory (ISIT), Paris, France, 2019.
[25] C. Schoeny, A. Wachter-Zeh, R. Gabrys, and E. Yaakobi, "Codes correcting a burst of deletions or insertions," IEEE Trans. Inform. Theory, vol. 63, no. 4, pp. 1971-1985, 2017.
[26] K. A. S. Immink and K. Cai, "Design of Capacity-Approaching Constrained Codes for DNA-Based Data Storage Systems," IEEE Communications Letters, vol. 22, no. 2, pp. 224-227, 2018.
[27] V. I. Levenshtein, "Binary codes capable of correcting deletions, insertions and reversals," Doklady Akademii Nauk SSSR, vol. 163, no. 4, pp. 845-848, 1965.
[28] M. Abroshan, R. Venkataramanan, and A. G. i Fabregas, "Coding for segmented edit channels," IEEE Trans. Inf. Theory, vol. 64, pp. 3086-3098, 2017.
[29] T. T. Nguyen, K. Cai, and K. A. S. Immink, "Binary Subblock Energy-Constrained Codes: Knuth's Balancing and Sequence Replacement Techniques," to appear,