Efficient Design of Subblock Energy-Constrained Codes and Sliding Window-Constrained Codes

Abstract

The subblock energy-constrained codes (SECCs) and sliding window-constrained codes (SWCCs) have recently attracted attention due to various applications in communication systems such as simultaneous energy and information transfer. In a SECC, each codeword is divided into smaller non-overlapping windows, called subblocks, and every subblock is constrained to carry sufficient energy. In a SWCC, however, the energy constraint is enforced over every window. In this work, we focus on the binary channel, where sufficient energy is achieved theoretically by using relatively high-weight codes, and study SECCs and SWCCs under more general constraints, namely bounded SECCs and bounded SWCCs. We propose two methods to construct such codes with low redundancy and linear-time complexity, based on Knuth's balancing technique and the sequence replacement technique. These methods can be further extended to construct SECCs and SWCCs. For certain code parameters, our methods incur only one redundant bit. We also impose a minimum distance constraint to give the designed codes error correction capability, which also helps to reduce error propagation during decoding.



Tuan Thanh Nguyen, Kui Cai, and Kees A. Schouhamer Immink


I. INTRODUCTION

Constrained coding has been used widely in various communication and storage systems. For example, to avoid detection errors due to inter-symbol interference and synchronization errors in magnetic and optical storage, runlength-limited (RLL) codes are employed to limit the length of any run of zeros between consecutive ones [1], [2]. Recently, subblock energy-constrained codes (SECCs) and sliding window-constrained codes (SWCCs) have been shown to be suitable candidates for providing simultaneous energy and information transfer from a powered transmitter to an energy-harvesting receiver [3]–[12]. In this scenario, the receiver uses the same received signal both for decoding information and for harvesting the energy that powers its circuitry. In 2008, Varshney [3] characterized the tradeoff between reliable communication and delivery of energy at the receiver by using a general capacity-power function, where transmitted sequences were constrained to contain sufficient energy. In this work, we focus on the binary channel, where on-off keying is employed and bit 1 (bit 0) denotes the presence (absence) of a high-energy signal. As such, sufficient energy is achieved theoretically by using relatively high-weight codes.

Recently, Tandon et al. [4] demonstrated that imposing an energy constraint only over the whole transmitted sequence might not be sufficient. It is important to avoid sequences that carry limited energy over a long duration and, consequently, to prevent energy outage at a receiver with finite energy storage capability. In order to regularize the energy content in the signal, two classes of energy-constrained codes, namely SECCs and SWCCs, were suggested [4], [5]. Formally, in a binary SECC, each codeword is divided into smaller non-overlapping windows, called subblocks, and every subblock is constrained to have a sufficient number of ones. In contrast, a binary SWCC restricts the number of ones over every window of consecutive symbols (see Figure 1).
This approach has been investigated in [11]–[13]. SWCCs have been further studied for other applications of error-correction codes in [14], [15]. In fact, the subblock energy constraint is weaker than the sliding-window constraint: even if every subblock in a codeword c carries sufficient energy, there might still be a subsequence of c that carries limited energy over a long duration (see Example 1). In contrast, the sliding-window constraint ensures that all codewords carry sufficient energy over every duration, which meets real-time delivery requirements, but it also reduces the number of valid codewords and therefore the information capacity. In this work, we provide some bounds for SWCCs and show that if the length of each duration satisfies certain constraints, there exist codes whose rates approach capacity. In such cases, we design an efficient method to construct SWCCs with only one redundant bit.

Furthermore, we study SECCs and SWCCs under more general constraints, namely bounded SECCs and bounded SWCCs. The additional constraint restricts the energy in every subblock in SECCs (or every window in SWCCs) to be below a given threshold, consequently preventing energy outage at a receiver having finite energy storage capability (see Figure 1). In this paper, we propose two methods for constructing bounded SECCs and bounded SWCCs, based on Knuth's balancing technique and the sequence replacement technique. The methods can be extended to construct SECCs and SWCCs as well. We further combine these codes efficiently with error correction codes (ECCs), which also helps to reduce error propagation of the designed codes during decoding. Before we present the main results of the paper, we go through certain notations and then highlight the major contributions of this work.
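To make the difference between the two constraints concrete, the following minimal Python sketch checks a binary word against both the bounded subblock constraint and the bounded sliding-window constraint (formalized in Definitions 1 and 2 below), using the word, window length, and bounds of Example 1 below. It assumes words are Python strings over '0'/'1'; the helper names `weight`, `in_S`, `forbidden_windows`, and `in_W` are hypothetical and introduced only for illustration.

```python
def weight(s: str) -> int:
    """Number of ones in a binary string, i.e. wt(s)."""
    return s.count("1")


def in_S(x: str, l: int, a: int, b: int) -> bool:
    """Bounded subblock constraint: each non-overlapping subblock of length l has weight in [a, b]."""
    if l <= 0 or len(x) % l != 0:
        raise ValueError("subblock constraint needs n = m * l")
    return all(a <= weight(x[i:i + l]) <= b for i in range(0, len(x), l))


def forbidden_windows(x: str, l: int, a: int, b: int) -> list[int]:
    """1-based start indices i of windows x_i ... x_{i+l-1} whose weight lies outside [a, b].

    The window weight is updated incrementally, so the scan takes O(n) time.
    """
    if not 0 < l <= len(x):
        raise ValueError("window length must satisfy 0 < l <= n")
    w = weight(x[:l])
    bad = []
    for i in range(len(x) - l + 1):
        if i > 0:
            # slide by one position: bit x[i+l-1] enters, bit x[i-1] leaves
            w += int(x[i + l - 1] == "1") - int(x[i - 1] == "1")
        if not a <= w <= b:
            bad.append(i + 1)
    return bad


def in_W(x: str, l: int, a: int, b: int) -> bool:
    """Bounded sliding-window constraint: every window of length l has weight in [a, b]."""
    return not forbidden_windows(x, l, a, b)


x = "001111110000011001"  # the word used in Example 1
print(in_S(x, 6, 2, 5))               # True: subblock weights are 4, 2, 3
print(forbidden_windows(x, 6, 2, 5))  # [3, 8, 9]
print(in_W(x, 6, 2, 5))               # False
```

Besides the windows starting at positions 3 and 9 that Example 1 highlights, the window starting at position 8 (100000, weight 1) also violates the lower bound, which illustrates how much stricter the sliding-window constraint is.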

Tuan Thanh Nguyen and Kui Cai are with the Singapore University of Technology and Design, Singapore 487372 (email: {tuanthanh nguyen, cai kui}@sutd.edu.sg). Kees A. Schouhamer Immink is with Turing Machines Inc, Willemskade 15d, 3016 DK Rotterdam, The Netherlands (email: [email protected]). This paper was presented in part at the 2020 IEEE International Symposium on Information Theory [29].

A. Notations

Given two binary sequences $x = x_1 \ldots x_m$ and $y = y_1 \ldots y_n$, the concatenation of the two sequences is defined by $xy \triangleq x_1 \ldots x_m y_1 \ldots y_n$. For a binary sequence $x$, we use $\mathrm{wt}(x)$ to denote the weight of $x$, i.e., the number of ones in $x$. We use $\bar{x}$ to denote the complement of $x$. For example, if $x = 00111$, then $\mathrm{wt}(x) = 3$ and $\bar{x} = 11000$.

Throughout this work, we denote the codeword length by $n$ and the subblock (or window) length by $\ell$, where $\ell \leq n$. In SECCs, we also require $n = m\ell$ for some positive integer $m$.

Definition 1.

For $0 \leq a \leq \ell$, we use $S(n, \ell, a)$ to denote the set of all codewords of length $n = m\ell$ in which the weight of each subblock is at least $a$, and we use $W(n, \ell, a)$ to denote the set of all codewords of length $n$ (not necessarily a multiple of $\ell$) in which the weight of every window of size $\ell$ is at least $a$.

Definition 2.

For $0 \leq a < b \leq \ell$, we use $S(n, \ell, [a, b])$ to denote the set of all codewords of length $n = m\ell$ in which the weight of each subblock is at least $a$ and at most $b$. Similarly, $W(n, \ell, [a, b])$ denotes the set of all codewords of length $n$ in which the weight of every window of size $\ell$ is at least $a$ and at most $b$.

Proposition 1.

For all $0 \leq a < b \leq \ell$ (with $n = m\ell$ for the sets $S$), we have
(i) $W(n, \ell, a) \subseteq S(n, \ell, a)$ and $W(n, \ell, [a, b]) \subseteq S(n, \ell, [a, b])$,
(ii) $W(n, \ell, a) \equiv W(n, \ell, [a, \ell])$ and $S(n, \ell, a) \equiv S(n, \ell, [a, \ell])$.

Given $0 < \ell \leq n$ and $0 \leq a < b \leq \ell$, the capacities of these constrained channels are defined by
$$c_S(\ell, a) \triangleq \lim_{n \to \infty} \frac{1}{n} \log |S(n, \ell, a)|, \qquad c_S(\ell, [a, b]) \triangleq \lim_{n \to \infty} \frac{1}{n} \log |S(n, \ell, [a, b])|,$$
$$c_W(\ell, a) \triangleq \lim_{n \to \infty} \frac{1}{n} \log |W(n, \ell, a)|, \qquad c_W(\ell, [a, b]) \triangleq \lim_{n \to \infty} \frac{1}{n} \log |W(n, \ell, [a, b])|.$$
The capacity $c_W(\ell, a)$ is studied and determined for certain values of $\ell$ and $a$ in our companion paper [13]. A special class of bounded SWCCs, namely locally balanced constraints, was introduced in [16], and the capacity $c_W(\ell, [a, b])$ was also studied when $a = \ell/2 - \epsilon$ and $b = \ell/2 + \epsilon$ for $\epsilon > 0$. In general, to achieve high information capacity, sufficient conditions on $a, b$ are $a \leq p_1\ell$ and $b \geq p_2\ell$ for some constants $0 \leq p_1 < 1/2 < p_2 \leq 1$. In this work, not only are we interested in constructing large codes, but we also desire efficient encoders that map arbitrary binary messages into these codes.

Definition 3.

For $0 \leq a \leq \ell \leq n$ and $0 \leq r \leq n$, an encoder $\mathrm{ENC}: \{0,1\}^{n-r} \to \{0,1\}^n$ is an $(n, \ell, a)$-subblock energy-constrained encoder with $r$ bits of redundancy if $\mathrm{ENC}(x) \in S(n, \ell, a)$ for all $x \in \{0,1\}^{n-r}$. The rate of the encoder is $(n - r)/n = 1 - r/n$. For $0 \leq a < \ell/2 < b \leq \ell \leq n$, the $(n, \ell, [a, b])$-bounded subblock energy-constrained encoder, the $(n, \ell, a)$-sliding window-constrained encoder, and the $(n, \ell, [a, b])$-bounded sliding window-constrained encoder are defined similarly.

For each constraint, our design objectives include low redundancy (equivalently, high information rate) and low complexity of the encoding/decoding algorithms. In Sections II and III, for certain code parameters, the rates of our encoders approach the channel capacity.

Definition 4.

For $n, \ell > 0$ with $n = m\ell$, a sequence $x = x_1 x_2 \ldots x_n \in \{0,1\}^n$ is divided into $m$ subblocks of size $\ell$, where the $i$th subblock is denoted by $B_{(i,\ell)}(x)$ and $B_{(i,\ell)}(x) = x_{(i-1)\ell+1} \ldots x_{i\ell}$ for $1 \leq i \leq m$. On the other hand, the $i$th window of size $\ell$ of $x$, denoted by $w_{(i,\ell)}(x)$, is defined by $w_{(i,\ell)}(x) = x_i \ldots x_{i+\ell-1}$ for $1 \leq i \leq n - \ell + 1$.

Example 1.

Let $n = 18$, $\ell = 6$, $m = 3$, $a = 2$, $b = 5$. Consider the sequence $x = 001111110000011001$. The subblocks of $x$ are as follows:
$$x = \underbrace{001111}_{B_{(1,6)}}\,\underbrace{110000}_{B_{(2,6)}}\,\underbrace{011001}_{B_{(3,6)}}.$$
The subblock weights are 4, 2, and 3, all within $[2, 5]$, and hence $x \in S(18, 6, [2, 5])$. However, $x \notin W(18, 6, [2, 5])$, since some windows of size six violate the weight constraint, for example $w_{(3,6)}(x) = 111111$ (weight 6) and $w_{(9,6)}(x) = 000001$ (weight 1):
$$x = 00\,\underbrace{111111}_{w_{(3,6)}}\,\underbrace{000001}_{w_{(9,6)}}\,1001.$$

B. Our Contributions

In this work, we design efficient methods for mapping arbitrary user data to codewords in SECCs, SWCCs, bounded SECCs, and bounded SWCCs. Formally, for $0 \leq a < b \leq \ell$:
(i) In Section II, we propose an efficient encoder for bounded SECCs $S(n, \ell, [a, b])$ using Knuth's balancing technique. Note that $S(n, \ell, a) \equiv S(n, \ell, [a, \ell])$, and hence the method can be applied to construct efficient encoders for SECCs $S(n, \ell, a)$ as well. In particular, we extend Knuth's balancing technique for balanced codes, i.e., $a = b = \ell/2$, to construct $S(n, \ell, [a, b])$ for the special case $a = p_1\ell$, $b = p_2\ell$ with $0 \leq p_1 < 1/2 < p_2 \leq 1$, and then generalize this technique to arbitrary $0 \leq a < b \leq \ell$.
(ii) In Section III, we first study the sizes of SWCCs $W(n, \ell, a)$ and bounded SWCCs $W(n, \ell, [a, b])$. When $a = p_1\ell$, $b = p_2\ell$ for some $0 \leq p_1 < 1/2 < p_2 \leq 1$, we show that if the window size satisfies certain constraints, the code size is at least $2^{n-1}$. We then propose efficient encoders for SWCCs $W(n, \ell, a)$ and bounded SWCCs $W(n, \ell, [a, b])$ using the sequence replacement technique. For certain values of $a, b, \ell$, our method incurs only one redundant bit.
(iii) In Section IV, we study these codes with a given error correction capability. In particular, we construct codes that can correct multiple errors under the assumption that the distance between any two errors is at least $\ell$. The intuition behind this assumption is that when the energy constraint is enforced over every window of size $\ell$, the probability of error is minimized over every window. We consider the worst-case scenario in which there is at most one error in every window of size $\ell$.

Fig. 1: SECCs, SWCCs, bounded SECCs, and bounded SWCCs.

II. SECCs AND BOUNDED SECCs

In this section, we propose simple coding schemes to construct $S(n, \ell, a)$ and $S(n, \ell, [a, b])$. We are interested in the case where the number of subblocks is constant, i.e., $m = \Theta(1)$ and $\ell = \Theta(n)$. In particular, we first modify Knuth's balancing technique to construct $S(n, \ell, [a, b])$ when there exist two constants $p_1, p_2$ with $0 \leq p_1 < 1/2 < p_2 \leq 1$ such that $a \leq p_1\ell$ and $b \geq p_2\ell$. We then extend this method to construct $S(n, \ell, [a, b])$ and $S(n, \ell, a)$ for arbitrary $a, b$.

A. Maximum Information Rate

The following result is immediate.

Proposition 2.

For $n = m\ell$ and $0 \leq a < b \leq \ell$, we have $|S(n, \ell, [a, b])| = \left(\sum_{i=a}^{b} \binom{\ell}{i}\right)^m$ and $|S(n, \ell, a)| = \left(\sum_{i=a}^{\ell} \binom{\ell}{i}\right)^m$.

In fact, we are able to show that, under certain conditions on $a, b, \ell$, the sizes of $S(n, \ell, a)$ and $S(n, \ell, [a, b])$ are at least $2^{n-1}$, and therefore the channel capacity in such cases is 1.

Theorem 1.

For all $0 \leq p_1 < 1/2 < p_2 \leq 1$ and $a \leq p_1\ell$, $b \geq p_2\ell$, let $c = \min\{1/2 - p_1, p_2 - 1/2\}$. For $n, \ell$ such that $(1/c^2)\log_e n \leq \ell \leq n$, we have $|S(n, \ell, [a, b])| \geq 2^{n-1}$.

We defer the proof of Theorem 1 to Section III, Theorem 4. In fact, Theorem 4 presents a stronger result: under the assumptions on $n, \ell, a, b$ stated in Theorem 1, we have $|W(n, \ell, [a, b])| \geq 2^{n-1}$. Since $W(n, \ell, [a, b]) \subseteq S(n, \ell, [a, b])$, Theorem 1 follows.

B. Efficient Construction of $S(n, \ell, [a, b])$

In this section, we modify Knuth's balancing technique to construct $S(n, \ell, [a, b])$. Knuth's balancing technique is a linear-time algorithm that maps a binary message $x$ of even length to a balanced binary word $y$ of the same length by flipping the first $t$ bits of $x$ [17]. The crucial observation demonstrated by Knuth is that such an index $t$ always exists; $t$ is commonly referred to as a balancing index. To represent such a balancing index, Knuth appends to $y$ a short balanced suffix $p$ of length $\log n$, and hence a lookup table of size $n$ is required. Modifications of the generic scheme are discussed in [18]–[21].

Definition 5.

For a binary sequence $x \in \{0,1\}^n$ and $0 \leq t \leq n$, let $f_t(x)$ denote the binary sequence obtained by flipping the first $t$ bits of $x$.

Example 2.

Let $x = 001111 \in \{0,1\}^6$. We have $f_1(x) = 101111$, $f_2(x) = 111111$, $f_3(x) = 110111$, $f_4(x) = 110011$, $f_5(x) = 110001$, and $f_6(x) = 110000$. Since $x = f_0(x)$ itself has weight 4, $t = 5$ is the unique balancing index of $x$. In general, the balancing index may not be unique. For example, consider $y = 001100$. Both $f_1(y) = 101100$ and $f_5(y) = 110010$ are balanced (as is $f_3(y) = 110100$), and therefore $t = 1$ and $t = 5$ are both balancing indices of $y$.

We now extend Knuth's method to construct $S(n, \ell, [a, b])$ when $a \leq p_1\ell$ and $b \geq p_2\ell$ for some constants $p_1, p_2$ with $0 \leq p_1 < 1/2 < p_2 \leq 1$.
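The prefix-flipping map $f_t$ of Definition 5 and the search for balancing indices can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' implementation; `flip_prefix` and `balancing_indices` are hypothetical names, and words are assumed to be strings over '0'/'1'. The sketch relies on the fact that $f_t(x)$ and $f_{t-1}(x)$ differ only in bit $t$, so the weight can be updated in constant time per step and all balancing indices are found in linear time.

```python
def flip_prefix(x: str, t: int) -> str:
    """f_t(x): complement the first t bits of the binary string x."""
    flipped = "".join("1" if bit == "0" else "0" for bit in x[:t])
    return flipped + x[t:]


def balancing_indices(x: str) -> list[int]:
    """All t in {0, 1, ..., n} for which f_t(x) has weight exactly n/2.

    Knuth's argument guarantees at least one such t when n is even;
    for odd n no word is balanced and the list is empty.
    """
    n = len(x)
    w = x.count("1")  # weight of f_0(x) = x
    found = [0] if 2 * w == n else []
    for t in range(1, n + 1):
        # flipping bit t turns a 0 into a 1 (+1) or a 1 into a 0 (-1)
        w += 1 if x[t - 1] == "0" else -1
        if 2 * w == n:
            found.append(t)
    return found


print(flip_prefix("001111", 5))     # 110001
print(balancing_indices("001111"))  # [5]
print(balancing_indices("001100"))  # [1, 3, 5]
```

The last call lists every balancing index of $y = 001100$, including $t = 3$, since $f_3(001100) = 110100$ is also balanced.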

Definition 6.

Let $n$ be even and set $[n] = \{0, 1, 2, \ldots, n\}$. For arbitrary $0 < k \leq n$, a walk of size $k$ in $[n]$ is the set of indices $S_{(k,n)} \triangleq \{0, n\} \cup \{ik : i \geq 1 \text{ and } ik < n\}$.

Theorem 2.

Given $\ell$ even and $0 \leq p_1 < 1/2 < p_2 \leq 1$, let $k = (p_2 - p_1)\ell$. For an arbitrary binary sequence $x \in \{0,1\}^\ell$, there exists an index $t$ in the set $S_{(k,\ell)}$ such that the weight of $f_t(x)$ is within $[p_1\ell, p_2\ell]$.

Proof. In the trivial case, when the weight of $x$ satisfies the constraint, i.e., $\mathrm{wt}(x) \in [p_1\ell, p_2\ell]$, we can select $t = 0 \in S_{(k,\ell)}$. Otherwise, assume that $\mathrm{wt}(x) \notin [p_1\ell, p_2\ell]$ and, without loss of generality, that $\mathrm{wt}(x) < p_1\ell < \ell/2$. Since $\mathrm{wt}(x) < \ell/2$, we have $\mathrm{wt}(f_\ell(x)) = \ell - \mathrm{wt}(x) > \ell/2$. Now, for $k = (p_2 - p_1)\ell$, consider the list of indices $t_1 = k$, $t_2 = 2k$, and in general $t_i = ik \in S_{(k,\ell)}$. Consecutive indices in $S_{(k,\ell)}$ differ by at most $k$, so the corresponding words $f_t(x)$ differ in at most $k$ positions and their weights differ by at most $k = (p_2 - p_1)\ell$, the width of the target interval. Since the weight starts below $p_1\ell$ at $t = 0$ and ends above $\ell/2 > p_1\ell$ at $t = \ell$, the first index in $S_{(k,\ell)}$ at which the weight reaches $p_1\ell$ yields a weight of less than $p_1\ell + k = p_2\ell$. Hence there is an index $t \in S_{(k,\ell)}$ such that $p_1\ell \leq \mathrm{wt}(f_t(x)) \leq p_2\ell$. $\blacksquare$

Example 3.

Let $x = 110000000000 \in \{0,1\}^{12}$, so $\mathrm{wt}(x) = 2$. Let $p_1 = 1/3$ and $p_2 = 2/3$, i.e., we want a codeword with weight in $[4, 8]$. We compute $k = (p_2 - p_1)\ell = 4$, and the set $S_{(k,\ell)} = \{0, 4, 8, 12\}$. We can verify that $f_4(x) = 001100000000$, $f_8(x) = 001111110000$, and $f_{12}(x) = 001111111111$. Hence, for $t = 8 \in S_{(k,\ell)}$, we get $\mathrm{wt}(f_t(x)) = 6 \in [p_1\ell, p_2\ell]$.

Lemma 1.

Given $n > 0$ and $0 \leq p_1 < 1/2 < p_2 \leq 1$, let $x \in \{0,1\}^n$ be such that $p_1 n \leq \mathrm{wt}(x) \leq p_2 n$. For any balanced binary word $y \in \{0,1\}^m$, we have $p_1(n + m) \leq \mathrm{wt}(xy) \leq p_2(n + m)$.

Proof. We have $\mathrm{wt}(xy) = \mathrm{wt}(x) + m/2$. Since $0 \leq p_1 < 1/2 < p_2 \leq 1$, it follows that $p_1 m < m/2 < p_2 m$. As $p_1 n \leq \mathrm{wt}(x) \leq p_2 n$ by assumption, we conclude that $p_1(n + m) \leq \mathrm{wt}(xy) \leq p_2(n + m)$. $\blacksquare$

For constants $0 \leq p_1 < 1/2 < p_2 \leq 1$ and $k = (p_2 - p_1)\ell$, the size of $S_{(k,\ell)}$ is at most $\lfloor 1/(p_2 - p_1) \rfloor + 1$, which is independent of $\ell$. Let $r = \lceil \log(\lfloor 1/(p_2 - p_1) \rfloor + 1) \rceil$. To encode an arbitrary binary sequence $x$ into a codeword in $S(n, \ell, [a, b])$, where $a \leq p_1\ell$ and $b \geq p_2\ell$, we divide $x$ into subblocks of length $N = \ell - r$. We then encode each subblock and concatenate the outputs. For each subblock, we simply find the smallest index $t$ in $S_{(k,\ell-r)}$ such that flipping the first $t$ bits of the subblock yields a word satisfying the weight constraint. According to Theorem 2, such an index always exists. To represent this index, we also append a short balanced suffix, and so a lookup table of size $\log |S_{(k,\ell-r)}| = r$ is required. For completeness, we describe the formal encoder/decoder of $S(n, \ell, [a, b])$ as follows.

Preparation phase.

Given $n = m\ell$ and $0 \leq p_1 < 1/2 < p_2 \leq 1$, set $k = (p_2 - p_1)\ell$ and $r = \lceil \log(\lfloor 1/(p_2 - p_1) \rfloor + 1) \rceil$. Let $S_{(k,\ell-r)}$ be the set of indices as defined in Definition 6. We construct a one-to-one correspondence between the indices in $S_{(k,\ell-r)}$ and the $r$-bit balanced sequences.

Encoder S
INPUT: $x \in \{0,1\}^{m(\ell-r)}$
OUTPUT: $y = \mathrm{ENC}_S(x) \in S(n, \ell, [a, b])$, where $a \leq p_1\ell$ and $b \geq p_2\ell$
(I) For $1 \leq i \leq m$, do:
• Set $z_i = B_{(i,\ell-r)}(x)$.
• Search for the first index $t$ in $S_{(k,\ell-r)}$ such that $\mathrm{wt}(f_t(z_i)) \in [p_1(\ell - r), p_2(\ell - r)]$.
• Let $p_i$ be the $r$-bit balanced sequence representing $t$.
• Set $y_i = f_t(z_i)\,p_i$.
(II) Finally, output $y = y_1 y_2 \ldots y_m$.

Theorem 3. Encoder S is correct. In other words, $\mathrm{ENC}_S(x) \in S(n, \ell, [a, b])$ for all $x \in \{0,1\}^{m(\ell-r)}$.

Proof. To show $y = \mathrm{ENC}_S(x) \in S(n, \ell, [a, b])$, we need to verify that the weight of every subblock of $y$ is in $[a, b]$. In Encoder S, the $i$th subblock is $y_i = f_t(z_i)\,p_i$. Since $\mathrm{wt}(f_t(z_i)) \in [p_1(\ell - r), p_2(\ell - r)]$ and $p_i$ is a balanced word of length $r$, Lemma 1 gives $\mathrm{wt}(y_i) \in [p_1\ell, p_2\ell] \subseteq [a, b]$. $\blacksquare$

Decoder S
INPUT: $y \in S(n, \ell, [a, b])$, where $a \leq p_1\ell$ and $b \geq p_2\ell$
OUTPUT: $x = \mathrm{DEC}_S(y) \in \{0,1\}^{m(\ell-r)}$
(I) For $1 \leq i \leq m$, do:
• Set $z_i = B_{(i,\ell)}(y)$.
• Let $p_i$ be the suffix of length $r$ of $z_i$, which corresponds to an index $t \in S_{(k,\ell-r)}$.
• Obtain $z'_i$ by removing $p_i$ from $z_i$.
• Set $x_i = f_t(z'_i)$.
(II) Finally, output $x = x_1 x_2 \ldots x_m$.

Alternatively, for each subblock, the index can be encoded/decoded in linear time without the lookup table for $S_{(k,\ell-r)}$.
However, the redundancy increases from $r$ to $2r$, and the set of indices is $S_{(k,\ell-2r)}$. Recall that $|S_{(k,\ell-r)}|$ and $|S_{(k,\ell-2r)}|$ are both at most $2^r$. The modified Encoder S' can be constructed as follows; we skip the details of the corresponding Decoder S'.

Encoder S'
INPUT: $x \in \{0,1\}^{m(\ell-2r)}$
OUTPUT: $y = \mathrm{ENC}_{S'}(x) \in S(n, \ell, [a, b])$, where $a \leq p_1\ell$ and $b \geq p_2\ell$
(I) For $1 \leq i \leq m$, do:
• Set $z_i = B_{(i,\ell-2r)}(x)$.
• Search for the first index $t$ in $S_{(k,\ell-2r)}$ such that $\mathrm{wt}(f_t(z_i)) \in [p_1(\ell - 2r), p_2(\ell - 2r)]$.
• Let $\Gamma = \tau_1 \tau_2 \ldots \tau_r$ be the binary representation of the rank of the index $t$ in $S_{(k,\ell-2r)}$.
• Set $p_i = \Gamma\bar{\Gamma}$ of length $2r$, where $\bar{\Gamma}$ is the complement of $\Gamma$, and set $y_i = f_t(z_i)\,p_i$.
(II) Finally, output $y = y_1 y_2 \ldots y_m$.

Analysis.

The redundancy for encoding each subblock in Encoder S (or Encoder S') is $r = \lceil \log(\lfloor 1/(p_2 - p_1) \rfloor + 1) \rceil$ (or $2r$), which is independent of $\ell$. In other words, for constants $p_1, p_2$, $r = \Theta(1)$. Consequently, the total redundancy for codewords of length $n = m\ell$ is $mr = \Theta(m)$. Therefore, this encoding method is efficient when $\ell$ is large and the number of subblocks is small compared to the codeword length, i.e., $m = \Theta(1)$ and $\ell = \Theta(n)$. In such cases, the rate of Encoder S is $(n - mr)/n = 1 - mr/n \to 1$, and similarly the rate of Encoder S' is $1 - 2mr/n \to 1$, both approaching the channel capacity. Indeed, the same argument applies when $m = o(n)$. It is easy to verify that the complexity of Encoder/Decoder S (or S') is linear in the codeword length.

C. Extension to $S(n, \ell, a)$

We can modify Encoder S to construct $S(n, \ell, a)$ or $S(n, \ell, [a, b])$ for arbitrary $0 \leq a < \ell/2 < b \leq \ell$. For $S(n, \ell, [a, b])$, we let $k = b - a$, and the set $S_{(k,\ell)}$ is of size at most $\lfloor \ell/(b - a) \rfloor + 1$. The redundancy to encode each subblock of size $\ell$ is then $\lceil \log(\lfloor \ell/(b - a) \rfloor + 1) \rceil$. The encoder is efficient when $b - a = \Theta(\ell)$, for example when $a = o(\ell)$; in such cases, $\lceil \log(\lfloor \ell/(b - a) \rfloor + 1) \rceil = \Theta(1)$.

In particular, for SECCs $S(n, \ell, a)$ with $a < \ell/2$, Encoder S incurs only one redundant bit per subblock. The simple idea is as follows. If the $i$th data subblock (of length $\ell - 1$) has weight $w < a$, the encoder flips the whole subblock (equivalently, takes its complement) and appends the bit $p = 1$; the resulting subblock of length $\ell$ has weight $(\ell - 1 - w) + 1 = \ell - w > \ell - a > a$, since $a < \ell/2$. Otherwise, the encoder appends the bit $p = 0$, and the weight remains $w \geq a$. This classic code is known as the polarity bit code [2].

Example 4.

Let $n = 21$, $\ell = 7$, $m = 3$, and $a = 3$. Suppose the source data is $x = 110000011001111100 \in \{0,1\}^{18}$. The encoder checks every subblock of length 6 and outputs $c \in \{0,1\}^{21}$, where
$$c = \underbrace{0011111}_{B_{(1,7)}}\,\underbrace{0110010}_{B_{(2,7)}}\,\underbrace{1111000}_{B_{(3,7)}}.$$
Only the first data subblock, $110000$, has weight $2 < a$, so it is complemented and followed by the bit 1; the other two data subblocks are followed by the bit 0. To decode $c$, the decoder checks every subblock of length 7; if the last bit is 1, it flips the prefix. We then obtain the source data
$$x = \underbrace{110000}_{B_{(1,6)}}\,\underbrace{011001}_{B_{(2,6)}}\,\underbrace{111100}_{B_{(3,6)}}.$$

III. SWCCs AND BOUNDED SWCCs

In this section, we propose a simple coding scheme to construct $W(n, \ell, [a, b])$ and $W(n, \ell, a)$ using the sequence replacement technique. In particular, to construct $W(n, \ell, [a, b])$ when $a \leq p_1\ell$ and $b \geq p_2\ell$ for some constants $0 \leq p_1 < 1/2 < p_2 \leq 1$, our method incurs only one redundant bit. Since $W(n, \ell, [a, b]) \subseteq S(n, \ell, [a, b])$ for $n = m\ell$, this method also provides an efficient encoder for $S(n, \ell, [a, b])$ with only one redundant bit. This yields a significant improvement in coding redundancy over Knuth's balancing technique described in Section II. Note that the efficiency of Encoder S is high when the number of subblocks is constant, i.e., $m = \Theta(1)$ and $\ell = \Theta(n)$, since its redundancy grows linearly with $m$. In this section, we show that there exists an efficient encoder even when $m$ grows with $n$, as long as the size of the subblocks (or windows) satisfies certain constraints.

A. Maximum Information Rate

The following result implies that there exist such codes of size at least $2^{n-1}$, whose rates therefore approach the channel capacity.

Theorem 4.

For $0 \leq p_1 < 1/2 < p_2 \leq 1$, let $c = \min\{1/2 - p_1, p_2 - 1/2\}$. For $a \leq p_1\ell$, $b \geq p_2\ell$, and $(1/c^2)\log_e n \leq \ell \leq n$, we have $|W(n, \ell, [a, b])| \geq 2^{n-1}$.

To prove Theorem 4, we require Hoeffding's inequality [22].

Theorem 5 (Hoeffding's Inequality). Let $Z_1, Z_2, \ldots, Z_n$ be independent bounded random variables such that $a_i \leq Z_i \leq b_i$ for all $i$. Let $S_n = \sum_{i=1}^{n} Z_i$. For any $t > 0$, we have
$$P\big(S_n - \mathbb{E}[S_n] \geq t\big) \leq e^{-2t^2 / \sum_{i=1}^{n} (b_i - a_i)^2}.$$

Proof of Theorem 4.

Let $x$ be an element selected uniformly at random from $\{0,1\}^n$. A window $w_{(i,\ell)}(x)$ of length $\ell$ is said to be a forbidden window if its weight does not satisfy the constraint, i.e., $\mathrm{wt}(w_{(i,\ell)}(x)) \notin [a, b]$. We first evaluate the probability that the first window $w_{(1,\ell)}(x)$ is forbidden. Note that $[p_1\ell, p_2\ell] \subseteq [a, b]$ and that $\mathrm{wt}(w_{(1,\ell)}(x))$ is a sum of $\ell$ independent uniform bits with mean $\ell/2$. Applying Hoeffding's inequality, we obtain
$$P\big(\mathrm{wt}(w_{(1,\ell)}) \notin [a, b]\big) \leq P\big(\mathrm{wt}(w_{(1,\ell)}) \notin [p_1\ell, p_2\ell]\big) \leq P\big(|\mathrm{wt}(w_{(1,\ell)}) - \ell/2| \geq c\ell\big) = 2P\big(\mathrm{wt}(w_{(1,\ell)}) - \ell/2 \geq c\ell\big) \leq 2e^{-2(c\ell)^2/\ell} = 2e^{-2c^2\ell}.$$
The function $f(\ell) = 2e^{-2c^2\ell}$ is decreasing in $\ell$. Since there are $n - \ell + 1 \leq n$ windows, applying the union bound, we get
$$P\big(x \notin W(n, \ell, [a, b])\big) \leq 2n e^{-2c^2\ell} \leq 2n e^{-2\log_e n} = 2/n,$$
where the last inequality uses $\ell \geq (1/c^2)\log_e n$. Therefore, $|W(n, \ell, [a, b])| \geq 2^n(1 - 2/n)$. For $n \geq 4$, we have $1 - 2/n \geq 1/2$, and therefore $|W(n, \ell, [a, b])| \geq 2^{n-1}$. Note that, since $\ell \leq n$, we also require $n$ to be large enough that $n \geq (1/c^2)\log_e n$. $\blacksquare$

Before we present the efficient encoder/decoder for $W(n, \ell, [a, b])$, we state the following corollary, which is crucial for showing the correctness of our algorithms. Setting $m = 1$ and replacing $\ell$ with $(\ell - ?)$ in Theorem 4, we obtain the following result.

Corollary 1.

For $\ell \geq ?$ and $\ell - ? \geq (1/c^2)\log_e(\ell - ?)$, we have $\left|W\big(\ell - ?, \ell - ?, [p_1(\ell - ?), p_2(\ell - ?)]\big)\right| \geq 2^{\ell - ?}$.

B. Sequence Replacement Technique

We first present an efficient encoder for $W(n, \ell, [a, b])$ when there exist constants $p_1, p_2$ with $0 \leq p_1 < 1/2 < p_2 \leq 1$ such that $a \leq p_1\ell$ and $b \geq p_2\ell$. For simplicity, we construct an efficient map that translates arbitrary messages into codewords in $W(n, \ell, [p_1\ell, p_2\ell]) \subseteq W(n, \ell, [a, b])$. A similar class of SWCCs was introduced in [14], [15]. Formally, such codes impose the weight constraint over every window of size at least $\ell$, and here we refer to them as strictly constrained SWCCs. Some lower bounds on the size of these codes are provided for specific values of $p_1$ and $p_2$ (for example, [14] considered $p_1 = 1/?$ and $p_2 = 5/?$).

Our method is based on the sequence replacement technique, which has been widely used in the literature [23]–[26]. It is an efficient method for removing forbidden windows from a source word. In general, the encoder removes the forbidden windows and subsequently inserts their representations (which also include the positions of the windows) at predefined positions in the sequence. Crucial to the replacement step is an estimate of the total number of forbidden windows.

In the rest of this section, for a binary sequence $x$, a window of size $\ell$ of $x$ is said to be an $\ell$-forbidden window if its weight does not belong to $[p_1\ell, p_2\ell]$. Let $F(\ell, [p_1\ell, p_2\ell])$ denote the set of all $\ell$-forbidden windows of size $\ell$. The following theorem provides an upper bound on the size of $F(\ell, [p_1\ell, p_2\ell])$.

Theorem 6.

For $0 \leq p_1 < 1/2 < p_2 \leq 1$, let $c = \min\{1/2 - p_1, p_2 - 1/2\}$. For $n \geq ?$ and $\ell \leq n$ such that $(1/c^2)\log_e n \leq \ell$, let $k = \ell - ? - \log n$. Then there exists a one-to-one map $\Phi: F(\ell, [p_1\ell, p_2\ell]) \to \{0,1\}^k$.

Proof. We first show that $k > 0$. Since $c \leq 1/2$, we have $\ell \geq (1/c^2)\log_e n \geq 4\log_e n > 2.77\log n$, which exceeds $? + \log n$ for sufficiently large $n$. For an arbitrary $x \in \{0,1\}^\ell$, from the proof of Theorem 4, we have $P\big(x \in F(\ell, [p_1\ell, p_2\ell])\big) \leq 2e^{-2c^2\ell} \leq 2/n^2$. Therefore, $|F(\ell, [p_1\ell, p_2\ell])| \leq (2/n^2)\,2^\ell = 2^{\ell+1}/n^2$. Thus, to represent all forbidden windows in $F(\ell, [p_1\ell, p_2\ell])$, we need binary sequences of length at most $k' = \log(2^{\ell+1}/n^2) = \ell + 1 - 2\log n \leq \ell - ? - \log n = k$ for all $n \geq ?$. Therefore, there exists a one-to-one map $\Phi: F(\ell, [p_1\ell, p_2\ell]) \to \{0,1\}^k$. $\blacksquare$

A key requirement of the sequence replacement technique is that the replacement procedure terminates. The general idea is to replace each forbidden window of length $\ell$ (if any) with a subsequence shorter than $\ell$. Consequently, since the length of the current word decreases after each replacement step, the replacement procedure is guaranteed to terminate. In our problem, in the worst case, the final replacement step occurs when the length of the current word is $\ell + 1$, since after another replacement (if needed), the length of the current word becomes at most $\ell$ and we cannot proceed further. This final step is crucial to ensure that the final output codeword satisfies the weight constraint. The following result provides an upper bound on the number of sequences of length $\ell + 1$ that include at least one forbidden window.

Let $G(\ell + 1, [p_1\ell, p_2\ell])$ denote the set of all binary sequences of length $\ell + 1$ that contain at least one forbidden window.

Theorem 7.

For $0 \le p_1 < 1/2 < p_2 \le 1$, let $c = \min\{1/2 - p_1, p_2 - 1/2\}$. For $\ell \ge 7$ and $\ell \ge (1/c^2)\log_e(\ell + 1)$, we have $|G(\ell + 1, [p_1\ell, p_2\ell])| \le 2^{\ell-3}$. In addition, there exists a one-to-one map $\Psi: G(\ell + 1, [p_1\ell, p_2\ell]) \to W(\ell - 2, \ell - 2, [p_1(\ell - 2), p_2(\ell - 2)])$.

Proof. A sequence of length $\ell + 1$ contains only two windows of size $\ell$. Similar to the proofs of Theorem 4 and Theorem 6, by the union bound, for $x$ chosen uniformly at random from $\{0,1\}^{\ell+1}$ we have $P(x \in G(\ell + 1, [p_1\ell, p_2\ell])) \le 2 \times 2 \times e^{-2c^2\ell} \le 4/(\ell + 1)^2$. Therefore, $|G(\ell + 1, [p_1\ell, p_2\ell])| \le (4/(\ell + 1)^2)2^{\ell+1} = 2^{\ell+3}/(\ell + 1)^2$. For all $\ell \ge 7$, we have $2^{\ell+3}/(\ell + 1)^2 \le 2^{\ell-3}$. According to Corollary 1, $|W(\ell - 2, \ell - 2, [p_1(\ell - 2), p_2(\ell - 2)])| \ge 2^{\ell-3}$, which implies that there exists a one-to-one map $\Psi: G(\ell + 1, [p_1\ell, p_2\ell]) \to W(\ell - 2, \ell - 2, [p_1(\ell - 2), p_2(\ell - 2)])$. ∎

C. Efficient Encoder/Decoder for $W(n, \ell, [a, b])$

We now present a linear-time algorithm to encode $W(n, \ell, [a, b])$. For simplicity, we assume that $\log n$ is an integer.

Encoding algorithm.

The algorithm contains three phases: the initial phase, the replacement phase, and the extension phase. The replacement phase in turn consists of regular replacement and special replacement.

Initial phase.

The source sequence $x \in \{0,1\}^{n-1}$ is prepended with 0 to obtain $y = 0x \in \{0,1\}^n$. The encoder scans $y$; if there is no forbidden window, it outputs $y$. Otherwise, it proceeds to the replacement phase.

Replacement phase.

The aim of this phase is that, at its end, all forbidden windows of size $\ell$ have been removed and the length of the current word is at least $\ell$. If the length of the current word $y$ is larger than $\ell + 1$, the encoder proceeds to the regular replacement. On the other hand, if the length of $y$ is exactly $\ell + 1$, the encoder proceeds to the special replacement.

• Regular replacement.

Let $w_{(i,\ell)}$ be the first forbidden window in $y$, for some $1 \le i \le n - \ell + 1 < n$. According to Theorem 6, the total number of forbidden windows of size $\ell$ is at most $2^k$, where $k = \ell - 3 - \log n$. Let $q_1$ be the binary representation of $i$ of length $\log n$, and let $q_2 = \Phi(w_{(i,\ell)})$, of length $k$. The encoder sets $q_{\mathrm{regular}} = 11q_1q_2$, removes the forbidden window $w_{(i,\ell)}$ from $y$, and then prepends $q_{\mathrm{regular}}$ to $y$. If, after this replacement, $y$ contains no forbidden window, the encoder proceeds to the extension phase. Otherwise, it repeats the replacement phase. Note that each such operation reduces the length of the sequence by one, since we remove $\ell$ bits and insert $2 + \log n + k = 2 + \log n + (\ell - 3 - \log n) = \ell - 1$ bits. Therefore, this procedure is guaranteed to terminate.

• Special replacement.

If the current word $y$ has length $\ell + 1$ and contains a forbidden window, then $y \in G(\ell + 1, [p_1\ell, p_2\ell])$; according to Theorem 7, the number of such words is small enough that the one-to-one map $\Psi$ exists. The encoder sets $q_{\mathrm{special}} = \Psi(y) \in W(\ell - 2, \ell - 2, [p_1(\ell - 2), p_2(\ell - 2)])$, i.e., $\mathrm{wt}(q_{\mathrm{special}}) \in [p_1(\ell - 2), p_2(\ell - 2)]$, and then replaces all $\ell + 1$ bits of $y$ with $10q_{\mathrm{special}}$. After this replacement, the current word has length $\ell$ and does not contain any forbidden window. This is because the prefix 10 is balanced and the suffix $q_{\mathrm{special}}$ satisfies $\mathrm{wt}(q_{\mathrm{special}}) \in [p_1(\ell - 2), p_2(\ell - 2)]$; therefore, according to Lemma 1, $\mathrm{wt}(10q_{\mathrm{special}}) \in [p_1\ell, p_2\ell]$. The encoder then proceeds to the extension phase.

Extension phase.

If the length of the current sequence $y$ is $n_1 < n$, the encoder appends a suffix of length $n_2 = n - n_1$ to obtain a sequence of length $n$. Note that, at the end of the replacement phase, the length of the current word is at least $\ell$. Let $z$ be the last window of size $\ell$ in $y$. Suppose that $z = z_1z_2\ldots z_\ell$ and $\mathrm{wt}(z) \in [p_1\ell, p_2\ell]$. A simple way to create the suffix is to append $z$ repeatedly until the length exceeds $n$. Let $j$ be the smallest integer such that $c = yz^j$ has length greater than $n$. The encoder outputs the prefix of length $n$ of $c$. We now show that this output lies in $W(n, \ell, [p_1\ell, p_2\ell])$. Since $y$ does not contain any forbidden window, it remains to consider the windows that overlap the appended copies of $z$. Because $y$ ends with $z$, every such window of size $\ell$ is a cyclic shift of $z$, and since $\mathrm{wt}(z) \in [p_1\ell, p_2\ell]$, none of them is forbidden.

We now present an efficient algorithm that decodes the source data uniquely. The decoding procedure is relatively simple and proceeds as follows.

Decoding algorithm.

The decoder scans from left to right. If the first bit is 0, the decoder simply removes it and takes the following $n - 1$ bits as the source data. If instead the word starts with 11, the decoder takes the prefix of length $\ell - 1$ and concludes that this prefix was produced by a regular replacement. In other words, the prefix has the form $q_{\mathrm{regular}} = 11q_1q_2$, where $q_1$ has length $\log n$ and $q_2$ has length $k$. The decoder removes this prefix and inserts the forbidden window $w = \Phi^{-1}(q_2)$ at position $i$, where $q_1$ is the binary representation of $i$. If the word starts with 10, the decoder takes the prefix of length $\ell$ and concludes that this prefix was produced by a special replacement. In other words, the prefix of length $\ell$ has the form $10q_{\mathrm{special}}$. The decoder replaces this prefix with the word of length $\ell + 1$, $w = \Psi^{-1}(q_{\mathrm{special}})$, and then continues decoding from $w$. The procedure terminates when the first bit is 0, and the decoder then takes the following $n - 1$ bits as the source data.

We summarize the details of the proposed encoder/decoder for $W(n, \ell, [p_1\ell, p_2\ell])$ as follows.

Preparation.

Given $0 \le p_1 < 1/2 < p_2 \le 1$ and $c = \min\{1/2 - p_1, p_2 - 1/2\}$, let $n \ge 16$ and $\ell \le n$, where $\ell$ is large enough that the hypotheses of Theorem 6 and Theorem 7 are satisfied. Let $k = \ell - 3 - \log n$. We construct two one-to-one maps, $\Phi: F(\ell, [p_1\ell, p_2\ell]) \to \{0,1\}^k$ and $\Psi: G(\ell + 1, [p_1\ell, p_2\ell]) \to W(\ell - 2, \ell - 2, [p_1(\ell - 2), p_2(\ell - 2)])$. In other words, every forbidden window of size $\ell$ in $F(\ell, [p_1\ell, p_2\ell])$ is represented by a $k$-bit sequence, and every word of length $\ell + 1$ in $G(\ell + 1, [p_1\ell, p_2\ell])$ is represented by an $(\ell - 2)$-bit sequence in $W(\ell - 2, \ell - 2, [p_1(\ell - 2), p_2(\ell - 2)])$.

Encoder W
INPUT: $x \in \{0,1\}^{n-1}$
OUTPUT: $c = \mathrm{ENC}_W(x) \in W(n, \ell, [p_1\ell, p_2\ell])$

(I) Initial Phase.

Set $y \leftarrow 0x$

(II) Replacement Phase.

While ($y$ contains a forbidden window) and (the length of $y$ is greater than $\ell + 1$) Do:
• Let $i$ be the smallest index such that $w_{(i,\ell)}$ is forbidden, $1 \le i \le n - \ell + 1$
• Let $q_1$ be the binary representation of $i$ of length $\log n$, and let $q_2 = \Phi(w_{(i,\ell)}) \in \{0,1\}^k$
• Set $q_{\mathrm{regular}} = 11q_1q_2$
• Set $y \leftarrow y$ with $w_{(i,\ell)}$ removed
• Set $y \leftarrow q_{\mathrm{regular}}y$
If (the length of $y$ is $\ell + 1$) and ($y$ contains a forbidden window) then:
• Set $q_{\mathrm{special}} = \Psi(y)$
• Set $y \leftarrow 10q_{\mathrm{special}}$

(III) Extension Phase.
• Let $z$ be the last window of size $\ell$ in $y$
• Let $j$ be the smallest integer such that $c = yz^j$ has length greater than $n$

(IV) Output the prefix of length $n$ of $c$

Decoder W
INPUT: $c \in W(n, \ell, [p_1\ell, p_2\ell])$
OUTPUT: $x = \mathrm{DEC}_W(c) \in \{0,1\}^{n-1}$

(I) While (the first bit is not 0) Do:
• If (the first two bits are 11) then:
(i) Let $11q_1q_2$ be the prefix of length $\ell - 1$ of $c$, where $q_1$ has length $\log n$ and $q_2$ has length $k$
(ii) Set $c \leftarrow c$ with this prefix removed
(iii) Let $i$ be the index whose binary representation is $q_1$
(iv) Let $w = \Phi^{-1}(q_2)$ be the corresponding forbidden window of size $\ell$ in $F(\ell, [p_1\ell, p_2\ell])$
(v) Update $c$ by inserting $w$ into $c$ at index $i$
• If (the first two bits are 10) then:
(i) Let $10q_{\mathrm{special}}$ be the prefix of length $\ell$ of $c$
(ii) Let $w = \Psi^{-1}(q_{\mathrm{special}}) \in G(\ell + 1, [p_1\ell, p_2\ell])$
(iii) Set $c \leftarrow w$
(II) If (the first bit is 0) then:
• Remove the 0
• Set $c$ to be the prefix of length $n - 1$ of the remaining word
(III) Output $c$

Complexity Analysis.
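As a concrete companion to the pseudocode above, the following Python sketch transcribes Encoder W and Decoder W for toy parameters. It is a minimal sketch under stated assumptions, not the authors' implementation: the maps $\Phi$ and $\Psi$ are realized as hypothetical lookup tables built by enumerating the forbidden windows (feasible only for small $\ell$), the index $i$ is stored 1-based in $q_1$, and the conditions $|F(\ell, [p_1\ell, p_2\ell])| \le 2^k$ and $|G(\ell+1, [p_1\ell, p_2\ell])| \le |W(\ell-2, \ell-2, [p_1(\ell-2), p_2(\ell-2)])|$ are checked by assertions rather than through the sufficient conditions of Theorems 6 and 7. Because it rescans the whole word after every replacement, the sketch is not linear-time; it only illustrates the replacement and decoding logic whose cost is analysed next.

```python
from fractions import Fraction
from itertools import combinations, product


def window_ok(s, p1, p2):
    """True if wt(s) lies in [p1*|s|, p2*|s|] (s is a '0'/'1' string)."""
    w = s.count("1")
    return p1 * len(s) <= w <= p2 * len(s)


class ToyEncoderW:
    """Toy Encoder W / Decoder W based on sequence replacement.

    Phi and Psi are hypothetical lookup tables built by enumeration,
    so this only works for small window sizes ell.
    """

    def __init__(self, n, ell, p1, p2):
        self.n, self.ell, self.p1, self.p2 = n, ell, p1, p2
        self.logn = n.bit_length() - 1              # assumes n is a power of two
        self.k = ell - 3 - self.logn                # so |11 q1 q2| = ell - 1
        bad = [w for w in range(ell + 1) if not p1 * ell <= w <= p2 * ell]
        forb = ["".join("1" if j in ones else "0" for j in range(ell))
                for w in bad for ones in map(set, combinations(range(ell), w))]
        assert len(forb) <= 2 ** self.k             # room for the map Phi
        self.phi = {f: format(r, f"0{self.k}b") for r, f in enumerate(forb)}
        self.phi_inv = {v: f for f, v in self.phi.items()}
        # G(ell+1): words of length ell+1 whose first or last window is forbidden
        G = sorted({f + b for f in forb for b in "01"} | {b + f for f in forb for b in "01"})
        good = ("".join(t) for t in product("01", repeat=ell - 2))
        self.psi = dict(zip(G, (s for s in good if window_ok(s, p1, p2))))
        assert len(self.psi) == len(G)              # room for the map Psi
        self.psi_inv = {v: g for g, v in self.psi.items()}

    def first_forbidden(self, y):
        """0-based start of the first forbidden window of y, or None."""
        for i in range(len(y) - self.ell + 1):
            if not window_ok(y[i:i + self.ell], self.p1, self.p2):
                return i
        return None

    def encode(self, x):
        assert len(x) == self.n - 1
        y = "0" + x                                           # (I) initial phase
        while len(y) > self.ell + 1 and (i := self.first_forbidden(y)) is not None:
            q1 = format(i + 1, f"0{self.logn}b")              # 1-based index
            q2 = self.phi[y[i:i + self.ell]]
            y = "11" + q1 + q2 + y[:i] + y[i + self.ell:]     # (II) regular replacement
        if len(y) == self.ell + 1 and self.first_forbidden(y) is not None:
            y = "10" + self.psi[y]                            # (II) special replacement
        z = y[-self.ell:]                                     # (III) extension phase
        while len(y) < self.n:
            y += z
        return y[:self.n]                                     # (IV) prefix of length n

    def decode(self, c):
        while c[0] != "0":
            if c[:2] == "11":                                 # undo a regular replacement
                i = int(c[2:2 + self.logn], 2) - 1
                w = self.phi_inv[c[2 + self.logn:self.ell - 1]]
                c = c[self.ell - 1:]
                c = c[:i] + w + c[i:]
            else:                                             # prefix 10: undo the special one
                c = self.psi_inv[c[2:self.ell]]
        return c[1:self.n]
```

For example, `ToyEncoderW(32, 16, Fraction(1, 8), Fraction(7, 8))` maps 31-bit inputs to 32-bit words in which every window of 16 bits has between 2 and 14 ones. These toy parameters do not meet the hypothesis $(1/c^2)\log_e n \le \ell$ of Theorem 6, which is why the constructor verifies the table sizes directly.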

For codewords of length $n$, it is easy to verify that Encoder W and Decoder W have linear-time complexity. In particular, in Encoder W, the initial phase takes $\Theta(1)$ time. The total number of replacements in the replacement phase is at most $n - \ell$, i.e., $O(n)$, since each replacement shortens the current word by one bit; hence, the running time of the replacement phase is $\Theta(n)$. The extension phase takes $\Theta(n)$ time. Therefore, the running time of Encoder W is $\Theta(n)$. Decoder W performs the reverse procedure of Encoder W, and therefore its running time is also $\Theta(n)$. Even though Encoder W offers lower redundancy than Encoder S, it suffers from more severe error propagation, i.e., during decoding, a small number of corrupted bits at the channel output may propagate and corrupt a large number of decoded bits. On the other hand, Encoder S (Decoder S) encodes (decodes) subblocks separately and concatenates the outputs, and hence has limited error propagation.

D. Extension to $W(n, \ell, a)$, $S(n, \ell, [a, b])$, and $S(n, \ell, a)$

Encoder W can be used to construct SWCCs $W(n, \ell, a)$ for $a < \ell/2$ with high efficiency. In particular, when $a \ll \ell$, i.e., there exists a constant $p_1 < 1/2$ such that $a < p_1\ell$, we can set $p_2 = 1$ and use Encoder W to construct $W(n, \ell, [a, \ell])$. Since $W(n, \ell, [a, b]) \subset S(n, \ell, [a, b])$ for $n = m\ell$, this method also provides an efficient encoder for $S(n, \ell, [a, b])$ with only one redundant bit. This yields a significant improvement in coding redundancy with respect to Knuth's balancing technique described in Section II. Recall that, for codewords of length $n = m\ell$, the redundancy of Encoder S is $\Theta(m)$. In contrast, the redundancy of Encoder W remains one bit for large values of $m$ as long as $\ell$ is sufficiently large (refer to the preparation step in Encoder W).
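The requirement that $\ell$ be sufficiently large can be made concrete with a short numeric check. The Python sketch below (helper names are illustrative) takes the hypothesis $(1/c^2)\log_e n \le \ell$ of Theorem 6 together with $k = \ell - 3 - \log n$, computes the smallest admissible $\ell$ for $p_1 = 1/4$, $p_2 = 3/4$, and compares the exact number of forbidden windows, obtained from binomial coefficients, with $2^k$. It assumes $n \ge 16$ is a power of two.

```python
import math
from fractions import Fraction


def theorem6_parameters(n, p1, p2):
    """Smallest ell with (1/c^2) ln n <= ell, and k = ell - 3 - log2(n).

    Assumes n >= 16 is a power of two (log n integer), as in Encoder W.
    """
    c = min(Fraction(1, 2) - p1, p2 - Fraction(1, 2))
    ell = math.ceil(math.log(n) / float(c * c))
    return ell, ell - 3 - (n.bit_length() - 1)


def forbidden_count(ell, p1, p2):
    """Exact |F(ell, [p1*ell, p2*ell])|: windows whose weight is outside the interval."""
    return sum(math.comb(ell, w) for w in range(ell + 1)
               if not p1 * ell <= w <= p2 * ell)


p1, p2 = Fraction(1, 4), Fraction(3, 4)
for n in (2 ** 10, 2 ** 20, 2 ** 30):
    ell, k = theorem6_parameters(n, p1, p2)
    print(f"n=2^{n.bit_length() - 1}: ell={ell}, k={k}, "
          f"|F| <= 2^k: {forbidden_count(ell, p1, p2) <= 2 ** k}")
```

With $c = 1/4$ the smallest admissible window is $\ell = \lceil 16\log_e n \rceil$ (for instance, $\ell = 111$ and $k = 98$ when $n = 2^{10}$), so $\ell$ grows only logarithmically in $n$.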
For example, one may set $\ell = \Theta(\log n)$ and $m = \Theta(n/\log n)$; Encoder W then incurs only one redundant bit. In addition, recall that $S(n, \ell, a) \equiv S(n, \ell, [a, \ell])$; therefore, for $a < \ell/2$, Encoder W can also be used to construct SECCs $S(n, \ell, a)$ by setting $p_2 = 1$. Similarly, Encoder W can be easily modified to handle the case $a = \ell/2$.

IV. ERROR-CORRECTION CODES

In this section, we combine the previous constructions of Encoder S and Encoder W with error-correction constraints. The output codewords satisfy the weight constraint and are capable of correcting multiple substitution errors. In this work, we assume that the distance between any two errors is at least $\ell$. The intuition behind this assumption is that, since the energy constraint (or weight constraint) is guaranteed over every subblock (or window) of length $\ell$, the probability of having multiple errors in a subblock or window is small. A similar model for correcting a single deletion or a single insertion per subblock has been studied by Abroshan et al. [28]. In this work, we impose a Hamming distance constraint, and the codebooks are capable of correcting substitution errors. We first introduce the Varshamov-Tenengolts (VT) codes defined by Levenshtein [27] to correct a single substitution.
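To make the single-substitution mechanism concrete before the formal statements, the Python sketch below computes the syndrome $\mathrm{Syn}(x) = \sum_i i x_i$ and corrects one substitution in a word whose syndrome was $a \bmod 2n$. The decoder is our own straightforward illustration of the idea, not code from [27]: a flip $0 \to 1$ at position $j$ shifts the syndrome by $+j$, a flip $1 \to 0$ shifts it by $-j \equiv 2n - j$, and these residues are distinct for $1 \le j \le n$ (the two coincide only at $j = n$, where flipping position $n$ is correct in either case).

```python
def vt_syndrome(x):
    """Syn(x) = sum_i i * x_i with 1-based positions (x is a list of 0/1)."""
    return sum(i * b for i, b in enumerate(x, start=1))


def vt_correct_substitution(y, a):
    """Return the word with Syn = a (mod 2n) within one substitution of y."""
    n = len(y)
    d = (vt_syndrome(y) - a) % (2 * n)
    out = list(y)
    if d != 0:
        # a 0->1 flip at position j adds j; a 1->0 flip subtracts j (i.e. adds 2n - j)
        j = d if d <= n else 2 * n - d
        out[j - 1] ^= 1
    return out
```

For example, for $x = 10110010$ we have $\mathrm{Syn}(x) = 15$, and any single flip of $x$ is undone by `vt_correct_substitution(..., 15)`.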

Deﬁnition 7.

The binary VT syndrome of a binary sequence $x \in \{0,1\}^n$ is defined to be $\mathrm{Syn}(x) = \sum_{i=1}^{n} i x_i$. For $a \in \mathbb{Z}_{2n}$, let $L_a(n) = \{x \in \{0,1\}^n : \mathrm{Syn}(x) \equiv a \pmod{2n}\}$.

Theorem 8 (Levenshtein [27]). For $a \in \mathbb{Z}_{2n}$, the code $L_a(n)$ can correct a single substitution in linear time. There exists a linear-time decoding algorithm $\mathrm{DEC}_{L_a}: \{0,1\}^n \to L_a(n)$ such that the following holds. If $c \in L_a(n)$ and $y$ is the received vector with at most one substitution, then $\mathrm{DEC}_{L_a}(y) = c$.

In fact, Levenshtein [27] showed that $L_a(n)$ can also correct a single deletion or a single insertion.

A. Construction of SECCs with Error-Correction Capability

In an SECC $S(n, \ell, [a, b])$ or $S(n, \ell, a)$, each codeword contains $m = n/\ell$ subblocks of length $\ell$. We simply append the syndrome information of each subblock to the end of that subblock. Note that the redundant part must also satisfy the weight constraint. To this end, we propose a simple method that ensures the redundant part is balanced. The extra redundancy for each subblock is $2\log 2\ell$, and hence the total extra redundancy of the encoder is $2m\log 2\ell$. For simplicity, assume that $t = \log 2\ell$ is an integer. In the following, we present an efficient encoder for $S(n, \ell, [a, b])$ that can correct $m$ substitution errors. For simplicity, we first present the case where $a \le p_1\ell$ and $b \ge p_2\ell$ for some constants $0 \le p_1 < 1/2 < p_2 \le 1$. This construction can be easily modified to handle other classes of SECCs (for arbitrary parameters $a, b$, or $S(n, \ell, a)$ where $a \le \ell/2$; refer to Subsection II-C).

Preparation phase.

Given $n = m\ell$, let $\ell_2 = \ell - 2\log 2\ell - r$, $k = (p_2 - p_1)\ell$, and $r = \lceil \log(\lfloor 1/(p_2 - p_1) \rfloor + 1) \rceil$, and let $S_{(k,\ell)}$ be the set of indices defined in Definition 6. We construct a one-to-one correspondence between the indices in $S_{(k,\ell)}$ and the balanced sequences of length $r$. We require $\ell$ to be large enough that $\ell_2 = \ell - 2\log 2\ell - r > 0$. Note that $r = O(\log \ell)$.

Encoder $S_{\mathrm{ECC}}$
INPUT: $x \in \{0,1\}^{m\ell_2}$
OUTPUT: $c \triangleq \mathrm{ENC}^{\mathrm{ECC}}_S(x) \in S(n, \ell, [a, b])$, where $a \le p_1\ell$, $b \ge p_2\ell$ and $n = m\ell$

(I) Set $\ell_1 = \ell - 2\log 2\ell$. Use Encoder S to obtain $y = \mathrm{ENC}_S(x) \in S(m\ell_1, \ell_1, [p_1\ell_1, p_2\ell_1])$. In other words, each subblock of length $\ell_2 = \ell - 2\log 2\ell - r$ in $x$ is encoded into a subblock of length $\ell_1 = \ell - 2\log 2\ell$ in $y$
(II) For $1 \le i \le m$ Do:
• Set $z_i = B_{(i,\ell_1)}(y)$
• Compute $a = \mathrm{Syn}(z_i) \pmod{2\ell}$
• Set $p$ to be the binary representation of $a$ of length $\log 2\ell$
• Set $q$ to be the complement of $p$, i.e., $q = \bar{p}$
• Set $c_i = z_ipq$, of length $\ell$
(III) Output $c = c_1c_2\ldots c_m$

Theorem 9.

The Encoder $S_{\mathrm{ECC}}$ is correct. In other words, $\mathrm{ENC}^{\mathrm{ECC}}_S(x) \in S(n, \ell, [a, b])$, and it is capable of correcting at most $m$ substitution errors for all $x$, under the assumption that the distance between any two errors is at least $\ell$.

Proof. Let $c = \mathrm{ENC}^{\mathrm{ECC}}_S(x)$. We first show that $c \in S(n, \ell, [a, b])$. Since $z_i = B_{(i,\ell_1)}(y)$ where $y \in S(m\ell_1, \ell_1, [p_1\ell_1, p_2\ell_1])$, we have $\mathrm{wt}(z_i) \in [p_1\ell_1, p_2\ell_1]$. On the other hand, $pq$ is balanced since $q$ is the complement of $p$. According to Lemma 1, the $i$th subblock $c_i = z_ipq$ satisfies the weight constraint, i.e., $\mathrm{wt}(c_i) \in [p_1\ell, p_2\ell] \subseteq [a, b]$.

It remains to show that each subblock of $c$ can correct a substitution error. To do so, we provide an efficient decoding algorithm. Suppose that we receive a sequence $y = y_1y_2\ldots y_m$, where each subblock $y_i$ has length $\ell$. Since any two errors are at distance at least $\ell$, each subblock $y_i$ contains at most one error. For $1 \le i \le m$, we decode the $i$th subblock as follows. Let $z_i$ be the prefix of length $\ell_1 = \ell - 2\log 2\ell$ of $y_i$, $p$ the following $\log 2\ell$ bits, and $q$ the suffix of length $\log 2\ell$.
• If $q \ne \bar{p}$, then we conclude that there is an error in the suffix $pq$, and consequently there is no error in $z_i$. The decoder uses Decoder S to decode $z_i$.
• If $q = \bar{p}$, then we conclude that there is no error in the suffix $pq$, and consequently there is at most one error in $z_i$. We then use $\mathrm{DEC}_{L_a}(z_i)$ to correct $z_i$, where $a$ is the integer in $\mathbb{Z}_{2\ell}$ whose binary representation is $p$.
In conclusion, $\mathrm{ENC}^{\mathrm{ECC}}_S(x) \in S(n, \ell, [a, b])$, and it is capable of correcting at most $m$ substitution errors, under the assumption that the distance between any two errors is at least $\ell$, for all $x \in \{0,1\}^{m\ell_2}$. ∎

For completeness, we describe the corresponding decoder as follows.

Decoder $S_{\mathrm{ECC}}$
INPUT: $y \in \{0,1\}^{m\ell}$
OUTPUT: $x \triangleq \mathrm{DEC}^{\mathrm{ECC}}_S(y) \in \{0,1\}^{m\ell_2}$

(I) For $1 \le i \le m$ Do:
• Set $y_i = B_{(i,\ell)}(y)$
• Set $z_i$ to be the prefix of length $\ell_1 = \ell - 2\log 2\ell$ of $y_i$, $p$ the following $\log 2\ell$ bits, and $q$ the suffix of length $\log 2\ell$, i.e., $y_i = z_ipq$
• If ($q = \bar{p}$) Do:
(i) Let $a \in \mathbb{Z}_{2\ell}$ be the integer whose binary representation is $p$
(ii) Let $c_i = \mathrm{DEC}_{L_a}(z_i)$, of length $\ell_1 = \ell - 2\log 2\ell$
(iii) Use Decoder S to obtain $x_i = \mathrm{DEC}_S(c_i)$, of length $\ell_2$
• If ($q \ne \bar{p}$) Do:
(i) Let $c_i = z_i$
(ii) Use Decoder S to obtain $x_i = \mathrm{DEC}_S(c_i)$, of length $\ell_2$
(II) Output $x = x_1x_2\ldots x_m \in \{0,1\}^{m\ell_2}$

Analysis.
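The per-subblock error-correction layer whose cost is analysed here can be sketched in a few lines of Python. The sketch assumes that the weight-constrained part $z_i$ of length $\ell_1 = \ell - 2\log 2\ell$ has already been produced by Encoder S (not reproduced), that $2\ell$ is a power of two, and it uses a straightforward VT decoder with modulus $2\ell$; the function names are illustrative.

```python
import math


def syn(z):
    """VT syndrome Syn(z) = sum_i i * z_i with 1-based positions."""
    return sum(i * b for i, b in enumerate(z, start=1))


def ecc_encode_subblock(z, ell):
    """Return c_i = z p q with p = Syn(z) mod 2*ell on t = log2(2*ell) bits and q = ~p."""
    t = (2 * ell).bit_length() - 1              # assumes 2*ell is a power of two
    assert len(z) == ell - 2 * t                # z has length ell_1 = ell - 2 log 2ell
    p = [int(b) for b in format(syn(z) % (2 * ell), f"0{t}b")]
    return list(z) + p + [1 - b for b in p]


def ecc_decode_subblock(y, ell):
    """Recover z from a subblock z p q that contains at most one substitution."""
    t = (2 * ell).bit_length() - 1
    z, p, q = list(y[:ell - 2 * t]), y[ell - 2 * t:ell - t], y[ell - t:]
    if q != [1 - b for b in p]:
        return z                                # the error hit p q, so z is intact
    d = (syn(z) - int("".join(map(str, p)), 2)) % (2 * ell)
    if d:                                       # a single substitution inside z
        j = d if d <= ell else 2 * ell - d
        z[j - 1] ^= 1
    return z


def ecc_rate(ell, r):
    """Per-subblock rate (ell - 2 log 2ell - r) / ell of Encoder S_ECC."""
    return (ell - 2 * math.log2(2 * ell) - r) / ell
```

Flipping any single bit of an encoded subblock, whether inside $z_i$, $p$ or $q$, is undone by `ecc_decode_subblock`, and `ecc_rate` evaluates the per-subblock rate $(\ell - 2\log 2\ell - r)/\ell$.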

Since Encoder S/Decoder S has linear-time encoding/decoding complexity and the error-correction decoder $\mathrm{DEC}_{L_a}(z_i)$ for each subblock also has linear-time complexity, both Encoder $S_{\mathrm{ECC}}$ and Decoder $S_{\mathrm{ECC}}$ have linear-time complexity. The redundancy for error correction in each subblock is $2\log 2\ell$. Consequently, the total redundancy for codewords of length $n = m\ell$ is $m(r + 2\log 2\ell)$. Recall that $r = \Theta(1)$. Therefore, this encoding method is efficient when the number of subblocks is small compared to the length of the codeword, i.e., $m = \Theta(1)$ and $\ell = \Theta(n)$, or more generally $m = o(n)$. In such cases, the rate of Encoder $S_{\mathrm{ECC}}$ approaches the channel capacity for sufficiently large $\ell$ and $n$:

$$\lim_{n \to \infty} \frac{m(\ell - 2\log 2\ell - r)}{m\ell} = \lim_{\ell \to \infty} \frac{\ell - 2\log 2\ell - r}{\ell} = \lim_{\ell \to \infty} \left(1 - \frac{2\log 2\ell + r}{\ell}\right) = 1.$$

B. Construction of SWCCs with Error-Correction Capability

In order to combine Encoder W/Decoder W with error-correction capability, we need to ensure that, after appending the syndrome to the end of the information data, no window of size $\ell$ that overlaps both parts violates the weight constraint. Specifically, suppose that $x = x_1x_2\ldots x_m \in W(n, \ell, [p_1\ell, p_2\ell])$, where each $x_i$ has length $\ell$, and that we append a balanced suffix $y_i$ (representing the syndrome of $x_i$) to the end of $x_i$; then no window of size $\ell$ in $x_iy_i$ or in $y_ix_{i+1}$ may be forbidden. The following result is crucial for appending the syndrome in such a way that the weight constraint is preserved.

For constants $p_1, p_2$ where $0 \le p_1 < 1/2 < p_2 \le 1$, let $p_1' = p_1/2 + 1/4$ and $p_2' = p_2/2 + 1/4$, and let $\ell$ be sufficiently large that $\ell(1/2 - p_1) \ge 2\log 2\ell + 1$ and $\ell(p_2 - 1/2) \ge 2\log 2\ell + 1$.

Definition 8.

Given two binary sequences of the same length, $x = x_1x_2\ldots x_n$ and $y = y_1y_2\ldots y_n$, the interleaved sequence of $x$ and $y$ is defined by $x \| y \triangleq x_1y_1x_2y_2\ldots x_ny_n$. For a binary sequence $x \in \{0,1\}^n$, recall that $\bar{x}$ denotes the complement of $x$. Clearly, $x \| \bar{x}$ is balanced.

Lemma 2.

Given $x \in \{0,1\}^\ell$ such that $\mathrm{wt}(x) \in [p_1'\ell, p_2'\ell]$, and $y \in \{0,1\}^m$ where $m \le \log 2\ell$, set $z = y \| \bar{y}$. For $1 \le i \le 2m$, let $u_i$ be the suffix of length $\ell - i$ of $x$ and $v_i$ be the prefix of length $i$ of $z$. Then $\mathrm{wt}(u_iv_i) \in [p_1\ell, p_2\ell]$ for $1 \le i \le 2m$.

Proof. For $1 \le i \le 2m$, we first show that $\mathrm{wt}(u_iv_i) \ge p_1\ell$. Since $\mathrm{wt}(x) \ge p_1'\ell$ and $u_i$ is obtained from $x$ by deleting its first $i$ bits, we get $\mathrm{wt}(u_i) \ge p_1'\ell - i$. On the other hand, $v_i$ consists of $\lfloor i/2 \rfloor$ complete pairs $y_j\bar{y}_j$, each of weight one, possibly followed by one more bit, so $(i-1)/2 \le \mathrm{wt}(v_i) \le (i+1)/2$. Hence,

$$\mathrm{wt}(u_iv_i) \ge (p_1'\ell - i) + \frac{i-1}{2} = p_1'\ell - \frac{i+1}{2} = \left(\frac{p_1}{2} + \frac{1}{4}\right)\ell - \frac{i+1}{2} = \frac{1}{2}\Big[\underbrace{(1/2 - p_1)\ell - (i+1)}_{\ge 0}\Big] + p_1\ell \ge p_1\ell.$$

Similarly, $\mathrm{wt}(u_i) \le \mathrm{wt}(x) \le p_2'\ell$ and $\mathrm{wt}(v_i) \le (i+1)/2$, and hence

$$\mathrm{wt}(u_iv_i) \le p_2'\ell + \frac{i+1}{2} = \left(\frac{p_2}{2} + \frac{1}{4}\right)\ell + \frac{i+1}{2} = p_2\ell - \frac{1}{2}\Big[\underbrace{(p_2 - 1/2)\ell - (i+1)}_{\ge 0}\Big] \le p_2\ell.$$

The underbraced terms are nonnegative because $i + 1 \le 2m + 1 \le 2\log 2\ell + 1$. In conclusion, $\mathrm{wt}(u_iv_i) \in [p_1\ell, p_2\ell]$ for $1 \le i \le 2m$. ∎

Corollary 2.

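Given $x \in \{0,1\}^\ell$ such that $\mathrm{wt}(x) \in [p_1'\ell, p_2'\ell]$, and $y \in \{0,1\}^m$ where $m \le \log 2\ell$, set $z = y \| \bar{y}$. Let $v$ be any substring of length $i$ of $z$. Let $x' = x'_1x'_2$ be a substring of length $\ell - i$ of $x$, and let $u = x'_1vx'_2$. Then $\mathrm{wt}(u) \in [p_1\ell, p_2\ell]$.

Proof. Since $x'$ is obtained from $x$ by deleting $i$ bits, $p_1'\ell - i \le \mathrm{wt}(x') \le p_2'\ell$. If $v = z$, then $\mathrm{wt}(v) = m = i/2$, and hence $\mathrm{wt}(u) \ge p_1'\ell - m = p_1\ell + \frac{1}{2}[(1/2 - p_1)\ell - 2m] \ge p_1\ell$ and $\mathrm{wt}(u) \le p_2'\ell + m = p_2\ell - \frac{1}{2}[(p_2 - 1/2)\ell - 2m] \le p_2\ell$. Otherwise $i \le 2m - 1$, and $v$ contains at least $\lfloor (i-1)/2 \rfloor$ complete pairs $y_j\bar{y}_j$ plus at most two further bits, so $(i-2)/2 \le \mathrm{wt}(v) \le (i+2)/2$. Similar to the proof of Lemma 2, we obtain $\mathrm{wt}(u) \ge (p_1'\ell - i) + (i-2)/2 = p_1\ell + \frac{1}{2}[(1/2 - p_1)\ell - (i+2)] \ge p_1\ell$ and $\mathrm{wt}(u) \le p_2'\ell + (i+2)/2 = p_2\ell - \frac{1}{2}[(p_2 - 1/2)\ell - (i+2)] \le p_2\ell$, since $i + 2 \le 2m + 1 \le 2\log 2\ell + 1$. Therefore, $\mathrm{wt}(u) \in [p_1\ell, p_2\ell]$. ∎

Corollary 3.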

Let $x = x_1x_2 \in W(2\ell, \ell, [p_1'\ell, p_2'\ell])$, where $x_1, x_2 \in \{0,1\}^\ell$. Let $a = \mathrm{Syn}(x_1) \pmod{2\ell}$ and let $p$ be the binary representation of $a$ of length $\log 2\ell$. Let $y = x_1(p \| \bar{p})x_2$. Then there is no forbidden window in $y$; in other words, $y \in W(2\ell + 2\log 2\ell, \ell, [p_1\ell, p_2\ell])$.

Proof. Windows lying entirely inside $x_1$ or $x_2$ have weight in $[p_1'\ell, p_2'\ell] \subseteq [p_1\ell, p_2\ell]$. Consider any other window of size $\ell$ of $y$. We have the following three cases.
• Case 1. The window consists of the suffix of length $\ell - i$ of $x_1$ and the prefix of length $i$ of $p \| \bar{p}$, where $i \le 2\log 2\ell$. It is not a forbidden window, according to Lemma 2.
• Case 2. The window is of the form $uvw$, where $u$ is the suffix of length $i$ of $x_1$, $v = p \| \bar{p}$, and $w$ is the prefix of length $\ell - i - 2\log 2\ell$ of $x_2$. It is not a forbidden window, according to Corollary 2 applied to the window of $x$ formed by $u$ and the prefix of length $\ell - i$ of $x_2$.
• Case 3. The window consists of the suffix of length $i$ of $p \| \bar{p}$ and the prefix of length $\ell - i$ of $x_2$. Similar to Case 1 (applying Lemma 2 to the reversed sequences), it is not a forbidden window.
In conclusion, we have $y \in W(2\ell + 2\log 2\ell, \ell, [p_1\ell, p_2\ell])$. ∎

In the following, we present an efficient encoder/decoder for SWCCs with error-correction capability. For simplicity, we assume that $n = m\ell$. Recall that $p_1' = p_1/2 + 1/4$ and $p_2' = p_2/2 + 1/4$, where $\ell(1/2 - p_1) \ge 2\log 2\ell + 1$ and $\ell(p_2 - 1/2) \ge 2\log 2\ell + 1$.

Encoder $W_{\mathrm{ECC}}$
INPUT: $x \in \{0,1\}^{n-1}$
OUTPUT: $c \triangleq \mathrm{ENC}^{\mathrm{ECC}}_W(x) \in W(n + 2m\log 2\ell, \ell, [p_1\ell, p_2\ell])$

(I) Use Encoder W to obtain $y = \mathrm{ENC}_W(x) \in W(n, \ell, [p_1'\ell, p_2'\ell])$. In other words, Encoder W is constructed based on the values of $p_1'$ and $p_2'$. Write $y = y_1y_2\ldots y_m$, where $y_i \in \{0,1\}^\ell \cap W(\ell, \ell, [p_1'\ell, p_2'\ell])$ for $1 \le i \le m$.
(II) For $1 \le i \le m$ Do:
• Compute $a_i = \mathrm{Syn}(y_i) \pmod{2\ell}$
• Set $p_i$ to be the binary representation of $a_i$ of length $\log 2\ell$
• Set $q_i = p_i \| \bar{p}_i$, which has length $2\log 2\ell$ and is balanced
• Set $c_i = y_iq_i$, of length $\ell + 2\log 2\ell$
(III) Output $c = c_1c_2\ldots c_m$

Theorem 10.

The Encoder $W_{\mathrm{ECC}}$ is correct. In other words, $\mathrm{ENC}^{\mathrm{ECC}}_W(x) \in W(n + 2m\log 2\ell, \ell, [p_1\ell, p_2\ell])$, and it can correct up to $m$ substitution errors for all $x \in \{0,1\}^{n-1}$, under the assumption that the distance between any two errors is at least $\ell$. The redundancy of Encoder $W_{\mathrm{ECC}}$ is $2m\log 2\ell + 1$ bits ($2m\log 2\ell$ bits for error correction plus the single redundant bit of Encoder W).

Proof. Let $c = \mathrm{ENC}^{\mathrm{ECC}}_W(x)$. We first show that $c \in W(n + 2m\log 2\ell, \ell, [p_1\ell, p_2\ell])$, in other words, that there is no forbidden window in $c$. Since $y_i \in W(\ell, \ell, [p_1'\ell, p_2'\ell]) \subset W(\ell, \ell, [p_1\ell, p_2\ell])$, we only need to show that no window of size $\ell$ in $y_iq_iy_{i+1}$ is forbidden for $1 \le i \le m - 1$. This follows directly from Corollary 3, since $y_iy_{i+1} \in W(2\ell, \ell, [p_1'\ell, p_2'\ell])$.

It remains to show that $c$ can correct $m$ substitution errors. To do so, we provide an efficient decoding algorithm, which is similar to that of Encoder/Decoder $S_{\mathrm{ECC}}$ discussed in the previous subsection. Suppose that we receive a sequence $c' = c'_1c'_2\ldots c'_m$, where each subblock $c'_i$ has length $\ell + 2\log 2\ell$. For $1 \le i \le m$, we decode the $i$th subblock as follows. Let $z_i$ be the prefix of length $\ell$ of $c'_i$ and $q_i$ the suffix of length $2\log 2\ell$, and write $q_i = p_i \| p'_i$.
• If $p'_i \ne \bar{p}_i$, then we conclude that there is an error in $q_i$, and consequently there is no error in $z_i$; the decoder keeps $z_i$ unchanged.
• If $p'_i = \bar{p}_i$, then we conclude that there is no error in the suffix $q_i$, and consequently there is at most one error in $z_i$. We then use $\mathrm{DEC}_{L_a}(z_i)$ to correct $z_i$, where $a$ is the integer in $\mathbb{Z}_{2\ell}$ whose binary representation is $p_i$.
The corrected subblocks are then concatenated and decoded by Decoder W. In conclusion, $\mathrm{ENC}^{\mathrm{ECC}}_W(x) \in W(n + 2m\log 2\ell, \ell, [p_1\ell, p_2\ell])$, and it is capable of correcting at most $m$ substitution errors, under the assumption that the distance between any two errors is at least $\ell$, for all $x \in \{0,1\}^{n-1}$. ∎

For completeness, we describe the corresponding decoder as follows.

Decoder $W_{\mathrm{ECC}}$
INPUT: $y \in \{0,1\}^{n + 2m\log 2\ell}$
OUTPUT: $x \triangleq \mathrm{DEC}^{\mathrm{ECC}}_W(y) \in \{0,1\}^{n-1}$

(I) For $1 \le i \le m$ Do:
• Set $y_i = B_{(i,\ell + 2\log 2\ell)}(y)$
• Set $z_i$ to be the prefix of length $\ell$ of $y_i$ and $q_i$ the following $2\log 2\ell$ bits, and write $q_i = p_i \| p'_i$
• If ($p'_i = \bar{p}_i$) Do:
(i) Let $a \in \mathbb{Z}_{2\ell}$ be the integer whose binary representation is $p_i$
(ii) Let $c_i = \mathrm{DEC}_{L_a}(z_i)$, of length $\ell$
• If ($p'_i \ne \bar{p}_i$) Do:
(i) Let $c_i = z_i$
(II) Let $c = c_1c_2\ldots c_m \in \{0,1\}^n \cap W(n, \ell, [p_1'\ell, p_2'\ell])$
(III) Use Decoder W to obtain $x = \mathrm{DEC}_W(c)$, of length $n - 1$
(IV) Output $x$

Analysis.
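The block layer of Encoder $W_{\mathrm{ECC}}$/Decoder $W_{\mathrm{ECC}}$ and the window property of Corollary 3 that it relies on can be illustrated with the Python sketch below. It is a sketch under stated assumptions: $2\ell$ is a power of two, the illustrative parameters $\ell = 64$, $p_1 = 1/4$, $p_2 = 3/4$ satisfy $\ell(1/2 - p_1) \ge 2\log 2\ell + 1$ and $\ell(p_2 - 1/2) \ge 2\log 2\ell + 1$, and the pairs $x_1x_2$ are drawn by rejection sampling instead of by Encoder W. The helper names are hypothetical, and a randomized check of this kind illustrates Corollary 3 but does not prove it.

```python
import random


def syn(x):
    """VT syndrome with 1-based positions."""
    return sum(i * b for i, b in enumerate(x, start=1))


def interleave(u, v):
    """u || v = u1 v1 u2 v2 ... (Definition 8)."""
    return [b for pair in zip(u, v) for b in pair]


def tail(block, ell):
    """q = p || complement(p), with p = Syn(block) mod 2*ell on log2(2*ell) bits."""
    t = (2 * ell).bit_length() - 1              # assumes 2*ell is a power of two
    p = [int(b) for b in format(syn(block) % (2 * ell), f"0{t}b")]
    return interleave(p, [1 - b for b in p])


def decode_block(received, ell):
    """Undo at most one substitution in a block z q of length ell + 2 log 2ell."""
    z, q = list(received[:ell]), received[ell:]
    p, p_prime = q[0::2], q[1::2]               # de-interleave q = p || p'
    if p_prime != [1 - b for b in p]:
        return z                                # the error was in q, z is intact
    d = (syn(z) - int("".join(map(str, p)), 2)) % (2 * ell)
    if d:
        j = d if d <= ell else 2 * ell - d
        z[j - 1] ^= 1
    return z


def windows_ok(s, ell, lo, hi):
    return all(lo <= sum(s[i:i + ell]) <= hi for i in range(len(s) - ell + 1))


def check_corollary3(ell, p1, p2, trials=100, seed=0):
    """Sample x1 x2 in W(2ell, ell, [p1' ell, p2' ell]) and test x1 (p || ~p) x2."""
    lo, hi = (p1 / 2 + 0.25) * ell, (p2 / 2 + 0.25) * ell
    rng, done = random.Random(seed), 0
    while done < trials:
        x = [rng.randint(0, 1) for _ in range(2 * ell)]
        if not windows_ok(x, ell, lo, hi):
            continue                            # rejection sampling
        y = x[:ell] + tail(x[:ell], ell) + x[ell:]
        if not windows_ok(y, ell, p1 * ell, p2 * ell):
            return False
        done += 1
    return True
```

With $\ell = 64$ each appended tail has $2\log 2\ell = 14$ bits of weight 7, so the per-block redundancy matches the $2\log 2\ell$ term in the total $2m\log 2\ell$.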

Since Encoder W/Decoder W has linear-time encoding/decoding complexity and the error-correction decoder for each subblock also has linear-time complexity, both Encoder $W_{\mathrm{ECC}}$ and Decoder $W_{\mathrm{ECC}}$ have linear-time complexity. The total redundancy of Encoder $W_{\mathrm{ECC}}$ is $2m\log 2\ell + 1$, which is slightly less than the redundancy $m(r + 2\log 2\ell)$ of Encoder $S_{\mathrm{ECC}}$. This encoding method is efficient when the number of subblocks is small compared to the length of the codeword, i.e., $m = o(n)$. In such cases, the rate of Encoder $W_{\mathrm{ECC}}$ approaches the channel capacity for sufficiently large $\ell$ and $n$:

$$\lim_{n \to \infty} \frac{n - 1}{n + 2m\log 2\ell} = 1.$$

V. CONCLUSION

We have presented novel and efficient encoders that translate source binary data into codewords of SECCs, SWCCs, bounded SECCs, and bounded SWCCs. Our coding methods, based on Knuth's balancing technique and the sequence replacement technique, incur low redundancy and have linear-time complexity. For certain code parameters, our methods incur only one redundant bit. We also imposed a minimum distance constraint on the designed codewords to provide error-correction capability.

REFERENCES

[1] K. A. S. Immink, "Runlength-limited sequences," Proc. IEEE, vol. 78, no. 11, pp. 1745-1759, Nov. 1990.
[2] K. A. S. Immink, Codes for Mass Data Storage Systems, Second Edition, ISBN 90-74249-27-2, Shannon Foundation Publishers, Eindhoven, Netherlands, 2004.
[3] L. R. Varshney, "Transporting information and energy simultaneously," Proc. 2008 IEEE Int. Symp. Inf. Theory, Jul. 2008, pp. 1612-1616.
[4] A. Tandon, M. Motani, and L. R. Varshney, "Subblock-constrained codes for real-time simultaneous energy and information transfer," IEEE Trans. Inf. Theory, vol. 62, no. 7, pp. 4212-4227, Jul. 2016.
[5] T. Y. Wu, A. Tandon, L. R. Varshney, and M. Motani, "Skip-Sliding Window Codes," Proc. 2018 IEEE Int. Symp. Inf. Theory.
[6] S. Zhao, "A serial concatenation-based coding scheme for dimmable visible light communication systems," IEEE Commun. Lett., vol. 20, no. 10, pp. 1951-1954, Oct. 2016.
[7] Y. M. Chee, Z. Cherif, J. L. Danger, S. Guilley, H. M. Kiah, J. L. Kim, P. Sole, and X. Zhang, "Multiply constant-weight codes and the reliability of loop physically unclonable functions," IEEE Trans. Inf. Theory, vol. 60, no. 11, pp. 7026-7034, Nov. 2014.
[8] A. Tandon, H. M. Kiah, and M. Motani, "Binary subblock energy-constrained codes: bounds on code size and asymptotic rate," Proc. 2017 IEEE Int. Symp. Inf. Theory, 2017.
[9] H. M. Kiah, A. Tandon, and M. Motani, "Generalized Sphere-Packing Bound for Subblock-Constrained Codes," Proc. 2019 IEEE Int. Symp. Inf. Theory, Jul. 2019.
[10] A. Tandon, H. M. Kiah, and M. Motani, "Bounds on the Size and Asymptotic Rate of Subblock-Constrained Codes," IEEE Trans. Inf. Theory, vol. 64, no. 10, Oct. 2018.
[11] Á. I. Barbero, E. Rosnes, G. Yang, and Ø. Ytrehus, "Constrained codes for passive RFID communication," in Proc. 2011 Inf. Theory Appl. Workshop, Feb. 2011.
[12] A. M. Fouladgar, O. Simeone, and E. Erkip, "Constrained codes for joint energy and information transfer," IEEE Trans. Commun., vol. 62, no. 6, pp. 2121-2131, Jun. 2014.
[13] K. A. S. Immink and K. Cai, "Properties and constructions of energy-harvesting sliding-window constrained codes," IEEE Commun. Lett., May 2020.
[14] R. Gabrys, E. Yaakobi, and O. Milenkovic, "Codes in the Damerau Distance for Deletion and Adjacent Transposition Correction," IEEE Trans. Inf. Theory, vol. 64, no. 4, Apr. 2018.
[15] K. Cai, Y. M. Chee, R. Gabrys, H. M. Kiah, and T. T. Nguyen, "Optimal Codes Correcting a Single Indel / Edit for DNA-Based Data Storage," arXiv:1910.06501, Oct. 2019.
[16] R. Gabrys, H. M. Kiah, A. Vardy, E. Yaakobi, and Y. Zhang, "Locally Balanced Constraints," Proc. IEEE Int. Symp. Inf. Theory (ISIT 2020), pp. 664-669, Jun. 2020.
[17] D. E. Knuth, "Efficient Balanced Codes," IEEE Trans. Inf. Theory, vol. IT-32, no. 1, pp. 51-53, Jan. 1986.
[18] N. Alon, E. E. Bergmann, D. Coppersmith, and A. M. Odlyzko, "Balancing sets of vectors," IEEE Trans. Inf. Theory, vol. IT-34, no. 1, pp. 128-130, Jan. 1988.
[19] V. Skachek and K. A. S. Immink, "Constant Weight Codes: An Approach Based on Knuth's Balancing Method," IEEE J. Sel. Areas Commun., vol. 32, no. 5, May 2014.
[20] L. G. Tallini, R. M. Capocelli, and B. Bose, "Design of some new balanced codes," IEEE Trans. Inf. Theory, vol. IT-42, pp. 790-802, May 1996.
[21] K. A. S. Immink and K. Cai, "Properties and Constructions of Constrained Codes for DNA-Based Data Storage," IEEE Access, vol. 8, pp. 49523-49531, Mar. 2020.
[22] W. Hoeffding, "Probability inequalities for sums of bounded random variables," J. Amer. Statist. Assoc., vol. 58, no. 301, pp. 13-30.
[23] A. J. de Lind van Wijngaarden and K. A. S. Immink, "Construction of Maximum Run-Length Limited Codes Using Sequence Replacement Techniques," IEEE J. Sel. Areas Commun., vol. 28, pp. 200-207, 2010.
[24] O. Elishco, R. Gabrys, M. Medard, and E. Yaakobi, "Repeat-Free Codes," Proc. IEEE Int. Symp. Inf. Theory (ISIT), Paris, France, 2019.
[25] C. Schoeny, A. Wachter-Zeh, R. Gabrys, and E. Yaakobi, "Codes correcting a burst of deletions or insertions," IEEE Trans. Inf. Theory, vol. 63, no. 4, pp. 1971-1985, 2017.
[26] K. A. S. Immink and K. Cai, "Design of Capacity-Approaching Constrained Codes for DNA-Based Data Storage Systems," IEEE Commun. Lett., vol. 22, no. 2, pp. 224-227, 2018.
[27] V. I. Levenshtein, "Binary codes capable of correcting deletions, insertions and reversals," Doklady Akademii Nauk SSSR, vol. 163, no. 4, pp. 845-848, 1965.
[28] M. Abroshan, R. Venkataramanan, and A. G. i Fabregas, "Coding for segmented edit channels," IEEE Trans. Inf. Theory, vol. 64, pp. 3086-3098, 2017.
[29] T. T. Nguyen, K. Cai, and K. A. S. Immink, "Binary Subblock Energy-Constrained Codes: Knuth's Balancing and Sequence Replacement Techniques," to appear,