An Algorithm for Constructing a Smallest Register with Non-Linear Update Generating a Given Binary Sequence
Nan Li, Student Member, IEEE, and Elena Dubrova, Member, IEEE
Abstract—Registers with Non-Linear Update (RNLUs) are a generalization of Non-Linear Feedback Shift Registers (NLFSRs) in which both feedback and feedforward connections are allowed and no chain connection between the stages is required. In this paper, a new algorithm for constructing RNLUs generating a given binary sequence is presented. The expected size of RNLUs constructed by the presented algorithm is proved to be O(n/log(n/p)), where n is the sequence length and p is the degree of parallelization. This is asymptotically smaller than the expected size of RNLUs constructed by previous algorithms and the expected size of LFSRs and NLFSRs generating the same sequence. The presented algorithm can potentially be useful for many applications, including testing, wireless communications, and cryptography.

Index Terms—Binary sequence, LFSR, NLFSR, binary machine, circuit-size complexity, BIST.

1 INTRODUCTION
The authors are with the Royal Institute of Technology (KTH), Stockholm, Sweden.

Binary sequences are important for many areas, including cryptography, wireless communications, and testing. In cryptography, pseudo-random binary sequences are used in stream cipher-based encryption. A stream cipher produces a keystream by combining a pseudo-random sequence with a message, usually by the bit-wise addition [1]. The security of stream ciphers is directly related to the statistical properties of pseudo-random sequences. At present, there is no secure method for generating pseudo-random sequences which satisfies the extreme limitations of technologies like RFID. Low-cost RFID tags cannot dedicate more than a few hundred gates to security functionality [2]. Even the most compact of today's encryption systems contain over 1000 gates [3]. The lack of adequate protection mechanisms gives rise to many security problems and blocks off a variety of potential applications of RFID technology.

In wireless communications, pseudo-random sequences are used for scrambling and spreading of the transmitted signal. Scrambling is performed to give a transmitted signal some useful engineering properties, e.g. to reduce the probability of interference with adjacent channels or to simplify timing recovery at the receiver [4]. Spreading increases the bandwidth of the original signal, making it possible to maintain, or even increase, communication performance when the signal power is below the noise floor [5]. For both scrambling and spreading, it is important to select pseudo-random sequences carefully, because their length, bit rate, correlation and other properties determine the capabilities of the resulting systems. Today's wireless communication systems typically use Linear Feedback Shift Register (LFSR) sequences, or sequences obtained by linearly combining pairs of LFSR sequences, such as Gold codes [6]. There are many theoretical results demonstrating the advantages of using nonlinear sequences in wireless communications. For example, complementary sequences can solve the notorious problem of power control in Orthogonal Frequency Division Multiplexing (OFDM) systems by maintaining a tightly bounded peak-to-mean power ratio [7]. Popovich [8] has shown that multi-carrier spread spectrum systems using complementary and extended Legendre sequences outperform the best corresponding multi-carrier Code Division Multiple Access (CDMA) system using Gold codes. However, due to the lack of efficient hardware methods for generating nonlinear sequences, their theoretical advantages cannot be utilized at present.
Built-In Self-Test (BIST) uses pseudo-random binary vectors, usually generated on-chip by an LFSR, as test patterns [9]. The hardware cost of an LFSR-based BIST is low. However, the test time of BIST may be long due to random-pattern resistant faults. Several methods for coping with these faults have been proposed, including modification of the circuit under test [10], insertion of control and observe points into the circuit [11], modification of the LFSR to generate a sequence with a different distribution of 0s and 1s [12], and generation of top-off test patterns for random-pattern resistant faults using some deterministic algorithm and storing them in a Read-Only Memory (ROM) [13]. The latter approach can help detecting not only random-pattern resistant faults, but also delay faults which are not handled efficiently by the pseudo-random patterns. However, the memory required to store the top-off patterns in BIST can exceed 30% of the memory used in a conventional ATPG approach [14]. Finding alternative ways of generating top-off patterns is an important open problem.

Any binary sequence can be generated using a Register with Non-Linear Update (RNLU) shown in Figure 1a. A k-stage RNLU consists of k binary stages, k updating functions, and a clock. At each clock cycle, the current values of all stages are synchronously updated to the next values computed by the updating functions. RNLUs can be viewed as a more general type of Non-Linear Feedback Shift Registers (NLFSRs) (see Figure 1b) in which both feedback and feedforward connections are allowed and no chain connection between the stages is required.

RNLUs are typically smaller and faster than NLFSRs generating the same sequence. For example, consider the 4-stage NLFSR with the updating function

f_3(x_0, x_1, x_2, x_3) = x_0 ⊕ x_1 ⊕ x_1·x_2 ⊕ x_2·x_3,

where "⊕" is the Boolean exclusive-OR, "·" is the Boolean AND, and x_i is the variable representing the value of the stage i, i ∈ {0, 1, 2, 3}.
If this NLFSR is initialized to a suitable non-zero state, it generates an output sequence with the period 15. The same sequence can be generated by the 4-stage RNLU with the updating functions

f_3(x_2, x_3) = x_2 ⊕ x_3
f_2(x_1, x_2, x_3) = x_1 ⊕ x_2·x_3
f_1(x_0) = x_0
f_0(x_1) = x_1.

We can see that the RNLU uses 3 binary operations, while the NLFSR uses 5 binary operations.

While RNLUs can potentially be smaller than NLFSRs, the search space for finding a smallest RNLU for a given sequence is considerably larger than the corresponding one for NLFSRs. Algorithms for constructing RNLUs with the minimum number of stages were presented in [15], [16]. However, since, for large k, the size of a circuit implementing a k-input Boolean function is typically much larger than the size of a single stage of a register, these algorithms usually do not minimize the total size of an RNLU.

In this paper, we present an algorithm which minimizes the size of the support set of the updating functions, i.e. the number of variables on which the updating functions depend. For most Boolean functions, the size of a circuit computing the function grows exponentially with the number of variables in its support set [17]. Therefore, by reducing the number of variables of the updating functions to the minimum, we can minimize the total size of an RNLU. To support this claim, we derive expressions for the expected size of RNLUs constructed by the presented method and previous approaches. Our analysis shows that RNLUs constructed by the presented method are asymptotically smaller. For completeness, we also compare RNLUs to linear and nonlinear feedback shift registers generating the same sequence.

The rest of this paper is organized as follows. Section 2 lists the notation and basic concepts used in the paper. Section 3 discusses the related work. Section 4 gives a general introduction to the presented approach. Section 5 describes the algorithm for constructing RNLUs.
Section 6 compares RNLUs constructed by the presented method to the RNLUs constructed using previous approaches, as well as to linear and nonlinear feedback shift registers. Section 7 presents the experimental results. Section 8 concludes the paper.

Fig. 1: General structure of RNLUs and NLFSRs: (a) an RNLU with the degree of parallelization one; (b) an NLFSR with the degree of parallelization one.
2 PRELIMINARIES
In this section, we present basic definitions and notation used in the paper.

A k-variable Boolean function is a mapping of type f: B^k → B, where B = {0, 1}. The support set of a Boolean function f(x_0, x_1, ..., x_{k-1}), sup(f), is the set of variables on which f depends:

sup(f) = { x_i | f|_{x_i=0} ≠ f|_{x_i=1} },

where f|_{x_i=j} = f(x_0, ..., x_{i-1}, j, x_{i+1}, ..., x_{k-1}), for j ∈ {0, 1}.

A k-variable Boolean function f can be computed by a logic circuit with k inputs and one output, such that, for every input combination a ∈ B^k, the circuit output is f(a). The size of a circuit is the number of gates required to implement it. Typically gates are restricted to a certain set, e.g. {AND, OR, NOT} [18].

A k-stage Register with Non-Linear Update (RNLU) (also called a binary machine [15], [19]) consists of k binary storage elements, called stages, each capable of storing one bit of information. Every stage i ∈ {0, 1, ..., k−1} has an associated state variable x_i ∈ {0, 1} which represents the current value of the stage i, and a Boolean updating function f_i: {0, 1}^k → {0, 1} which determines how the value of x_i is updated to its next value, x_i^+:

x_i^+ = f_i(x_0, x_1, ..., x_{k-1}).

A state of an RNLU is a vector of the values of its state variables. At every clock cycle, the next state of an RNLU is computed from its current state by updating the values of all stages simultaneously to the values of the corresponding updating functions.

The degree of parallelization p of a k-stage RNLU is the number of stages used for producing the output at each clock cycle, 1 ≤ p ≤ k. Throughout the paper, we assume that the p rightmost stages of an RNLU are used for producing its output.
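To make these definitions concrete, the following minimal sketch simulates an RNLU in Python. The particular 3-stage updating functions are illustrative examples chosen here, not taken from the paper; they show that each f_i may read any subset of the state (feedback as well as feedforward), with no chain connection between stages required.

```python
# Minimal RNLU simulator: stage i has an updating function f_i that maps
# the full current state (x_0, ..., x_{k-1}) to the stage's next value.
def run_rnlu(updating_functions, state, cycles):
    """Simulate a k-stage RNLU; all stages update synchronously."""
    states = [tuple(state)]
    for _ in range(cycles):
        state = tuple(f(state) for f in updating_functions)
        states.append(state)
    return states

# Hypothetical 3-stage example: stage 0 receives x_1 XOR x_2, stage 1
# receives x_0 AND x_2, and stage 2 receives the complement of x_0.
fs = [
    lambda x: x[1] ^ x[2],
    lambda x: x[0] & x[2],
    lambda x: 1 - x[0],
]
trace = run_rnlu(fs, (1, 0, 1), 4)
# trace[1] == (1, 1, 0): all three stages updated in one clock cycle.
```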
A k-stage Feedback Shift Register (FSR) can be viewed as a special case of a k-stage RNLU satisfying:

x_0^+ = x_1
x_1^+ = x_2
...
x_{k-2}^+ = x_{k-1}
x_{k-1}^+ = f(x_0, x_1, ..., x_{k-1}).

The updating function of the stage k−1 is called the feedback function of the FSR. If all feedback functions of an FSR are linear, then the FSR is called a Linear Feedback Shift Register (LFSR). Otherwise, it is called a
Non-Linear Feedback Shift Register (NLFSR). It is known that the recurrence relation generated by the feedback function of a k-stage LFSR has a characteristic polynomial of degree k [19]. If this polynomial is primitive¹, then the LFSR follows a periodic sequence of 2^k − 1 k-bit vectors [19]. This result is very important, because it makes possible the generation of pseudo-random sequences of length 2^k − 1 by a device of size O(k). No analogous result has been found for the nonlinear case yet.

3 PREVIOUS WORK
There are many different ways of generating binary sequences. A thorough treatment of this topic is given by Knuth in [21]. In this section, we focus on FSR-based binary sequence generators and their generalizations.

LFSRs are one of the most popular devices for generating pseudo-random binary sequences. They have numerous applications, including error detection and correction [22], data compression [23], testing [24], and cryptography [25].

The Berlekamp-Massey algorithm can be used to construct a smallest LFSR generating a given binary sequence. It was originally invented by Berlekamp for decoding Bose-Chaudhuri-Hocquenghem (BCH) codes [26]. Massey [27] linked Berlekamp's algorithm to LFSR synthesis and simplified it. There were many subsequent extensions and improvements of the algorithm: for example, Mandelbaum [28] developed its arithmetic analog, Imamura and Yoshida [29] presented an alternate and easier derivation, Fitzpatrick [30] found a version which is more symmetrical in its treatment of the iterated pairs of polynomials, and Fleischmann [31] modified it to extend the model sequence in both directions around any given data bit. It has also been shown that results similar to those of the Berlekamp-Massey algorithm can be obtained with the Euclidean algorithm [32] and continued fractions [33].

The Berlekamp-Massey algorithm constructs traditional LFSRs, which generate one output bit per clock cycle. A number of techniques have been developed for constructing LFSRs with the degree of parallelization p. Two main approaches are: (1) synthesis of subsequences representing a p-decimation of some phase shift of the original LFSR sequence [34] and (2) computation of the set of states reachable from any state
in p steps. The latter is usually done by computing the p-th power of the connection matrix of the LFSR [25]. LFSRs with a high degree of parallelization are used in applications where a high data rate is important, such as the Cyclic Redundancy Check (CRC) widely used in data transmission and storage for detecting burst errors [22].

NLFSRs have been much less studied compared to LFSRs [35]. The first algorithm for constructing a smallest NLFSR generating a given binary sequence was presented by Jansen in 1991 [36], [37]. Alternative algorithms were given by Linardatos et al. [38], Rizomiliotis et al. [39], and Limniotis et al. [40].

Similarly to the LFSR case, an NLFSR can be re-designed to generate p bits of the sequence per clock cycle. This is usually done by duplicating the updating functions of an NLFSR p times, as in [41]-[43]. Such a technique requires that the p left-most stages of the NLFSR are not used as inputs to feedback functions or output functions. More generally, the problem of constructing an NLFSR with the degree of parallelization p can be solved by computing the p-th power of the transition relation induced by its feedback functions. However, the size of circuits computing the p-th power of the transition relation may grow substantially larger than a factor of p [44].

An FSR may need up to n stages to generate a binary sequence of length n. For example, the smallest LFSR and NLFSR generating the binary sequence 00...01, consisting of n−1 0s followed by a single 1, have n and n−1 stages, respectively. For a random binary sequence of length n, on average an LFSR needs n/2 stages [45] and an NLFSR needs 2 log n stages to generate such a sequence [36]. Note that these bounds reflect the size of stages only; they do not take into account the size of circuits computing feedback functions.

¹ An irreducible polynomial of degree k is called primitive if the smallest m for which it divides x^m + 1 is m = 2^k − 1.
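Returning to the Berlekamp-Massey algorithm discussed above, a compact sketch of its GF(2) version is shown below; it returns only the LFSR length (the linear complexity), omitting the feedback polynomial for brevity.

```python
def berlekamp_massey(s):
    """Length L of the shortest LFSR generating binary sequence s
    (its linear complexity), computed over GF(2)."""
    n = len(s)
    c, b = [0] * n, [0] * n   # current and previous connection polynomials
    c[0] = b[0] = 1
    L, m = 0, -1              # current LFSR length, last update position
    for i in range(n):
        # Discrepancy: does the current LFSR predict bit i correctly?
        d = s[i]
        for j in range(1, L + 1):
            d ^= c[j] & s[i - j]
        if d:                 # prediction failed: correct the LFSR
            t = c[:]
            shift = i - m
            for j in range(n - shift):
                c[shift + j] ^= b[j]
            if 2 * L <= i:
                L, m, b = i + 1 - L, i, t
    return L

# One period of the m-sequence of the primitive polynomial x^3 + x + 1:
# its linear complexity equals the polynomial degree, 3.
print(berlekamp_massey([1, 1, 1, 0, 0, 1, 0]))   # -> 3
# n-1 zeros followed by a single 1 need an n-stage LFSR, as noted above:
print(berlekamp_massey([0, 0, 0, 1]))            # -> 4
```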
Since the nonlinear feedback function of an NLFSR is typically larger than the linear feedback function of an LFSR, a k-stage NLFSR may be considerably larger than a k-stage LFSR.

The first algorithm for constructing an RNLU with the minimum number of stages for a given binary sequence was presented in [15]. This algorithm exploits the unique property of RNLUs that any binary n-tuple can be the next state of a given current state. The algorithm assigns every 0 of a sequence a unique even integer and every 1 of a sequence a unique odd integer. Integers are assigned in an increasing order starting from 0. For example, if an 8-bit sequence A = (0, 0, 1, 0, 1, 1, 0, 1) is given, the sequence of integers (0, 2, 1, 4, 3, 5, 6, 7) can be used. This sequence of integers is interpreted as a sequence of states of an RNLU. The largest integer in the sequence of states determines the number of stages. In the example above, ⌈log 8⌉ =
3, thus the resulting RNLU has 3 stages.

In [16], the algorithm of [15] was extended to RNLUs generating p bits of the output sequence per clock cycle. The main idea is to encode a binary sequence into a 2^p-ary sequence which can be generated by a smaller RNLU. As an example, suppose that we use the 4-ary encoding (00) = 0, (01) = 1, (10) = 2, (11) = 3 to encode the sequence A from the example above into the quaternary sequence (0, 2, 3, 1). Then, we can construct an RNLU generating the sequence A with ⌈log 4⌉ = 2 stages, so the resulting RNLU has one stage less than the RNLU generating one bit per clock cycle in the previous example.

RNLUs have been successfully applied to the storage of cryptographic keys [46] and deterministic test patterns [47]. For example, it was shown in [46] that an RNLU may take less than a quarter of the size of a read-only memory storing the same sequence.

Fig. 2: Structure of RNLUs constructed by the presented algorithm: extra bits feed the updating functions, which produce the sequence bits at the output.
4 INTUITIVE IDEA
We can separate each state of a k-stage RNLU with the degree of parallelization p into two parts: p output bits which contain the output sequence, and k − p extra bits which are used for differentiating the states whose output bits are the same. Output bits are defined by the sequence to be generated. For the extra bits, we can use any (k − p)-bit vector that is not used in another state with the same output bits.

As we mentioned previously, the overall size of an RNLU is typically dominated by the size of the circuits computing its updating functions. The size of these circuits greatly depends on the support sets of the updating functions. In order to minimize the support sets, we use extra bit vectors which are unique for every specified state. In other words, not only the states with the same output bits, but all specified states are assigned a unique (k − p)-bit extra bit vector. Such a state encoding allows us to reduce the support sets of the updating functions to the variables representing extra bits only, as shown in Figure 2.

Suppose we would like to construct an RNLU generating a binary sequence A of length m × p with the degree of parallelization p. In order to distinguish between identical p-bit vectors in A, we need at least ⌈log m⌉ extra bits. Therefore, the number of stages in the resulting RNLU is given by:

k = ⌈log m⌉ + p.

This number is typically greater than the minimum possible number of stages in an RNLU which can generate A. The minimum number of stages is determined by partitioning A into p-bit vectors, computing the decimal representation of each p-bit vector, and counting the largest number of occurrences among all p-bit vectors with the same decimal representation, N_max. For example, if in a 10-bit sequence A the 2-bit vector (0,1) occurs 3 times and no 2-bit vector occurs more often, then N_max =
3. The minimum number of stages in an RNLU generating A is given by [16]:

k_min = ⌈log N_max⌉ + p.   (2)

The presented method reduces the support sets of the updating functions to the minimum. Updating functions of output bits cannot depend on fewer than ⌈log m⌉ variables, since otherwise the RNLU would not be able to generate all ⌈n/p⌉ p-bit vectors constituting a partitioning of A.

Note that the size of an RNLU can be further reduced by removing the stages representing output bits and taking the output directly from the updating functions.

5 ALGORITHM
In this section, we present an algorithm for constructing RNLUs which minimizes the support sets of the updating functions to ⌈log m⌉ variables representing extra bits.

The pseudocode of the algorithm ConstructRNLU(A, p) is shown as Algorithm 1. The input is a binary sequence A = (a_0, a_1, ..., a_{n−1}) and the desired degree of parallelization p. The output is the defining tables of the p + r updating functions of the RNLU generating A with the degree of parallelization p, where r = ⌈log m⌉ and m = ⌈n/p⌉.

The algorithm begins by selecting an r-stage extra bits generator G using the procedure ChooseGenerator(m, r). As we mentioned in the previous section, the size of an RNLU depends on the order of extra bit vectors used for state encoding. In principle, any permutation of r-bit vectors can be used; however, a good choice of the generator reduces the size of the resulting RNLU. For example, if we use an r-stage LFSR or a binary counter as the generator of extra bit vectors, then the updating functions of extra bits can be computed by a circuit of size O(r).

The selected generator G is set to some initial state g_0 ∈ B^r. For LFSRs, g_0 must be a non-zero state. For binary counters, g_0 can be any state. Then, the defining table of the updating functions of output bits is constructed as follows. At every step i, i ∈ {0, 1, ..., m−1}, the input part of the table is assigned to be the current state of the generator G, g_i, and the output part of the table is assigned to be the i-th p-bit vector of the input sequence A. All remaining 2^r − m input assignments are mapped to don't-care values.
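The table-construction loop just described can be sketched in Python as follows. For simplicity this sketch assumes a binary counter as the extra-bits generator G (the paper also allows an LFSR); the 8-bit input sequence used below is an arbitrary illustration.

```python
from math import ceil, log2

def defining_table(bits, p):
    """Sketch of the table-construction loop of Algorithm 1: map the i-th
    state of the extra-bits generator to the i-th p-bit block of the
    sequence. A binary counter serves as the generator; the 2^r - m
    counter states that never occur are don't-cares."""
    n = len(bits)
    m = ceil(n / p)           # number of p-bit blocks of the sequence
    r = ceil(log2(m))         # number of extra-bit stages
    table = {}
    for i in range(m):
        g_i = tuple((i >> j) & 1 for j in range(r))      # counter state i
        table[g_i] = tuple(bits[i * p : (i + 1) * p])    # i-th block of A
    return r, table

r, table = defining_table([0, 1, 1, 0, 1, 0, 0, 1], p=2)
print(r)               # -> 2  (m = 4 blocks need 2 extra bits)
print(table[(0, 0)])   # -> (0, 1)  (first block of the sequence)
```

Synthesizing minimized circuits for the output-bit functions from such an incompletely specified table is left to a logic synthesis tool, as in the paper's experiments.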
This gives us the possibility to specify the functions f_0, f_1, ..., f_{p−1} so that the size of their circuits is minimized. Since, by construction, the values of the functions f_0, f_1, ..., f_{p−1} at step i correspond to the i-th p-tuple of A, for i ∈ {0, 1, ..., m−1}, the resulting RNLU generates A with the degree of parallelization p.

As an example, let us construct an RNLU which generates a 40-bit binary sequence A with the degree of parallelization 4.

Algorithm 1
ConstructRNLU(A, p): constructs an RNLU generating a binary sequence A = (a_0, a_1, ..., a_{n−1}) with the degree of parallelization p.

  m = ⌈n/p⌉; r = ⌈log m⌉;
  G = ChooseGenerator(m, r);
  Initialize G to an initial state g_0 ∈ B^r;
  for every i from 0 to m−1 do
    for every j from 0 to p−1 do
      f_j(g_i) = a_{i·p+j};
    end for
    g_{i+1} = ComputeNextState(G, g_i);
  end for
  for every i from 0 to r−1 do
    f_{p+i} = updating function of the stage i of G;
  end for
  Return f_0, f_1, ..., f_{p+r−1};

For the 40-bit example sequence A, we need r = ⌈log 10⌉ = 4 extra bits to assign each of the ten 4-bit vectors of A a unique extra bit vector. Suppose that we use the 4-stage LFSR with the primitive generator polynomial g(x) = 1 + x^3 + x^4 for generating extra bits. If we choose (0001) as the initial state of the LFSR, then extra bit vectors are assigned according to the sequence of the first ten states of this LFSR. This gives us the defining table for the updating functions of the output bits: at each step, the current LFSR state is mapped to the corresponding 4-bit vector of A. Minimizing the resulting incompletely specified functions yields circuits for f_0, f_1, f_2, f_3 over the extra-bit variables x_4, ..., x_7, where "+" is the Boolean OR and x̄ denotes the Boolean complement of x.

Algorithm 2
ChooseGenerator(m, r): chooses an r-stage generator of extra bits with at least m states.

  if m < 2^r then
    G = any r-stage LFSR with a primitive generator polynomial of degree r;
  else
    G = r-stage binary counter;
  end if
  Return G;

Figure 3 shows the structure of the resulting 8-stage RNLU. The block labeled F_out computes the updating functions of the output bits f_0, f_1, f_2, f_3. The updating functions of the extra bits, f_4, f_5, f_6, f_7, are defined by the LFSR:

f_4(x_5) = x_5
f_5(x_6) = x_6
f_6(x_7) = x_7
f_7(x_4, x_7) = x_4 ⊕ x_7

Fig. 3: 8-stage RNLU constructed for the example.

6 EXPECTED SIZE ANALYSIS
In this section, we derive expressions for the expected size of RNLUs constructed using the presented algorithm and the algorithms [15] and [16]. For completeness, we also show results for LFSRs and NLFSRs generating the same sequence.

In 1949, Shannon [17] proved that an (asymptotically) large fraction of Boolean functions of k variables cannot be computed by circuits of size smaller than 2^k/k. In 1962, Lupanov [48] showed that, if we allow the circuit size to be larger by a small fraction of 2^k/k, namely [1 + o(1)]·2^k/k, then we can compute all k-variable Boolean functions. In both cases, it is assumed that circuits are composed of AND, OR and NOT gates with at most two inputs. From these two bounds, we can conclude that "most" Boolean functions of k variables require a circuit of size a·2^k/k to be computed, where a is a constant such that 1 ≤ a ≤ 1 + o(1). Throughout this section, b denotes the size of one stage of a register in gates. Since the analysis is asymptotic, without loss of precision we use log n instead of ⌈log n⌉.

Let A be a binary sequence of length n in which every element is selected independently and uniformly at random from B. Throughout this section, we call such a sequence a random sequence. Suppose that Algorithm 1 is used to construct an RNLU generating A with the degree of parallelization one. Then, the resulting RNLU has:
• one stage for the output bit,
• log n stages for extra bits,
• log n updating functions of the extra bits,
• one updating function of the output bit.

The updating functions of the extra bits can be computed by a circuit of size O(log n). The updating function f of the output bit is expected to depend on all log n state variables of extra bits. This is because the probability that f|_{x_i=0} = f|_{x_i=1} for some i ∈ {0, 1, ..., (log n) − 1} goes to 0 as the sequence length increases. Therefore, f requires a circuit of size a·n/log n to be computed.
So, the expected size of the RNLU constructed by the presented algorithm is

E[RNLU(n, 1)] = b(1 + log n) + a·n/log n + O(log n) = O(n/log n).   (3)

Next, suppose that the algorithm [15] is used to construct an RNLU for the same sequence. This algorithm constructs an RNLU with the minimum number of stages k_min given by (2). For sufficiently large random sequences, this number can be approximated as:

k_min ≈ 1 + log(n/2) = log n.

In this case, the resulting RNLU has k_min stages and k_min updating functions with the support set of size k_min. These functions require k_min circuits of size a·2^{k_min}/k_min to be computed, so their expected size is given by:

k_min · a·2^{k_min}/k_min = a·2^{log n} = a·n.

Therefore, the expected size of the RNLU constructed by the algorithm [15] is:

E[RNLU(n, 1)] = a·n + b·log n = O(n).   (4)

Next, suppose that the Berlekamp-Massey algorithm [27] is used to construct an LFSR for the same sequence. Suppose that this LFSR has l stages. According to [45], for sufficiently large random sequences, l ≈ n/
2. The linear feedback function of the LFSR can be computed by a circuit of size O(n). So, the expected size of the LFSR is

E[LFSR(n, 1)] = b·n/2 + O(n) = O(n).   (5)

Finally, suppose an r-stage NLFSR is constructed for the same sequence, e.g. using the algorithm [38]. According to [36], for sufficiently large random sequences, r ≈ 2·log n. Thus, the feedback function of the NLFSR has a support set of size 2·log n. It requires a circuit of size a·2^{2 log n}/(2 log n) = a·n²/(2 log n) to be computed. Therefore, the expected size of the NLFSR is

E[NLFSR(n, 1)] = b·2·log n + a·n²/(2 log n) = O(n²/log n).   (6)

As we can see from equations (3), (4), (5), and (6), for sufficiently large random sequences, RNLUs with the degree of parallelization one constructed by the presented algorithm are asymptotically smaller than RNLUs constructed by the algorithm [15], LFSRs, and NLFSRs.

Degree of parallelization p

In this section, we extend the analysis to the degree of parallelization p. Let A be a random binary sequence of length n. Suppose that Algorithm 1 is used to construct an RNLU generating A with the degree of parallelization p. Let m = ⌈n/p⌉. Then this RNLU has:
• p stages for the output bits,
• log m stages for extra bits,
• log m updating functions of the extra bits,
• p updating functions of the output bits.

The updating functions of the extra bits can be computed by a circuit of size O(log m). Each of the p updating functions of the output bits is expected to depend on all log m state variables of extra bits. This is because, for any j ∈ {0, 1, ..., p−1}, the probability that f_j|_{x_i=0} = f_j|_{x_i=1} for some i ∈ {p, p+1, ..., (p + log m) − 1} goes to 0 as the sequence length increases. Therefore, the updating functions of output bits require p circuits of size a·m/log m to be computed. Thus, the expected size of the RNLU constructed by the presented algorithm is

E[RNLU(n, p)] = b(p + log m) + p·a·m/log m + O(log m) = O(n/log m) = O(n/log(n/p)).
(7)

Suppose that the algorithm [16] is used to construct an RNLU for the same sequence. The number of stages k_min is given by (2). Since 1 ≤ N_max ≤ m, we get

p ≤ k_min ≤ p + log m.

The lower bound is reached when each p-bit vector occurs in A exactly once. This is possible only if n ≤ p·2^p. Therefore

log n ≤ k_min ≤ p + log m.   (8)

The k_min updating functions require k_min circuits of size a·2^{k_min}/k_min to be computed, so their expected size is a·2^{k_min}. From (8), we get:

a·n ≤ a·2^{k_min} ≤ a·2^p·m.

Therefore, the lower bound on the expected size of the RNLU constructed by the algorithm [16] is:

E[RNLU(n, p)] ≥ b·log n + a·n = Ω(n).   (9)

An LFSR with the degree of parallelization p has the same number of stages as the LFSR with the degree of parallelization one, but its feedback function is modified to compute the p-th power of the connection matrix. This implies that the expected size of the circuit computing the feedback function of the LFSR increases p times. So, the expected size of the LFSR is

E[LFSR(n, p)] = b·n/2 + O(p·n) = O(p·n).   (10)

Similarly, an NLFSR with the degree of parallelization p is constructed by modifying its feedback functions to compute the p-th power of its transition relation. This may increase the size of the circuit computing the p-th power of the transition relation by more than a factor of p, due to the multiplication of non-linear terms [44]. The expected size of the NLFSR is thus

E[NLFSR(n, p)] ≥ b·2·log n + a·p·n²/(2 log n) = Ω(p·n²/log n).   (11)

From equations (7), (9), (10), and (11), we can conclude that, for sufficiently large random sequences, RNLUs with the degree of parallelization p constructed by the presented algorithm are asymptotically smaller than RNLUs constructed by the algorithm [16], LFSRs, and NLFSRs.

Note that our analysis does not take into account that two circuits implementing two k-variable Boolean functions may share some gates, and therefore their cost may be smaller than 2·a·2^k/k.
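To get a feel for the gap between these bounds, the leading terms of equations (3)-(6) (degree of parallelization one) can be evaluated numerically. The constants a and b below are set to 1 purely for illustration, since the analysis fixes them only up to the stated ranges, and the O(n) term of equation (5) is represented by n.

```python
from math import log2

def expected_sizes(n, a=1.0, b=1.0):
    """Leading terms of equations (3)-(6), degree of parallelization one,
    with illustrative constants a = b = 1 (not the true gate counts)."""
    return {
        "RNLU (presented), eq. (3)": b * (1 + log2(n)) + a * n / log2(n),
        "RNLU [15], eq. (4)":        a * n + b * log2(n),
        "LFSR, eq. (5)":             b * n / 2 + n,   # O(n) term taken as n
        "NLFSR, eq. (6)":            2 * b * log2(n) + a * n**2 / (2 * log2(n)),
    }

sizes = expected_sizes(2 ** 20)
# For n = 2^20, the n/log n term of the presented RNLU is roughly 20x
# smaller than the linear terms of eqs. (4) and (5), while the NLFSR's
# n^2/log n term dominates everything else.
```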
However, since the analysis is asymptotic, this factor is not likely to affect the results.

7 EXPERIMENTAL RESULTS
To compare the analytical results to the actual size of RNLUs, we applied the presented algorithm and the algorithms [15], [16] to randomly generated binary sequences. For all algorithms, circuits for the updating functions were synthesized using the logic synthesis tool ABC [49]. The generic library of gates mcnc.genlib was used for technology mapping.

Figures 4a and 4b show the results for the degrees of parallelization 1 and 100, respectively. A 2-input AND is used as the unit of gate size. We can see that RNLUs constructed by the presented algorithm are considerably smaller than RNLUs constructed by the algorithms [15] and [16]. The improvement is particularly striking for the degree of parallelization one. For example, for the longest sequences tested, RNLUs constructed by the algorithm [15] are 6.67 times larger than RNLUs constructed by the presented algorithm. For the degree of parallelization 100, RNLUs constructed by the algorithm [16] are 65.1% larger than RNLUs constructed by the presented algorithm.

8 CONCLUSION
In this paper, we presented an algorithm for constructing RNLUs in which the support set of the updating functions is reduced to the minimum. We proved that the expected size of the resulting RNLUs is asymptotically smaller than the expected size of RNLUs constructed by previous approaches. The presented method might be useful for applications which require efficient generation of binary sequences, such as testing, wireless communication, and cryptography.

Fig. 4: Comparison of RNLUs constructed by the presented algorithm to RNLUs constructed using the algorithms [15], [16]: (a) degree of parallelization one; (b) degree of parallelization 100. Each dot is computed as an average over 100 randomly generated sequences of the same length.

ACKNOWLEDGEMENT
This work was supported in part by the research grant No 2011-03336 from the Swedish Governmental Agency for Innovation Systems (VINNOVA) and in part by the research grant No 621-2010-4388 from the Swedish Research Council.

REFERENCES

[1] M. Robshaw, "Stream ciphers," Tech. Rep. TR-701, July 1994.
[2] A. Juels, "RFID security and privacy: a research survey," IEEE Journal on Selected Areas in Communications, vol. 24, pp. 381-394, Feb. 2006.
[3] T. Good and M. Benaissa, "ASIC hardware performance," New Stream Cipher Designs: The eSTREAM Finalists, LNCS 4986, pp. 267-293, 2008.
[4] B. G. Lee and B.-H. Kim, Scrambling Techniques for CDMA Communications. Berlin: Springer, 2001.
[5] R. L. Pickholtz et al., "Theory of spread spectrum communications - a tutorial," IEEE Trans. on Communications, vol. 30, no. 5, pp. 855-883, 1982.
[6] R. Gold, "Optimal binary sequences for spread spectrum multiplexing (corresp.)," IEEE Transactions on Information Theory, vol. 13, pp. 619-621, Oct. 1967.
[7] J. Davis and J. Jedwab, "Peak-to-mean power control in OFDM, Golay complementary sequences, and Reed-Muller codes," IEEE Trans. on Inf. Theory, vol. 45, no. 7, pp. 2397-2417, 1999.
[8] B. Popovic, "Spreading sequences for multicarrier CDMA systems," IEEE Transactions on Communications, vol. 47, pp. 918-926, June 1999.
[9] E. McCluskey, "Built-in self-test techniques," IEEE Design and Test of Computers, vol. 2, pp. 21-28, 1985.
[10] E. B. Eichelberger and E. Lindbloom, "Random-pattern coverage enhancement and diagnosis for LSSD logic self-test," IBM J. Res. Dev., vol. 27, pp. 265-272, May 1983.
[11] J. Rajski, J. Tyszer, M. Kassab, and N. Mukherjee, "Embedded deterministic test," IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 23, pp. 776-792, May 2004.
[12] C. Chin and E. J. McCluskey, "Weighted pattern generation for built-in self test," Tech. Rep. TR 84-7, Stanford Center for Reliable Computing, Aug. 1984.
[13] J. Savir, G. S. Ditlow, and P. H. Bardell, "Random pattern testability," IEEE Transactions on Computers, vol. C-33, pp. 79-90, Jan. 1984.
[14] G. Hetherington et al., "Logic BIST for large industrial designs: real issues and case studies," in Proc. of International Test Conference, pp. 358-367, 1999.
[15] E. Dubrova, "Synthesis of binary machines," IEEE Transactions on Information Theory, vol. 57, pp. 6890-6893, 2011.
[16] E. Dubrova, "Synthesis of parallel binary machines," in Proc. of ICCAD'2011, (San Jose, CA, USA), Nov. 2011.
[17] C. E. Shannon, "The synthesis of two-terminal switching circuits," Bell System Technical Journal, vol. 28, no. 1, pp. 59-98, 1949.
[18] I. Wegener,
The Complexity of Boolean Functions . John Wiley and SonsLtd, 1987.[19] S. Golomb,
Shift Register Sequences . Aegean Park Press, 1982.[20] R. Lidl and H. Niederreiter,
Introduction to Finite Fields and theirApplications . Cambridge Univ. Press, 1994.[21] D. E. Knuth,
The Art of Computer Programming Volume 2, Seminumer-ical Algorithms . Boston, MA, USA: Addison-Wesley Reading, 1969.[22] J. McCluskey, “High speed calculation of cyclic redundancy codes,” in
Proceedings of the 1999 ACM/SIGDA seventh international symposiumon Field programmable gate arrays , FPGA ’99, (New York, NY, USA),pp. 250–256, ACM, 1999.[23] G. Mrugalski, J. Rajski, and J. Tyszer, “Ring generators - New devicesfor embedded test applications,”
Transactions on Computer-Aided De-sign of Integrated Circuits and Systems , vol. 23, no. 9, pp. 1306–1320,2004.[24] R. David,
Random Testing of Digital Circuits . New York: MarcelDekker, 1998.[25] S. Mukhopadhyay and P. Sarkar, “Application of LFSRs for parallelsequence generation in cryptologic algorithms,” in
Computational Sci-ence and Its Applications - ICCSA 2006 , vol. 3982 of
Lecture Notes inComputer Science , pp. 436–445, Springer Berlin / Heidelberg, 2006.[26] E. R. Berlekamp, “Nonbinary BCH decoding,” in
International Sympo-sium on Information Theory , (San Remo, Italy), 1967.[27] J. Massey, “Shift-register synthesis and BCH decoding,”
IEEE Transac-tions on Information Theory , vol. 15, pp. 122–127, 1969.[28] D. Mandelbaum, “An approach to an arithmetic analog of Berlekamp’salgorithm,”
IEEE Transactions on Information Theory , vol. 30, no. 5,pp. 758–762, 1984.[29] K. Imamura and W. Yoshida, “A simple derivation of the Berlekamp-Massey algorithm and some applications,”
IEEE Transactions on Infor-mation Theory , vol. 33, no. 1, pp. 146–150, 1987.[30] P. Fitzpatrick, “New time domain errors and erasures decoding algorithmfor bch codes,”
Electronics Letters , vol. 32, no. 2, pp. 110–111, 1994. [31] M. Fleischmann, “Modified berlekamp-massey algorithm for two-sidedshift-register synthesis,”
Electronics Letters , vol. 31, no. 8, pp. 605–606,1995.[32] J. Dornstetter, “On the equivalence between Berlekamp’s and Euclid’salgorithms,”
IEEE Transactions on Information Theory , vol. 33, no. 3,pp. 428–431, 1987.[33] L. Welch and R. Sholtz, “Continued fractions and Berlekamp’s algo-rithm,”
IEEE Transactions on Information Theory , vol. 25, no. 1, pp. 19–27, 1979.[34] A. Lempel and W. L. Eastman, “High speed generation of maximallength sequences,”
IEEE Trans. Comput. , vol. 20, pp. 227–229, February1971.[35] H. Fredricksen, “A survey of full length nonlinear shift register cyclealgorithms,”
SIAM Review , vol. 24, no. 2, pp. 195–221, 1982.[36] C. J. Jansen,
Investigations On Nonlinear Streamcipher Systems: Con-struction and Evaluation Methods . Ph.D. Thesis, Technical Universityof Delft, 1989.[37] C. J. A. Jansen, “The maximum order complexity of sequence ensem-bles,”
Lecture Notes in Computer Science , vol. 547, pp. 153–159, 1991.Adv. Cryptology-Eupocrypt’1991, Berlin, Germany.[38] D. Linardatos and N. Kalouptsidis, “Synthesis of minimal cost nonlinearfeedback shift registers,”
Signal Process. , vol. 82, no. 2, pp. 157–176,2002.[39] P. Rizomiliotis and N. Kalouptsidis, “Results on the nonlinear span ofbinary sequences,”
IEEE Transactions on Information Theory , vol. 51,no. 4, pp. 1555–5634, 2005.[40] K. Limniotis, N. Kolokotronis, and N. Kalouptsidis, “On the nonlinearcomplexity and Lempel-Ziv complexity of finite length sequences,”
IEEE Transactions on Information Theory , vol. 53, no. 11, pp. 4293–4302, 2007.[41] C. Canni`ere and B. Preneel, “Trivium,”
New Stream Cipher Designs:The eSTREAM Finalists, LNCS 4986 , pp. 244–266, 2008.[42] M. Hell, T. Johansson, A. Maximov, and W. Meier, “The Grain family ofstream ciphers,”
New Stream Cipher Designs: The eSTREAM Finalists,LNCS 4986 , pp. 179–190, 2008.[43] B. Gittins, H. A. Landman, S. O’Neil, and R. Kelson, “A presentationon VEST hardware performance, chip area measurements, power con-sumption estimates and benchmarking in relation to the AES, SHA-256and SHA-512.” Cryptology ePrint Archive, Report 415, 2005.[44] E. Dubrova and S. Mansouri, “A BDD-based approach to constructingLFSRs for parallel CRC encoding,” in
Proc. of International Symposiumon Multiple-Valued Logic , pp. 128–133, 2012.[45] R. Rueppel, “Linear complexity and random sequences,” in
Advancesin Cryptology – EUROCRYPT’85 (F. Pichler, ed.), vol. 219 of
LectureNotes in Computer Science , pp. 167–188, Springer Berlin Heidelberg,1986.[46] N. Li, S. S. Mansouri, and E. Dubrova, “Secure key storage usingstate machines,” in