Constrained Secrecy Capacity of Finite-Input Intersymbol Interference Wiretap Channels
11 Constrained Secrecy Capacity ofPartial-Response Wiretap Channels
Aria Nouri,
Student Member, IEEE,
Reza Asvadi,
Senior Member, IEEE,
Jun Chen,
Senior Member, IEEE, and Pascal O. Vontobel,
Fel low, IEEE
Abstract
We consider reliable and secure communication over partial-response wiretap channels (PR-WTCs). In particular, we first examine the setup where the source at the input of a PR-WTCis unconstrained and then, based on a general achievability result for arbitrary wiretap channels,we derive an achievable secure information rate for this PR-WTC. Afterwards, we examine thesetup where the source at the input of a PR-WTC is constrained to be a finite-state machinesource (FSMS) of a certain order and structure. Optimizing the parameters of this FSMS towardmaximizing the secure information rate is a computationally intractable problem in general, andso, toward finding a local maximum, we propose an iterative algorithm that at every iterationreplaces the secure information rate function by a suitable surrogate function whose maximumcan be found efficiently. Although we expect the secure information rates achieved in the uncon-strained setup to be larger than the secure information rates achieved in the constrained setup, thelatter setup has the advantage of leading to efficient algorithms for estimating achievable securerates and also has the benefit of being the basis of efficient encoding and decoding schemes.
Index Terms
Partial-response wiretap channel (PR-WTC), finite-state machine channel, finite-state ma-chine source, wiretap channel, secure rate, rate optimization.
A. Nouri and R. Asvadi are with the Cognitive Telecommunication Research Group, Department of Telecommuni-cations, Faculty of Electrical Engineering, Shahid Beheshti University, Tehran, Iran. (e-mails: [email protected];[email protected]) R. Asvadi is the corresponding author.J. Chen is with the Department of Electrical and Computer Engineering, McMaster University, Hamilton, ON,Canada (e-mail: [email protected]).P.O. Vontobel is with the Department of Information Engineering and the Institute of Theoretical Computer Scienceand Communications, The Chinese University of Hong Kong, Hong Kong SAR (e-mail: [email protected]). a r X i v : . [ c s . I T ] F e b I. Introduction
A. Background
Partial-response (PR) channels are a class of channels with memory that are used as amodel for transmission over bandwidth-limited channels. These channels are useful modelsin many applications including data storage and magnetic recording [1]–[3], wireless com-munication over time-varying multipath channels [4], optical communications and digitalsubscriber lines [5]. PR channels are a special case of finite-state machine channels (FSMCs),sometimes also called finite-state channels (FSCs) [6]. Deriving upper and lower bounds onthe capacity of FSMCs has received significant attention in order to design and evaluatecodes for such channels [7].The classical Blahut-Arimoto algorithm (BAA) [8], [9] was generalized in [10] to optimizefinite-state machine sources (FSMSs) at the input of FSMCs in order to maximize the mutualinformation rate between channel input and output. A comparison of achievable informationrates [10], [11] with upper bounds on the (unconstrained) capacity of FSMCs [12], [13] showsthat typically there exists only a small gap between them, a gap that can be further narrowedby increasing the memory length of the FSMSs [14]. Hence, the information rates achievablewith FSMSs at the input of FSMCs are close to optimal.Inherent non-ideal properties of communication channels, such as noise and interference,can be exploited for achieving security at the physical layer. Information-theoretic limitsof secure communications without a secret key agreement between a transmitter and alegitimate receiver was first considered in [15], [16]. The ubiquity of PR channels in com-munications and a growing demand for physical layer secrecy has led to increased attentionto the analysis of the secrecy capacity of PR-wiretap channels (PR-WTCs). A PR-WTCconsists of a primary channel and a secondary channel, where the primary channel is a PRchannel that connects a transmitter (Alice) to a legitimate receiver (Bob), and where thesecondary channel is a PR channel that connects the transmitter (Alice) to an eavesdropper(Eve). In the following, the primary channel will be called “Bob’s channel” and labeled “B”,whereas the secondary channel will be called “Eve’s channel” and labeled “E”.In [17], useful secrecy metrics have been introduced based on distances between tworandom variables, corresponding to a transmitted message and Eve’s observation of it, respectively. The distances are ordered with respect to (w.r.t.) their strength in guaranteeingsecrecy over a wiretap channel. By using these metrics and channel resolvability techniques,the results of secrecy capacity for discrete memoryless wiretap channels (DM-WTCs) havebeen generalized to arbitrary wiretap channels [18].
B. Contribution
In this paper, we consider a PR-WTC without feedback, where the input symbols arelimited to some finite alphabet. In a first step, we study the unconstrained setup, i.e., thesetup where no constraints are placed on Alice’s source, and derive an achievable secureinformation rate based on the general achievability result for arbitrary wiretap channels es-tablished using information-spectrum methods [19]. Afterwards, we consider the constrained setup, i.e., the setup where Alice’s source is an FSMS of a given order and structure, andpropose an efficient iterative algorithm for optimizing the parameters of this FSMS toward(locally) maximizing the above-mentioned achievable secure information rate. The key ideabehind this algorithm is to approximate the secure information rate by suitable surrogatefunctions that can relatively easily be maximized. The proposed algorithm resembles thewell-known expectation-maximization (EM) algorithm and has similar convergence behavior.Moreover, its searching step size can be controlled via two adjustable parameters.Roughly speaking, the FSMS that is found by this algorithm is the FSMS of a givenorder and structure that best exploits the discrepancies between the frequency responses ofBob’s channel and Eve’s channel. Numerical simulations indicate that the obtained secrecycapacities can be positive even for scenarios where the capacity of Eve’s channel is higherthan the capacity of Bob’s channel.
C. Related Work
In terms of the main focus of this paper, i.e., the secrecy capacity of PR-WTCs without feedback, to the best of our knowledge, the only prior work can be found in [20]. Namely, inthat paper the authors consider the problem of evaluating the secrecy capacity of a finite-state wiretap channel (FS-WTC). The FS-WTC is defined as a wiretap channel where Boband Eve observe the input source through two distinct FSMCs. Based on general results of the DM-WTCs and imposing the less noisier condition on Bob’s channel (compared withEve’s channel), the authors in [20] generalized the expression of the secrecy capacity of DM-WTCs to the case of FS-WTCs. Then, they apply a stochastic algorithm to approximatethis quantity. This approach has the following issues: • They derived the secrecy capacity of FS-WTCs based on the general results of DM-WTCs. In DM-WTCs, it is necessary to impose a so-called less-noisier constraint onBob’s channel compared with Eve’s channel in order to relate the secure information rateof the wiretap channel to the information rates of Bob’s and Eve’s channels. However,channels with different non-flat frequency responses cannot be ordered by their noisepower. Even if we could order the channels based on their unconstrained capacities,we demonstrate in our simulations that positive secure rates are achievable for thecases that Bob’s channel has lower unconstrained channel capacity compared with Eve’schannel. • Another issue in their proposed algorithm is their choice of function approximatingthe secure information rate function. The gradient of this approximating function atan operating point is not the same as the gradient of the secure rate function at thatpoint. This issue leads to an inaccurate search direction, which eventually makes thealgorithm unstable. • They only discuss PR channels as an example of an FSMC. In fact, they do not showsimulation results corresponding to PR channels or other FSMCs. (The simulationresults that are shown in [20] correspond to the maximized secure information rateof the wiretap channel comprised of a noiseless channel to Bob and a binary symmetricchannel to Eve (see [20, Fig. 2]). The achievable secure rate of this wiretap channelis maximized by optimizing the parameters of a run-length constrained Markov source[10,
Example
17] as an input source.)The secrecy capacity of finite-state Markov wiretap channels with delayed noiseless feed-back from the legitimate receiver to the transmitter has recently been studied in [21], [22].The feedback information contains the received output and the state of Bob’s channel. It In the terminology of the present paper, a Markov wiretap channel is a wiretap channel where Bob’s and Eve’schannels are FSMCs with a state process that evolves independently of the input process. is shown in [21] that such feedback can enlarge the rate-equivocation region for finite-stateMarkov wiretap channels compared with the case without feedback. It is also known thathigher secure rates can be achieved by introducing artificial noise matched to the spectrum ofBob’s channel [23]. Although these enhancements are certainly interesting, in this paper wefocus on the standard version of the PR-WTC model (with neither feedback nor additionalartificial noise) as it requires few assumptions and consequently is more practically relevant.
D. Paper Organization
The remainder of this paper is organized as follows. Sections II-A and II-B introduce thesystem model and some preliminary concepts related to PR-WTCs, FSMSs, and FSMCs.Section II-C presents achievability results on the (unconstrained and constrained) secureinformation rate of PR-WTCs. Section III discusses an efficient algorithm for estimatingthe secure rate for a given FSMS at the input of a PR-WTC. Section IV describes thealgorithm mentioned in Section I-B for optimizing the FSMS at the input of a PR-WTCand analyzes it in detail. Section V contains some numerical results and discussions. Finally,Section VI draws the conclusion.
E. Notation
The sets of integers and real numbers are denoted by Z and R , respectively. Other thanthat, sets are denoted by calligraphic letters, e.g., S . The Cartesian product of two sets X and Y is written as X × Y , and the n -fold Cartesian product of X with itself is written as X n . If X is a finite set, then its cardinality is denoted by |X | .Random variables are denoted by upper-case italic letters, e.g., X , their realizations by thecorresponding lower-case letters, e.g., x , and the set of possible values by the correspondingcalligraphic letter, e.g., X . Random vectors are denoted by upper-case boldface letters, e.g., X , and their realizations by the corresponding lower-case letters, e.g., x . For integers n and n satisfying n ≤ n , the notation X n n (cid:44) ( X n , X n +1 , . . . , X n ) is used for a time-indexedvector of random variables and x n n (cid:44) ( x n , x n +1 , . . . , x n ) for its realization.The probability of an event ξ is denoted by Pr( ξ ). Furthermore, p X ( · ) denotes theprobability mass function (PMF) of X if X is a discrete random variable and the probability density function (PDF) of X if X is a continuous random variable. Similarly, p Y | X ( · | x )denotes the conditional PMF of Y given X = x if Y is a discrete random variable and theconditional PDF of Y given X = x if Y is a continuous random variable.Note that boldface letters are also used for (deterministic) matrices, e.g., A , with the( i, j )-entry of A being called A ij .The function log( · ) denotes the natural logarithm. The entropy of a random variable X ,the mutual information between two random variables X and Y , and the mutual informationbetween two random variables X and Y conditioned on the random variable Z are denoted by H ( X ), I ( X ; Y ), and I ( X ; Y | Z ), respectively The information density between the respectiverealizations of random variables X and Y is defined to be i ( x ; y ) (cid:44) log p X,Y ( x, y ) p X ( x ) · p Y ( y ) ! . Moreover, the conditional information density between the respective realizations of randomvariables X and Y given Z = z is i ( x ; y | z ) (cid:44) log p X,Y | Z ( x, y | z ) p X | Z ( x | z ) · p Y | Z ( y | z ) ! . Note that I ( X ; Y ) = X x,y p X,Y ( x, y ) · i ( x ; y ) ,I ( X ; Y | Z ) = X x,y,z p X,Y,Z ( x, y, z ) · i ( x ; y | z ) . Finally, the variational distance between the PMFs of two random variables X and Y over the same finite alphabet X is defined as d X ( p X , p Y ) (cid:44) P x ∈X | p X ( x ) − p Y ( x ) | . II. System Model and Information Rates
Section II-A gives the definitions of finite-state machine sources (FSMSs) and finite-statemachine channels (FSMCs), based on which Section II-B defines finite-state joint source-wiretap channels (FS-JWCs). Various information rates relevant for FS-JWCs are thenintroduced and characterized in Section II-C.
A. Finite-State Machine Sources and Channels
In this section, we define finite-state machine sources and finite-state machine channels,along with special cases of such sources and channels as far as relevant for this paper. Formore background and more examples we refer the interested reader to [6], [10].
Definition
Finite-state machine source (FSMS) ) . A time-invariant (discrete-time) FSMShas a state process { ¯ S t } t ∈ Z and an output process { X t } t ∈ Z , where ¯ S t ∈ ¯ S and X t ∈ X for all t ∈ Z . We assume that the alphabets ¯ S and X are finite and that for any positive integer n the joint PMF of ¯ S n and X n conditioned on ¯ S = ¯ s decomposes as p X n , ¯ S n | S ( x n , ¯ s n | s ) = n Y t =1 p X t , ¯ S t | ¯ S t − ( x t , ¯ s t | ¯ s t − ) , where p X t , ¯ S t | ¯ S t − ( x t , ¯ s t | ¯ s t − ) is independent of t . (cid:3) Remark . In the following, we will mostly consider FSMSs where ¯ S (cid:44) X ¯ ν for some positiveinteger ¯ ν and ¯ s t (cid:44) x tt − ¯ ν +1 for all t ∈ Z . Note: • The integer ¯ ν will be called the memory order of such an FSMS. • It holds that p X t , ¯ S t | ¯ S t − ( x t , ¯ s t | ¯ s t − ) = p X t | ¯ S t − ( x t | ¯ s t − ) = p X t | X t − t − ¯ ν ( x t | x t − t − ¯ ν ) , for ¯ s t = x tt − ¯ ν +1 and ¯ s t − = x t − t − ¯ ν . • There is a bijection between state sequences and output sequences, i.e., one sequencedetermines the other sequence.From the above comments it follows that such an FSMS is characterized by the triple (cid:16) X , ¯ ν, p X t | X t − t − ¯ ν ( x t | x t − t − ¯ ν ) (cid:17) . (cid:3) Note that all possible state sequences of an FSMS can be represented by a trellis diagram.Because of the assumed time-invariance, it is sufficient to show a single trellis section. Forexample, Fig. 1( a ) shows a trellis section for an FSMS characterized by the triple (cid:16) X (cid:44) { +1 , − } , ¯ ν (cid:44) , p X t | X t − t − ¯ ν ( x t | x t − t − ¯ ν ) (cid:17) .Before giving the definition of a partial-response (PR) channel, which is the type of channelof main interest in this paper, we introduce the more general class of finite-state machinechannels (which were called finite-state channels in [6]). Definition
Finite-state machine channel (FSMC) ) . A time-invariant FSMC has an inputprocess { X t } t ∈ Z , an output process { Y t } t ∈ Z , and a state process { S t } t ∈ Z , where X t ∈ X , Y t ∈ Y , and S t ∈ S for all t ∈ Z . We assume that the alphabets X and S are finite andthat for any positive integer n the joint PMF/PDF of S n and Y n conditioned on S = s and X n = x n is p S n , Y n | S , X n ( s n , y n | s , x n ) = n Y t =1 p S t ,Y t | S t − ,X t ( s t , y t | s t − , x t ) , where p S t ,Y t | S t − ,X t ( s t , y t | s t − , x t ) is independent of t . (cid:3) An important special case of an FSMC is a partial-response (PR) channel.
Definition
Partial-response (PR) channel ) . A PR channel with transfer polynomial g ( D ) (cid:44) P mt =0 g t D t ∈ R [ D ], where m is called the memory length, has an input process { X t } t ∈ Z , anoiseless output process { U t } t ∈ Z and a noisy output process { Y t } t ∈ Z , U t (cid:44) m X ‘ =0 g ‘ X t − ‘ , t ∈ Z ,Y t (cid:44) U t + N t , t ∈ Z , where X t , U t , Y t ∈ R for all t ∈ Z . In the following, we will assume that the noise process iswhite Gaussian noise, i.e., { N t } t ∈ Z are i.i.d. Gaussian random variables with mean zero andvariance σ . Clearly, a PR channel is parameterized by the couple (cid:16) g ( D ) , σ (cid:17) . (cid:3) A PR channel described by the couple (cid:16) g ( D ) (cid:44) P mt =0 g t D t , σ (cid:17) and having an inputprocess { X t } t ∈ Z taking values in a finite set X (cid:40) R , is a special case of an FSMC. Indeed,let S (cid:44) X m . Then p S t ,Y t | S t − ,X t ( s t , y t | s t − , x t ) = p S t | S t − ,X t ( s t | s t − , x t ) · p Y t | S t − ,X t ( y t | s t − , x t ) , where p S t | S t − ,X t ( s t | s t − , x t ) (cid:44) s t = x tt − m +1 and s t − = x t − t − m )0 (otherwise) ,p Y t | S t − ,X t ( y t | s t − , x t ) (cid:44) √ πσ · exp − ( y t − u t ) σ ! , and where u t (cid:44) P m‘ =0 g ‘ x t − ‘ with x t − t − m = s t − .All possible state sequences of a PR channel (and, more generally, of an FSMC) can berepresented by a trellis diagram. Because of the assumed time-invariance, it is sufficient toshow a single trellis section. For example, Fig. 1( b ) shows a trellis section for a PR channelcharacterized by the couple (cid:16) g ( D ) (cid:44) − D, σ (cid:17) (known as a dicode channel) and withinput alphabet X (cid:44) { +1 , − } . In this diagram, branches start at state s t − , end at state s t ,and have noiseless channel output symbol u t shown next to them.We are now ready to define the type of wiretap channel of interest in this paper. Definition
Partial-response wiretap channel (PR-WTC) ) . In a PR-WTC, Alice transmitsdata symbols over Bob’s channel and over Eve’s channel, which are both assumed to bePR channels with finite input alphabet X (cid:40) R . Specifically, Bob’s channel is a PR channeldescribed by the couple (cid:16) g B ( D ) , σ (cid:17) , with transfer polynomial g B ( D ) = P m B t =0 g B t D t , noiselessoutput process { U t } t ∈ Z , noise process { N B t } t ∈ Z , and noisy output process { Y t } t ∈ Z . Similarly,Eve’s channel is a PR channel described by the couple (cid:16) g E ( D ) , σ (cid:17) , with transfer polynomial g E ( D ) = P m E t =0 g E t D t , noiseless output process { V t } t ∈ Z , noise process { N E t } t ∈ Z , and noisyoutput process { Z t } t ∈ Z . We assume that the noise process of Bob’s channel and the noiseprocess of Eve’s channel are independent. Clearly, the PR-WTC is parameterized by thequadruple (cid:16) g B ( D ) , g E ( D ) , σ , σ (cid:17) . (cid:3) B. Finite-State Joint Source-Wiretap ChannelsDefinition . We define a finite-state joint source wiretap channel (FS-JSWTC) model basedon the concatenation of the following components: • an FSMS as in Remark 2 described by the triple (cid:16) X , ¯ ν, p X t | X t − t − ¯ ν ( x t | x t − t − ¯ ν ) (cid:17) , where X isa finite subset of R ; • a PR-WTC as in Definition 5 described by the quadruple (cid:16) g B ( D ) , g E ( D ) , σ , σ (cid:17) . (cid:3) Note that an FS-JSWTC can be modeled by a single (time-invariant) finite-state machine.Namely, letting ν (cid:44) max(¯ ν, m B , m E ) , (1) Fig. 1: ( a ) Trellis section of an FSMS with X = { +1 , − } and memory order ¯ ν = 3. State transitionprobabilities are shown next to branches. ( b ) Trellis section of a dicode channel (i.e., a PR channel with g ( D ) = 1 − D ) when used with input alphabet X = { +1 , − } . The noiseless channel output symbol is shownnext to branches. ( c ) Trellis section of an EPR4 channel (i.e., a PR channel with g ( D ) = 1+ D − D − D ) whenused with input alphabet X = { +1 , − } . The noiseless channel output symbol is shown next to branches. ( d )Trellis section of an FS-JSWTC comprised of a third-order FSMS, a dicode channel to Bob, and an EPR4channel to Eve. State transition probabilities and noiseless channel output symbols (one noiseless channeloutput symbol for Bob’s channel, one noiseless channel output symbol for Eve’s channel) are shown next tobranches. where m B and m E are the degrees of g B ( D ) and g E ( D ), respectively, the state space is givenby S (cid:44) X ν and the state at time t ∈ Z is given by S t (cid:44) X tt − ν +1 ∈ S . Assumption . In the following, we will focus on the case where ¯ ν ≥ m B and ¯ ν ≥ m E , whichimplies that ν = max(¯ ν, m B , m E ) = ¯ ν. With suitable notation, more general cases can behandled. (cid:3)
Thanks to Assumption 7, the state transition probabilities of the finite-state machinemodeling the FS-JSWTC will be the same as the state transition probabilities of the FSMS.
Definition . Let B denote the set of all valid consecutive state pairs ( s t − , s t ) ∈ S × S forany t ∈ Z . Moreover, let −→S i (cid:44) n j (cid:12)(cid:12)(cid:12) ( i, j ) ∈ B o , ←−S j (cid:44) n i (cid:12)(cid:12)(cid:12) ( i, j ) ∈ B o , be the set of states S t reachable from state S t − = i and the set of states S t − that canreach S t = j , respectively. (cid:3) Definition . For ( i, j ) ∈ B , let p ij be the time-invariant probability of going from state S t − = i to state S t = j for any t ∈ Z . We assume that { p ij } ( i,j ) ∈B is such that the FSMSis ergodic, and so there is a unique stationary state probability distribution { µ i } i ∈S , i.e., p S t ( i ) = µ i for all t ∈ Z . Finally, let { Q ij } ( i,j ) ∈B be defined by Q ij (cid:44) µ i · p ij , ( i, j ) ∈ B .In the above definition, we started with { p ij } ( i,j ) ∈B and derived { µ i } i ∈S and { Q ij } ( i,j ) ∈B from it. However, for analytical purposes, it turns out to be beneficial to start with { Q ij } ( i,j ) ∈B and derive { p ij } ( i,j ) ∈B and { µ i } i ∈S from { Q ij } ( i,j ) ∈B . Note that the set of all { Q ij } ( i,j ) ∈B isgiven by the polytope Q ( B ), where Q ( B ) (cid:44) { Q ij } ( i,j ) ∈B (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) Q ij ≥ , ∀ ( i, j ) ∈ B ; X ( i,j ) ∈B Q ij = 1; X j ∈−→S i Q ij = X k ∈←−S i Q ki , ∀ i ∈ S . (See [10] for similar observations.) In the following, we will use the short-hand notation Q for { Q ij } ( i,j ) ∈B . Example . Consider an FS-JSWTC • where the FSMS is as in Remark 2 described by the triple (cid:16) X , ¯ ν, p X t | X t − t − ¯ ν ( x t | x t − t − ¯ ν ) (cid:17) with X = { +1 , − } and ¯ ν = 3 (see Fig. 1( a )), • where Bob’s channel is a dicode channel, i.e., g B ( D ) = 1 − D (see Fig. 1( b )), and • where Eve’s channel is an EPR4 channel, i.e., g E ( D ) = 1 + D − D − D (see Fig. 1( c )).This setup satisfies Assumption 7 and so ν = ¯ ν = 3. All possible state sequences of an FS-JSWTC can be represented by a trellis diagram. Because of the assumed time-invariance, itis sufficient to show a single trellis section, as is done in Fig. 1( d ) for the present example. (cid:3) Remark
11 (
Parameterized family of Q ) . Frequently, we will consider the setup where Q ( θ ) isa function of some parameter θ . More precisely, for every ( i, j ) ∈ B , we let Q ij ( θ ) be a smoothfunction of some parameter θ , where θ varies over a suitable range. We require that for every θ it holds that Q ( θ ) = n Q ij ( θ ) o ( i,j ) ∈B ∈ Q ( B ). For every, ( i, j ) ∈ B , we denote the derivative of Q ij ( θ ) w.r.t. θ and evaluated at ˜ θ by Q θij (˜ θ ). We denote the corresponding steady-state andthe transition probabilities parameterized by θ by µ i ( θ ) and p ij ( θ ), respectively. Similarly,we denote their derivatives w.r.t. θ and evaluated at ˜ θ by µ θi (˜ θ ) and p θij (˜ θ ), respectively. Obviously, we have X ( i,j ) ∈B Q θij (˜ θ ) = 0 , X i ∈S µ θi (˜ θ ) = 0 . (2) (cid:3) Remark . Some technical remarks concerning the considered setup: • It is well known that PR channels are indecomposable FSMCs [6], which means thatthe influence of the initial state vanishes over time, which implies that information ratesare well defined even if the initial state is not known. • Algorithm 1 in Section IV will make use of Perron–Frobenius theory for irreduciblenon-negative matrices. One can verify that the relevant matrix is indeed irreducible,except for uninteresting boundary cases. (cid:3)
C. Achievable Secure Rates
In this section we summarize some known results for wiretap channels, along with estab-lishing some achievable secure rates for PR-WTCs.Information-spectrum methods have been used to analyze the fundamental limits of securecommunication over arbitrary wiretap channels [18]. By adopting the notations in [18], the spectral inf/sup-mutual information rates are defined to bep-lim sup n →∞ n i ( X n ; Y n ) (cid:44) inf (cid:26) α : lim n →∞ Pr (cid:18) n i ( X n ; Y n ) > α (cid:19) = 0 (cid:27) , p-lim inf n →∞ n i ( X n ; Y n ) (cid:44) sup (cid:26) β : lim n →∞ Pr (cid:18) n i ( X n ; Y n ) < β (cid:19) = 0 (cid:27) , where p-lim is the probability limit operator. Lemma . [17, Lemma 2] For an arbitrary wiretap channel (cid:16) X , { p Y n , Z n | X n ( y n , z n | x n ) } ∞ n =1 , Y , Z ) consisting of an arbitrary input alphabet X , two arbitrary output alphabets Y and Z corresponding to Bob’s and Eve’s observations, respectively, and a sequence of transitionprobabilities { p Y n , Z n | X n ( y n , z n | x n ) } ∞ n =1 , all secure rates R s satisfying R s < max { X n } ∞ n =1 (cid:18) p-lim inf n →∞ n i ( X n ; Y n ) − p-lim sup n →∞ n i ( X n ; Z n ) (cid:19) are achievable under the reliability criterionlim sup n →∞ (cid:15) n = 0 , (3)and the secrecy criterion p-lim sup n →∞ d M n ×Z n ( p M n , Z n , p M n p Z n ) = 0 , (4)where (cid:15) n is the probability of error of Bob’s decoder for a block code of length n and where M n is the transmitted message uniformly chosen from an alphabet M n .Note that the secrecy criterion (4) is stronger than the so-called weak secrecy criterion ,which is defined as p-lim sup n →∞ n I ( M n ; X n ) = 0 , and weaker than the so-called strong secrecy criterion , which is defined asp-lim sup n →∞ I ( M n ; X n ) = 0 , see [18, Lemma 1].Lemma 13 can be leveraged to deduce the following achievability result for PR-WTCs. Proposition . Consider some PR-WTC described by the quadruple (cid:16) g B ( D ) , g E ( D ) , σ , σ (cid:17) and with input alphabet X . For any integer ν ≥ max( m B , m E ), any positive integer ‘ , andany input distribution p X ‘ − ν +1 , all secure rates R s satisfying R s < ‘ + 2 ν (cid:16) I ( X ‘ ; Y ‘ | X − ν +1 ) − I ( X ‘ ; Z ‘ | X − ν +1 ) − ν · log |X | (cid:17) are achievable on this PR-WTC under the reliability criterion (3) and the secrecy crite-rion (4). Proof.
See Appendix A.
Definition . Consider an FS-JSWTC as in Definition 6 with an FSMS described by Q . We define R s ( Q ) (cid:44) lim n →∞ n (cid:18) I ( S n ; Y n | S ) − I ( S n ; Z n | S ) (cid:19) . (5) (cid:3) Corollary . Consider an FS-JSWTC as in Definition 6 with an FSMS described by Q . Allsecure rates R s satisfying R s < R s ( Q )are achievable under the reliability criterion (3) and the secrecy criterion (4). Proof.
Let ν be the memory of the associated FS-JSWTC (see (1)). It is clear that I ( X n ; Y n | X − ν +1 ) = I ( S n ; Y n | S ) and I ( X n ; Z n | X − ν +1 ) = I ( S n ; Z n | S ) . Invoking Proposition 14 and letting n → ∞ proves the promised result.We are now in a position to introduce the notion of constrained secrecy capacity, whichis a key quantity to be studied in the subsequent parts of this paper. Definition . Consider an FS-JSWTC as in Definition 6, where the FSMS described by Q can vary in Q ( B ). The constrained secrecy capacity (or, more precisely, the Q ( B )-constrainedsecrecy capacity) is defined as C Q ( B ) (cid:44) max Q ∈Q ( B ) R s ( Q ) . III. Secure rate: Estimation
Throughout this section, we consider an FS-JSWTC as in Definition 6, where the FSMSis described by Q ∈ Q ( B ). The secure rate in R s ( Q ) can be efficiently estimated usingvariants of the algorithms in [24]. (We omit the details.) The main purpose of this section isto present an alternative approach for estimating R s ( Q ). Although the resulting algorithmsby themselves are slightly less efficient than the estimation algorithms based on [24], theyare based on quantities that need to be calculated as part of the optimization algorithmpresented in the next section. Therefore, when running these optimization algorithms, thesequantities are readily available and can be used to estimate R s ( Q ). T B ij ( Q ) (cid:44) lim n →∞ n n X t =1 X y n ∈Y n p Y n ( y n ) · log p S t − ,S t | Y n ( i, j | y n ) p St − ,St | Y n ( i,j | y n ) /µ i p ij p S t − | Y n ( i | y n ) p St − | Y n ( i | y n ) /µ i (4) T E ij ( Q ) (cid:44) lim n →∞ n n X t =1 X z n ∈Z n p Z n ( z n ) · log p S t − ,S t | Z n ( i, j | z n ) p St − ,St | Z n ( i,j | z n ) /µ i p ij p S t − | Z n ( i | z n ) p St − | Z n ( i | z n ) /µ i , (5)ˇ T B ij ( Q ) (cid:44) n n X t =1 log p S t − ,S t | Y n ( i, j | ˇy n ) p St − ,St | Y n ( i,j | ˇy n ) /µ i p ij p S t − | Y n ( i | ˇy n ) p St − | Y n ( i | ˇy n ) /µ i , (6)ˇ T E ij ( Q ) (cid:44) n n X t =1 log p S t − ,S t | Z n ( i, j | ˇz n ) p St − ,St | Z n ( i,j | ˇz n ) /µ i p ij p S t − | Z n ( i | ˇz n ) p St − | Z n ( i | ˇz n ) /µ i . (7) Definition
18 ( T B ij and T E ij values ) . For every ( i, j ) ∈ B , we define T B ij ( Q ) and T E ij ( Q ) to be,respectively, as shown in (4) and (5) at the top of the next page. (cid:3) The expressions in Definition 18 are similar to the expression for ˇ T ( N ) ij in [10, Lemma 70],part “second possibility.” Proposition
19 (
Secure information rate ) . The secure rate of the FS-JSWTC under consid-eration can be expressed as follows in terms of the T B ij and T E ij values: R s ( Q ) = X ( i,j ) ∈B Q ij · (cid:16) T B ij ( Q ) − T E ij ( Q ) (cid:17) . Proof.
See Appendix B.
Remark . The reformulation of R s ( Q ) in Proposition 19 can be used to efficiently estimate R s ( Q ) as follows:
1) Generate a sequence ˇx n based on the FSMS Q .2) Simulate Bob’s channel with ˇx n at the input to obtain ˇy n at the output.3) Simulate Eve’s channel with ˇx n at the input to obtain ˇz n at the output.4) For every ( i, j ) ∈ B , compute (6) and (7) as shown at the top of the page. Thesequantities can be efficiently computed with the help of variants of the sum-product /BCJR algorithm. (See [10] for similar observations.) The accuracy of the approximation can be controlled by choosing n suitably large.
5) Estimate R s ( Q ) by the quantityˇ R s ( Q ) = X ( i,j ) ∈B Q ij · (cid:16) ˇ T B ij ( Q ) − ˇ T E ij ( Q ) (cid:17) . (cid:3) IV. Secure rate: Optimization
Throughout this section, we consider an FS-JSWTC as in Definition 6, where the FSMSdescribed by Q varies in Q ( B ). The optimization problem appearing in the specification ofthe constrained capacity C Q ( B ) in Definition 17 turns out to be difficult to solve because thefunction R s ( Q ) is non-concave in general. Given this, we focus in this section on efficientalgorithms for finding a local maximum of R s ( Q ). We do this by formulating an iterativealgorithm inspired by the expectation-maximization (EM) algorithm. Namely, the presentedalgorithm is an algorithm that at every step approximates the function R s ( Q ) by a suitablesurrogate function that can be efficiently maximized. Related techniques were also usedin [10], [25]. A. Outline of the Optimization Algorithm
The proposed algorithm is an iterative algorithm that works as follows: • Assume that at the current iteration the algorithm has found the FSMS described by˜ Q (cid:44) n ˜ Q ij o ( i,j ) ∈B . • Around Q = ˜ Q , we approximate the function R s ( Q ) over Q ( B ) by the surrogate function ψ ˜ Q ( Q ) over Q ( B ) satisfying the following properties: – The value of ψ ˜ Q ( Q ) matches the value of R s ( Q ) at Q = ˜ Q . – The gradient of ψ ˜ Q ( Q ) w.r.t. Q matches the gradient of R s ( Q ) w.r.t. Q at Q = ˜ Q . – The function ψ ˜ Q ( Q ) is concave in Q and can be efficiently maximized. • Replace ˜ Q by the Q maximizing ψ ˜ Q ( Q ).A sketch of these functions is shown in Fig. 2.In the following, in the same way that we derived { p ij } ( i,j ) ∈B and { µ i } i ∈S from Q = { Q ij } ( i,j ) ∈B , we will derive { ˜ p ij } ( i,j ) ∈B and { ˜ µ i } i ∈S from ˜ Q = { ˜ Q ij } ( i,j ) ∈B . Fig. 2: Sketch of the functions appearing in the optimization algorithm discussed in Section IV.
B. The Surrogate Function and its PropertiesDefinition
21 (
Surrogate function ) . The surrogate function based on ˜ Q is defined to be ψ ˜ Q ( Q ) (cid:44) X ( i,j ) ∈B Q ij · (cid:16) T B ij ( ˜ Q ) − T E ij ( ˜ Q ) (cid:17) − ¯ ψ ˜ Q ( Q ) , (8)where ¯ ψ ˜ Q ( Q ) (cid:44) κ · X ( i,j ) ∈B ˜ Q ij · (cid:16) κ · ( δQ ) ij (cid:17) · log (cid:16) κ · ( δQ ) ij (cid:17) − X i ∈S ˜ µ i · (cid:16) κ · ( δµ ) i (cid:17) · log (cid:16) κ · ( δµ ) i (cid:17) . Here, for every ( i, j ) ∈ B , the quantities ( δQ ) ij and ( δµ ) i are defined to be, respectively,( δQ ) ij (cid:44) Q ij − ˜ Q ij ˜ Q ij , ( δµ ) i (cid:44) µ i − ˜ µ i ˜ µ i . Furthermore, the real parameters 0 < κ ≤ κ > ψ ˜ Q ( Q ), and with that the shape of ψ ˜ Q ( Q ). (These parameters can be used to control theaggressiveness of the search step size.) (cid:3) Assumption . In order to show that the surrogate function ψ ˜ Q ( Q ) in Definition 21 has thepromised properties, we consider a parameterization Q ( θ ) of Q as discussed in Remark 11. Beyond assuming that the parameterization is smooth and that there is a value ˜ θ such that˜ Q = Q (˜ θ ), we make no assumption on this parameterization. (cid:3) In the following, we will use the short-hand notations R s ( θ ) and ψ ˜ Q ( θ ) for R s (cid:16) Q ( θ ) (cid:17) and ψ ˜ Q (cid:16) Q ( θ ) (cid:17) , respectively. Lemma
23 (
Property 1 of the surrogate function ψ ) . The value of ψ ˜ Q ( Q ) matches the valueof R s ( Q ) at Q = ˜ Q , i.e., ψ ˜ Q ( ˜ Q ) = R s ( ˜ Q ) , and, in terms of the parameterization defined above, ψ ˜ Q (˜ θ ) = R s (˜ θ ) . Proof.
We start by noting that Q = ˜ Q implies that ( δQ ) ij = 0 and ( δµ ) i = 0 for all( i, j ) ∈ B , which in turn implies that ¯ ψ ˜ Q ( ˜ Q ) = 0. The result ψ ˜ Q ( ˜ Q ) = R s ( ˜ Q ) follows thenfrom the definition of ψ ˜ Q ( Q ) in Definition 21, along with Proposition 19. Lemma
24 (
Property 2 of the surrogate function ψ ) . The gradient of ψ ˜ Q ( Q ) w.r.t. Q matchesthe gradient of R s ( Q ) w.r.t. Q at Q = ˜ Q , i.e.,dd θ ψ ˜ Q ( θ ) (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) θ =˜ θ = dd θ R s ( θ ) (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) θ =˜ θ , for any parameterization as defined above. Proof.
We start by showing that dd θ ¯ ψ ˜ Q ( θ ) (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) θ =˜ θ = 0 . (9)Indeed,dd θ ¯ ψ ˜ Q ( θ ) (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) θ =˜ θ = κκ · X ( i,j ) ∈B Q θij (˜ θ ) · log (cid:16) κ · ( δQ (˜ θ )) ij (cid:17) − X i ∈S µ θi (˜ θ ) · log (cid:16) κ · ( δµ (˜ θ )) i (cid:17)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) θ =˜ θ = 0 . We then havedd θ ψ ˜ Q ( θ ) (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) θ =˜ θ = dd θ (cid:16) ψ ˜ Q ( θ ) + ¯ ψ ˜ Q ( θ ) (cid:17)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) θ =˜ θ = dd θ X ( i,j ) ∈B Q ij ( θ ) · (cid:16) T B ij (˜ θ ) − T E ij (˜ θ ) (cid:17)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) θ =˜ θ = dd θ X ( i,j ) ∈B Q ij ( θ ) · (cid:16) T B ij ( θ ) − T E ij ( θ ) (cid:17)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) θ =˜ θ = dd θ R s ( θ ) (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) θ =˜ θ , where the first equality follows from (9), where the second equality from Definition 21, wherethe third equality follows from [10, Lemma 64], and where the fourth equality follows fromProposition 19. Remark . Despite the close similarity between the third and fourth expression in the finaldisplay equation of the above proof, this is a non-trivial result because of the non-trivialityof [10, Lemma 64].
Lemma
26 (
Convexity of the function ¯ ψ ˜ Q ) . The function ¯ ψ ˜ Q ( Q ) is convex over Q ∈ Q ( B ). Proof.
See Appendix C.
Lemma
27 (
Concavity of the surrogate function ψ ˜ Q ) . The surrogate function ψ ˜ Q ( Q ) isconcave over Q ∈ Q ( B ). Proof.
This follows immediately from Lemma 26 and from P ( i,j ) ∈B Q ij · (cid:16) T B ij ( ˜ Q ) − T E ij ( ˜ Q ) (cid:17) being a linear function of Q . C. Maximizing the Surrogate Function
Let ˜ Q denote the FSMS distribution attained at the current iteration of the proposedalgorithm. For the next iteration, ˜ Q is replaced by Q ∗ = n Q ∗ ij o ( i,j ) ∈B , where Q ∗ (cid:44) arg max Q ∈Q ( B ) ψ ˜ Q ( Q ) . (10)In the following, in the same way that we derived { p ij } ( i,j ) ∈B and { µ i } i ∈S from Q = { Q ij } ( i,j ) ∈B , we will derive { ˜ p ij } ( i,j ) ∈B and { ˜ µ i } i ∈S from ˜ Q = { ˜ Q ij } ( i,j ) ∈B and { p ∗ ij } ( i,j ) ∈B and { µ ∗ i } i ∈S from Q ∗ = { Q ∗ ij } ( i,j ) ∈B . Proposition
28 (
The optimum distribution Q ∗ ) . The FSMS Q ∗ = n Q ∗ ij o ( i,j ) ∈B in (10) can befound as follows. • Let A (cid:44) (cid:16) A ij (cid:17) i,j ∈S be the matrix with entries A ij (cid:44) ˜ p ij · exp ˜ T B ij − ˜ T E ij κκ ! (( i, j ) ∈ B )0 (otherwise) , (11)where ˜ T B ij (cid:44) T B ij ( ˜ Q ) and ˜ T E ij (cid:44) T E ij ( ˜ Q ) are defined according to Definition 18. Note that A is a non-negative matrix, i.e., a matrix with non-negative entries. • Let ρ be the Perron–Frobenius eigenvalue of the matrix A , with corresponding righteigenvector γ = ( γ j ) j ∈S . • Define ˆ p ∗ ij (cid:44) A ij ρ · γ j γ i , ( i, j ) ∈ B . (12) • If κ ≥ ˜ Q ij − ˆ Q ∗ ij ˜ Q ij , ( i, j ) ∈ B , (13)then the FSMS Q ∗ is given by solving the system of linear equations Q ∗ ij − ˆ p ∗ ij P j ∈−→S i Q ∗ ij = − κκ · (cid:16) ˜ µ i ˆ p ∗ ij − ˜ Q ij (cid:17) , ( i, j ) ∈ B , P r ∈←−S i Q ∗ ri − P j ∈−→S i Q ∗ ij = 0 , i ∈ S , P ( i,j ) ∈B Q ∗ ij = 1 . (14)for n Q ∗ ij o ( i,j ) ∈B . Proof.
See Appendix D.
Remark . Increasing the parameters κ and κ has the effect of making the surrogatefunction narrower and steeper, implying a decreased step size. Remark . A procedure similar to the procedure in Remark 20 can be used to efficientlyfind an approximation ˇQ ∗ to Q ∗ : Recall that the Perron–Frobenius eigenvalue of a irreducible non-negative matrix is the eigenvalue with largestabsolute value. One can show that the Perron–Frobenius eigenvalue is a positive real number and that thecorresponding right eigenvector can be multiplied by a suitable scalar such that all entries are positive real numbers. The accuracy of the approximation can be controlled by choosing n suitably large. Algorithm 1
Secure Rate Optimization
Input : n . length of simulated codeword ; B . set of all valid transition probabilities ; Q h i = ( Q h i ij ): Q h i ij ∈ Q ( B ) . initial point satisfying Q h i ij > for all ( i, j ) ∈ B ;PR-WTC (cid:16) g B ( D ) , g E ( D ) , σ , σ (cid:17) ; κ : 0 < κ ≤ . step size controlling parameter ; κ : κ > . step size controlling parameter ; (cid:3) Initialization :set r ← (cid:66) Iteration (until convergence) Apply the procedure in Remark 30 with input Q = Q h r i and output ˇQ (suitably change the parameters κ, κ if necessary); Q h r i ← ˇQ ; r ← r + 1; (cid:67) End Use the procedure in Remark 20 with input Q = Q h r i and output ˇ R s ; return ˇ R s .1) Generate a sequence ˇx n based on the FSMS Q = ˜ Q .2) Simulate Bob’s channel with ˇx n at the input to obtain ˇy n at the output.3) Simulate Eve’s channel with ˇx n at the input to obtain ˇz n at the output.4) For every ( i, j ) ∈ B , compute ˇ˜ T B ij and ˇ˜ T E ij according to (6) and (7) based on Q = ˜ Q .These quantities can be efficiently computed with the help of variants of the sum-product / BCJR algorithm. (See [10] for similar observations.)5) Let ˇA (cid:44) (cid:16) ˇ A ij (cid:17) i,j ∈S be the matrix with entriesˇ A ij (cid:44) ˜ p ij · exp ˇ˜ T B ij − ˇ˜ T E ij κκ (( i, j ) ∈ B )0 (otherwise) .
6) Find the Perron–Frobenius eigenvalue ˇ ρ and the corresponding right eigenvector ˇ γ ofthe matrix ˇA .7) Compute ˇˆ p ∗ ij (cid:44) ˇ A ij ˇ ρ · ˇ γ j ˇ γ i , ( i, j ) ∈ B .
8) Solve the system of linear equations ˇ Q ∗ ij − ˇˆ p ∗ ij P j ∈−→S i ˇ Q ∗ ij = − κκ · (cid:16) ˜ µ i ˇˆ p ∗ ij − ˜ Q ij (cid:17) , ( i, j ) ∈ B , P r ∈←−S i ˇ Q ∗ ri − P j ∈−→S i ˇ Q ∗ ij = 0 , i ∈ S , P ( i,j ) ∈B ˇ Q ∗ ij = 1 . for n ˇ Q ∗ ij o ( i,j ) ∈B .9) Check if ˇ Q ∗ ij ≥
0, ( i, j ) ∈ B . If not, reject the obtained solution, suitably change theparameters κ and κ , and reapply this procedure. The complete algorithm for the proposed optimization method is summarized as Algo-rithm 1.
Remark
31 (
The EM Viewpoint ) . The proposed algorithm can be considered as a variationof the well-known EM algorithm [26] comprised of two steps: Expectation (E-step) andMaximization (M-step). Namely, identifying a concave surrogate function of the securerate function around a local operating point resembles the E-step and maximization of thesurrogate function to achieve a higher secure rate corresponds to the M-step. With this, theproposed algorithm has similar convergence guarantees (to a local maximum of the securerate function) as the EM algorithm [27].
V. Simulation Results and Discussion
In this section we apply the proposed algorithm, Algorithm 1, to two different PR-WTCsand study the obtained achievable secure rates.In previous sections, in order to keep the notation simple, we used the channel inputalphabet X = { +1 , − } and unnormalized PR channel transfer polynomials. However, in thissection we use the channel input alphabet X = { + √ E s , −√ E s } and normalized PR channeltransfer polynomials, where a normalized transfer polynomial g ( D ) (cid:44) P mt =0 g t D t ∈ R [ D ] hasto satisfy P mt =0 | g t | = 1. (For a discussion on normalization of transfer polynomials, see,e.g., [28].) Alternatively, one could determine { ˇˆ Q ∗ ij } ( i,j ) ∈B based on ˇˆ p ∗ ij and then verify if κ ≥ (cid:0) ˜ Q ij − ˇˆ Q ∗ ij (cid:1) / ˜ Q ij , ( i, j ) ∈ B .However, the proposed checks are obviously simpler to evaluate. i.u.d. secure information rate 3-rd order constrained secrecy capacityi.u.d. secure information rate 3-rd order constrained secrecy capacity Fig. 3: Simulation results for the setup in Example 32. ( a ) The 3rd order constrained secrecy capacity andthe secure rates of an i.u.d. input process when SNR EdB = 5 . b ) Normalized histogram functions of thelocally-optimum secure rates obtained from running Algorithm 1 with 100 different initializations. i.u.d. secure information rate 3-rd order constrained secrecy capacityi.u.d. secure information rate 3-rd order constrained secrecy capacity Fig. 4: Simulation results for the setup in Example 33. ( a ) The 3rd order constrained secrecy capacity andthe secure rates of an i.u.d. input process when SNR EdB = 8 . b ) Normalized histogram functions of thelocally-optimum secure rates obtained from running Algorithm 1 with 100 different initializations. A. Simulation Results
In general, we consider the following PR-WTC setup: • The transmitted symbols are BPSK modulated with the alphabet X = { + √ E s , −√ E s } . • Bob’s channel is a PR channel with normalized transfer polynomial g B ( D ) and additivewhite Gaussian noise of variance σ . • Eve’s channel is a PR channel with normalized transfer polynomial g E ( D ) and additivewhite Gaussian noise of variance σ . • The SNR of Bob’s and Eve’s channel is defined as SNR B (cid:44) E s /σ and SNR E (cid:44) E s /σ ,which in terms of decibels are SNR BdB (cid:44)
10 log (cid:16) E s /σ (cid:17) and SNR EdB (cid:44)
10 log (cid:16) E s /σ (cid:17) , Bandwidth (Hz)
EPR4 (---------= 9.0 dB)EPR4 (---------= 8.5 dB)EPR4 (---------= 8.0 dB)EPR4 (---------= 7.5 dB)DICODE (---------= 8.0 dB) . Bandwidth (Hz)
EPR4 (---------= 5.0 dB)DICODE (---------= 5.0 dB)DICODE (---------= 4.5 dB)DICODE (---------= 4.0 dB)DICODE (---------= 3.5 dB)DICODE (---------= 3.0 dB)DICODE (---------= 2.5 dB)
Fig. 5: Capacity of the dicode and the EPR4channels in nats / sec with input power E s = 1 J,for the SNR values corresponding to Example 32. Bandwidth (Hz)
EPR4 (---------= 9.0 dB)EPR4 (---------= 8.5 dB)EPR4 (---------= 8.0 dB)EPR4 (---------= 7.5 dB)DICODE (---------= 8.0 dB) . Bandwidth (Hz)
EPR4 (---------= 5.0 dB)DICODE (---------= 5.0 dB)DICODE (---------= 4.5 dB)DICODE (---------= 4.0 dB)DICODE (---------= 3.5 dB)DICODE (---------= 3.0 dB)DICODE (---------= 2.5 dB)
Fig. 6: Capacity of the dicode and the EPR4channels in nats / sec with input power E s = 1 J,for the SNR values corresponding to Example 33. -1 -0.8 -0.6 -0.4 -0.2 0 0.2 0.4 0.6 0.8 1-120-100-80-60-40-20020 -1 -0.8 -0.6 -0.4 -0.2 0 0.2 0.4 0.6 0.8 1-120-100-80-60-40-20020 Fig. 7: The channel’s gain-to-noise spectrum ratio in decibels per frequency (Hz) corresponding to Examples32 and 33. respectively. Example . We consider the following setup: • An FSMS as in Remark 2 with ¯ ν = 3. • Bob’s channel is a dicode channel, i.e., g B ( D ) = 1 √ · (1 − D ) . • Eve’s channel is an EPR4 channel, i.e., g E ( D ) = 12 · (cid:16) D − D − D (cid:17) . If desired, these SNR values can be re-expressed in terms of E s /N values, where N is the two-sided power spectraldensity of the AWGN process: E s /N = · ( E s /σ ). Fig. 3( a ) shows the obtained secure information rates: on the one hand for an unoptimized FSMS, i.e., an FSMS producing i.u.d. symbols, and, on the other hand, for an optimized
FSMS, where the optimization was done with the help of Algorithm 1. In this plot, for everySNR
BdB value, the best obtained secure information rate is plotted after running Algorithm 1for 100 different initializations. In fact, it is important to run Algorithm 1 with severalinitializations because the secure information rate is a highly fluctuating function, which iswitnessed by the broad histograms in Fig. 3( b ) that show the obtained secure informationrates for various initializations of Algorithm 1. Example . We consider the following setup: • An FSMS as in Remark 2 with ¯ ν = 3. • Bob’s channel is an EPR4 channel, i.e., g B ( D ) = 12 · (cid:16) D − D − D (cid:17) . • Eve’s channel is a dicode channel, i.e., g E ( D ) = 1 √ · (1 − D ) . The obtained results are shown in Figs. 4( a ) and ( b ). (These figures are similar to Figs. 3( a )and ( b ) for Example 32.)Note that when running Algorithm 1, we used κ and κ values in the ranges 0 . ≤ κ ≤ ≤ κ ≤
10, respectively. Typically, 30 to 40 iterations were needed to reach numericalconvergence.
B. Discussion
As we have seen in Section V-A, positive secure information rates are possible andoptimizing the FSMS clearly benefits these rates. Interestingly, positive secure informationrates are also possible when the point-to-point channel to Eve is “better” than the point-to-point channel to Bob. In order to make this discussion more insightful and analyticallytractable, we consider in this section the scenario where the only channel input constraint is These initializations were generated with the help of Weyl’s |S| -dimensional equi-distributed sequences [29]. an average energy constraint. This allows us to use Fourier transform techniques and well-known “water-pouring” formulas for analyzing capacities of such point-to-point channels.Namely, consider a PR channel described by the transfer polynomial g ( D ) = P mt =0 g t D t and with additive (possibly non-white) Gaussian noise. The unconstrained (besides someaverage-energy constraint) capacity of this channel is given by the “water-pouring” formula(see, e.g., [28]) C = 12 · ∞ Z −∞ max ( , log αN ( f ) / | G ( f ) | !) d f, where G ( f ) = P m‘ =0 g ‘ e − j ‘πfT P m‘ =0 | g ‘ | (if | f | ≤ W )0 (otherwise) , and where α > E s = ∞ Z −∞ max ( , α − N ( f ) | G ( f ) | ) d f. This capacity formula is based on the following assumptions: • average energy constraint per input symbol E s ; • symbol period T (in seconds); • a perfect lowpass filter of bandwidth W (cid:44) T and sampling at the Nyquist frequency1 /T at the receiver side; • power spectral density N ( f ) (in Watts per Hertz) of the additive Gaussian noise beforethe lowpass filter.The resulting unconstrained capacities for a dicode and an EPR4 channel are shown inFig. 5 and Fig. 6. Example
34 (Continuation of Example 32) . Consider the scenario where E s = 1 J, 2 . ≤ SNR
BdB ≤ EdB = 5 dB. It can be seen from Fig. 5 that Eve’s channel has ahigher unconstrained capacity than Bob’s channel for sufficiently large enough bandwidth.In this sense, Bob’s channel is “worse” than Eve’s channel. However, luckily for Bob, thereare frequencies where Bob’s channel has a better gain-to-noise spectrum ratio than Eve’s channel, as can be seen from Fig. 7( a ). This can be exploited by a suitably tuned source atthe channel input toward obtaining positive secure information rates. Example
35 (Continuation of Example 33) . Consider the scenario where E s = 1 J, 7 . ≤ SNR
BdB ≤ EdB = 8 dB. It can be seen from Fig. 6 that Bob’s channel has ahigher unconstrained capacity than Eve’s channel for sufficiently large enough bandwidth.In this sense, it is not unexpected that positive secure information rates are possible.Nevertheless, it is worthwhile to point out that here secure information rates are possibleeven though Bob’s channel has larger memory and, for some selections of SNR
BdB , highernoise power than Eve’s channel (see Fig. 7( b )).In a conventional DM-WTC [15], [16], Eve’s channel necessarily has to be noisier thanBob’s channel in order to achieve a positive secrecy capacity. This results in the capacityof Eve’s channel to be less than the capacity of Bob’s channel. In contrast, we showed thatpositive secure information rates are achievable on the PR-WTCs, even if • the unconstrained capacity of Bob’s channel is smaller than the unconstrained capacityof Eve’s channel (Example 32); • Bob’s channel tolerates both a higher noise power and a larger memory compared withEve’s channel (Example 33).
VI. Conclusion
In this paper, we have optimized an FSMS at the input of a PR-WTC toward (locally)maximizing the secrecy rate. Because directly maximizing the secrecy rate function is chal-lenging, we have iteratively approximated the secrecy rate function by a surrogate functionwhose maximum can be found efficiently.Our numerical results show that, by implicitly using the discrepancies between the fre-quency responses of Bob’s and Eve’s channels, it is possible to achieve positive secrecyrates also for setups where the unconstrained capacity of Eve’s channel is larger than theunconstrained capacity of Bob’s channel. Appendix AProof of Proposition 14
Let n X k ( ‘ +2 ν )+( ‘ + ν ) k ( ‘ +2 ν ) − ν +1 o + ∞ k = −∞ be a block i.i.d. process where each block has length ‘ + 2 ν . Itsuffices to specify the distribution of a single block X k ( ‘ +2 ν )+( ‘ + ν ) k ( ‘ +2 ν ) − ν +1 . We set X k ( ‘ +2 ν )+( ‘ +1) (cid:44) , . . . , X k ( ‘ +2 ν )+( ‘ + ν ) (cid:44) , in order to ensure that there is no interference across blocks, while allowing X k ( ‘ +2 ν )+ ‘k ( ‘ +2 ν ) − ν +1 tobe arbitrarily distributed. It is easy to verify that (cid:26) X k ( ‘ +2 ν )+( ‘ + ν ) k ( ‘ +2 ν ) − ν +1 , Y k ( ‘ +2 ν )+( ‘ + ν ) k ( ‘ +2 ν ) − ν +1 (cid:27) + ∞ k = −∞ , is a joint block i.i.d. process. Similarly, (cid:26) X k ( ‘ +2 ν )+( ‘ + ν ) k ( ‘ +2 ν ) − ν +1 , Z k ( ‘ +2 ν )+( ‘ + ν ) k ( ‘ +2 ν ) − ν +1 (cid:27) + ∞ k = −∞ , is also a joint block i.i.d. process. It follows by the strong law of large numbers thatlim n →∞ n i ( X n ; Y n ) = 1 ‘ + 2 ν I ( X ‘ + ν − ν +1 ; Y ‘ + ν − ν +1 ) w.p. 1 , lim n →∞ n i ( X n ; Z n ) = 1 ‘ + 2 ν I ( X ‘ + ν − ν +1 ; Z ‘ + ν − ν +1 ) w.p. 1 . Note that I ( X ‘ + ν − ν +1 ; Y ‘ + ν − ν +1 ) ≥ I ( X ‘ − ν +1 ; Y ‘ )= I ( X − ν +1 ; Y ‘ ) + I ( X ‘ ; Y ‘ | X − ν +1 ) ≥ I ( X ‘ ; Y ‘ | X − ν +1 ) . Moreover, I ( X ‘ + ν − ν +1 ; Z ‘ + ν − ν +1 ) = I ( X ‘ − ν +1 ; Z ‘ + ν − ν +1 )= I ( X ‘ − ν +1 ; Z ‘ ) + I ( X ‘ − ν +1 ; Z − ν +1 | Z ‘ ) + I ( X ‘ − ν +1 ; Z ‘ + ν‘ +1 | Z ‘ − ν +1 )= I ( X ‘ ; Z ‘ | X − ν +1 ) + I ( X − ν +1 ; Z ‘ ) + I ( X ‘ − ν +1 ; Z − ν +1 | Z ‘ ) + I ( X ‘ − ν +1 ; Z ‘ + ν‘ +1 | Z ‘ − ν +1 )= I ( X ‘ ; Z ‘ | X − ν +1 ) + I ( X − ν +1 ; Z ‘ ) + I ( X − ν +1 ; Z − ν +1 | Z ‘ ) + I ( X ‘‘ − ν +1 ; Z ‘ + ν‘ +1 | Z ‘ − ν +1 ) ≤ I ( X ‘ ; Z ‘ | X − ν +1 ) + 3 ν log |X | . Combining Lemma 13 with the above lower and upper bounds concludes the proof.
Appendix BProof of Proposition 19
Reformulating the expression in (5), we obtain R s = lim n →∞ n (cid:16) I ( S n ; Y n | S ) − I ( S n ; Z n | S ) (cid:17) = lim n →∞ n n X t =1 (cid:16) I ( S t ; Y n | S t − ) − I ( S t ; Z n | S t − ) (cid:17) = lim n →∞ n n X t =1 (cid:16) I ( S t ; Y n | S t − ) − I ( S t ; Z n | S t − ) (cid:17) = lim n →∞ n n X t =1 (cid:16) H ( S t | Z n , S t − ) − H ( S t | Y n , S t − ) (cid:17) = X ( i,j ) ∈B Q ij · (cid:16) T B ij ( Q ) − T E ij ( Q ) (cid:17) , where the last equality is based on expressing H ( S t | Y n , S t − ) as H ( S t | Y n , S t − )= − X ( i,j ) ∈B X y n ∈Y n p S t ,S t − , Y n ( j, i, y n ) · log (cid:16) p S t | S t − , Y n ( j | i, y n ) (cid:17) = − X ( i,j ) ∈B X y n ∈Y n p S t ,S t − , Y n ( j, i, y n ) · log p S t ,S t − , Y n ( j, i, y n ) p Y n ( y n ) ! − log p S t − , Y n ( i, y n ) p Y n ( y n ) !! = − X ( i,j ) ∈B µ i p ij · X y n ∈Y n (cid:18) p Y n | S t − ,S t ( y n | i, j ) · log (cid:16) p S t − ,S t | Y n ( i, j | y n ) (cid:17) − p Y n | S t − ( y n | i ) · log (cid:16) p S t − | Y n ( i | y n ) (cid:17)(cid:19) = − X ( i,j ) ∈B µ i p ij · X y n ∈Y n p S t − ,S t | Y n ( i, j | y n ) µ i p ij · p Y n ( y n ) · log (cid:16) p S t − ,S t | Y n ( i, j | y n ) (cid:17) − p S t − | Y n ( i | y n ) µ i · p Y n ( y n ) · log (cid:16) p S t − | Y n ( i | y n ) (cid:17) = − X ( i,j ) ∈B µ i p ij · X y n ∈Y n p Y n ( y n ) · log p S t − ,S t | Y n ( i, j | y n ) p St − ,St | Y n ( i,j | y n ) /µ i p ij p S t − | Y n ( i | y n ) p St − | Y n ( i | y n ) /µ i , with an analogous expression for H ( S t | Z n , S t − ), along with using (4) and (5). Appendix CProof of Lemma 26
Besides the assumptions on the parameterizations Q ( θ ) made in Assumption 22, we willalso assume that for all ( i, j ) ∈ B , the functions Q ij ( θ ) and µ i ( θ ) are affine functions of θ ,which implies that Q θθij ( θ ) = 0 , µ θθi ( θ ) = 0 , (15)where the superscript θθ denotes the second-order derivative w.r.t. θ .Denoting the second-order derivative of ¯ ψ ˜ Q ( θ ) by ¯ ψ θθ ˜ Q ( θ ), we observe that the claim in thelemma statement is equivalent to ¯ ψ θθ ˜ Q ( θ ) ≥ Q ( θ ) thatsatisfy the above-mentioned conditions.Some straightforward calculations show that¯ ψ θθ ˜ Q ( θ ) = κ κ · X ( i,j ) ∈B ( Q θij ) Q ij − X i ∈S ( µ θi ) µ i = κ κ · X i ∈S X j ∈−→S i ( Q θij ) Q ij − ( µ θi ) µ i . Noting that for any i ∈ S it holds that X j ∈−→S i ( Q θij ) Q ij = µ i · X j ∈−→S i Q ij µ i · Q θij Q ij ! ≥ µ i · X j ∈−→S i Q ij µ i · Q θij Q ij = 1 µ i · X j ∈−→S i Q θij = ( µ θi ) µ i , where the inequality follows from Jensen’s inequality. Combining the above two displayequations, we can conclude that, indeed, ¯ ψ θθ ˜ Q ( θ ) ≥ Appendix DProof of Proposition 28
Appendix D
Proof of Proposition 28

Maximizing $\psi_{\tilde Q}(Q)$ over $Q \in \mathcal{Q}(\mathcal{B})$ means to optimize a differentiable, concave function over a polytope. We therefore set up the Lagrangian
\begin{align*}
L \triangleq \sum_{(i,j)\in\mathcal{B}} Q_{ij}\cdot\big( \tilde T^{\mathrm{B}}_{ij} - \tilde T^{\mathrm{E}}_{ij} \big)
  - \bar\psi_{\tilde Q}(Q)
  + \lambda\cdot\left( \sum_{(i,j)\in\mathcal{B}} Q_{ij} - 1 \right)
  + \sum_{(i,j)\in\mathcal{B}} \lambda_j Q_{ij}
  - \sum_{(i,j)\in\mathcal{B}} \lambda_i Q_{ij}.
\end{align*}
Note that at this stage we omit Lagrange multipliers w.r.t. the constraints $Q_{ij} \geq 0$, $(i,j) \in \mathcal{B}$. We will make sure at a later stage that these constraints are satisfied thanks to the choice of $\kappa$ in (13).

Recall that we assume that the surrogate function takes on its maximal value at $Q = Q^*$. Therefore, setting the gradient of $L$ equal to the zero vector at $Q = Q^*$, we obtain
\begin{align}
0 &= \frac{\partial L}{\partial Q_{ij}}\bigg|_{Q=Q^*}
   = \tilde T^{\mathrm{B}}_{ij} - \tilde T^{\mathrm{E}}_{ij}
     - \frac{\partial \bar\psi_{\tilde Q}(Q)}{\partial Q_{ij}}\bigg|_{Q=Q^*}
     + \lambda^* + \lambda^*_j - \lambda^*_i, \qquad (i,j)\in\mathcal{B}, \tag{16} \\
0 &= \frac{\partial L}{\partial \lambda}\bigg|_{Q=Q^*}
   = \sum_{(i,j)\in\mathcal{B}} Q^*_{ij} - 1, \notag \\
0 &= \frac{\partial L}{\partial \lambda_i}\bigg|_{Q=Q^*}
   = \sum_{r\in\overleftarrow{\mathcal{S}}_i} Q^*_{ri} - \sum_{j\in\overrightarrow{\mathcal{S}}_i} Q^*_{ij}, \qquad i\in\mathcal{S}, \notag
\end{align}
where
\begin{align}
\frac{\partial \bar\psi_{\tilde Q}(Q)}{\partial Q_{ij}}\bigg|_{Q=Q^*}
&= \kappa\cdot\Big( \kappa\cdot\log\big( \kappa\cdot(\delta Q)_{ij} \big)
                  - \kappa\cdot\log\big( \kappa\cdot(\delta\mu)_{i} \big) \Big)\bigg|_{Q=Q^*} \notag\\
&= \kappa^2\cdot\log\!\left( \frac{(1-\kappa)\cdot\tilde Q_{ij} + \kappa\cdot Q^*_{ij}}
                                  {(1-\kappa)\cdot\tilde\mu_{i} + \kappa\cdot\mu^*_{i}}
                             \cdot \frac{\tilde\mu_i}{\tilde Q_{ij}} \right)
 = \kappa^2\cdot\log\!\left( \frac{\hat Q^*_{ij}}{\hat\mu^*_{i}} \cdot \frac{\tilde\mu_i}{\tilde Q_{ij}} \right)
 = \kappa^2\cdot\log(\hat p^*_{ij}) - \kappa^2\cdot\log(\tilde p_{ij}). \tag{17}
\end{align}
Here the third and fourth equality use $\{\hat Q^*_{ij}\}_{(i,j)\in\mathcal{B}}$, which is defined by
\begin{align}
\hat Q^*_{ij} \triangleq (1-\kappa)\cdot\tilde Q_{ij} + \kappa\cdot Q^*_{ij}, \qquad (i,j)\in\mathcal{B}, \tag{18}
\end{align}
along with $\{\hat\mu^*_i\}_{i\in\mathcal{S}}$ and $\{\hat p^*_{ij}\}_{(i,j)\in\mathcal{B}}$, which are derived from $\{\hat Q^*_{ij}\}_{(i,j)\in\mathcal{B}}$ in the usual manner. Note that
\begin{align}
\hat\mu^*_i &= (1-\kappa)\cdot\tilde\mu_i + \kappa\cdot\mu^*_i, \qquad i\in\mathcal{S}, \notag\\
\hat p^*_{ij} &= \frac{\hat Q^*_{ij}}{\hat\mu^*_i}
 = \frac{(1-\kappa)\cdot\tilde Q_{ij} + \kappa\cdot Q^*_{ij}}{(1-\kappa)\cdot\tilde\mu_i + \kappa\cdot\mu^*_i}
 = \frac{(1-\kappa)\cdot\tilde Q_{ij} + \kappa\cdot Q^*_{ij}}{(1-\kappa)\cdot\tilde\mu_i + \kappa\cdot\sum_{j'\in\overrightarrow{\mathcal{S}}_i} Q^*_{ij'}}, \qquad (i,j)\in\mathcal{B}. \tag{19}
\end{align}
Note also that solving (18) for $Q^*_{ij}$ results in
\begin{align*}
Q^*_{ij} = \frac{1}{\kappa}\cdot\big( \hat Q^*_{ij} - \tilde Q_{ij} + \kappa\cdot\tilde Q_{ij} \big), \qquad (i,j)\in\mathcal{B},
\end{align*}
which shows that $Q^*_{ij} \geq 0$, $(i,j)\in\mathcal{B}$, for $\kappa$ satisfying (13). (Recall that when setting up the Lagrangian, we omitted the Lagrange multipliers for the constraints $Q_{ij} \geq 0$, $(i,j)\in\mathcal{B}$; therefore we have to verify that the solution satisfies these constraints, which it does indeed.)

Combining (16) and (17) and solving for $\hat p^*_{ij}$ results in
\begin{align*}
\hat p^*_{ij} = \tilde p_{ij}\cdot\exp\!\left( \frac{\tilde T^{\mathrm{B}}_{ij} - \tilde T^{\mathrm{E}}_{ij} + \lambda^* + \lambda^*_j - \lambda^*_i}{\kappa^2} \right), \qquad (i,j)\in\mathcal{B}.
\end{align*}
Using (11) and defining $\rho \triangleq \exp\!\big( -\lambda^*/\kappa^2 \big)$ and $\gamma = (\gamma_i)_{i\in\mathcal{S}}$, where $\gamma_i \triangleq \exp\!\big( \lambda^*_i/\kappa^2 \big)$, allows the rewriting of this equation as
\begin{align*}
\hat p^*_{ij} = \frac{A_{ij}}{\rho}\cdot\frac{\gamma_j}{\gamma_i}, \qquad (i,j)\in\mathcal{B}.
\end{align*}
Because $\sum_{j\in\overrightarrow{\mathcal{S}}_i} \hat p^*_{ij} = 1$ for all $i\in\mathcal{S}$, summing both sides of this equation over $j\in\overrightarrow{\mathcal{S}}_i$ results in
\begin{align*}
1 = \sum_{j\in\overrightarrow{\mathcal{S}}_i} \frac{A_{ij}}{\rho}\cdot\frac{\gamma_j}{\gamma_i}, \qquad i\in\mathcal{S},
\end{align*}
or, equivalently,
\begin{align*}
\rho\cdot\gamma_i = \sum_{j\in\overrightarrow{\mathcal{S}}_i} A_{ij}\cdot\gamma_j, \qquad i\in\mathcal{S}.
\end{align*}
This system of linear equations can be written as
\begin{align*}
A\cdot\gamma = \rho\cdot\gamma.
\end{align*}
Clearly, this equation can only be satisfied if $\gamma$ is an eigenvector of $A$ with corresponding eigenvalue $\rho$. A slightly lengthy calculation (which is somewhat similar to the calculation in [10, Eq. (51)]) shows that
\begin{align}
\psi_{\tilde Q}(Q^*) = \log(\rho). \tag{20}
\end{align}
As is well known, Perron–Frobenius theory guarantees for an irreducible non-negative matrix that the eigenvalue with the largest absolute value is a positive real number, called the Perron–Frobenius eigenvalue. Therefore, in order to maximize the right-hand side of (20) over all eigenvalues of $A$, the eigenvalue $\rho$ has to be the Perron–Frobenius eigenvalue and $\gamma$ the corresponding eigenvector.

The proof is concluded by noting that (19) can be rewritten as the system of linear equations
\begin{align*}
Q^*_{ij} - \hat p^*_{ij}\cdot\sum_{j'\in\overrightarrow{\mathcal{S}}_i} Q^*_{ij'} = \frac{1-\kappa}{\kappa}\cdot\big( \tilde\mu_i\,\hat p^*_{ij} - \tilde Q_{ij} \big), \qquad (i,j)\in\mathcal{B},
\end{align*}
which can be used to determine $\{Q^*_{ij}\}_{(i,j)\in\mathcal{B}}$, because all other quantities appearing in these equations are either known or have already been calculated.
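The proof above is constructive. The following minimal sketch illustrates the computation on a hypothetical 2-state example with toy values for kappa and for the differences T~^B_ij - T~^E_ij; the matrix A is formed as p~_ij * exp((T~^B_ij - T~^E_ij)/kappa^2), consistent with the rewriting above, while the exact constants are fixed by (11) and (13) in the main text. It finds the Perron–Frobenius eigenpair of A, forms p^hat*, and then recovers Q* from the final system of linear equations together with the normalization and stationarity constraints that were enforced in the Lagrangian. With kappa chosen according to (13), the resulting Q* is entrywise non-negative; the toy value used here is for illustration only.

```python
import numpy as np

# Minimal sketch of the surrogate maximization described above (toy values).
kappa = 0.8
Q_tilde = np.array([[0.30, 0.20],        # current edge distribution Q~ (sums to 1)
                    [0.25, 0.25]])
mu_tilde = Q_tilde.sum(axis=1)           # state distribution mu~_i
p_tilde = Q_tilde / mu_tilde[:, None]    # transition probabilities p~_ij
T_diff = np.array([[0.10, -0.05],        # hypothetical values of T~^B_ij - T~^E_ij
                   [0.02,  0.08]])

A = p_tilde * np.exp(T_diff / kappa**2)

# Perron-Frobenius eigenpair of the irreducible non-negative matrix A.
eigvals, eigvecs = np.linalg.eig(A)
k = np.argmax(eigvals.real)
rho = eigvals[k].real
gamma = np.abs(eigvecs[:, k].real)       # Perron vector, chosen entrywise positive

# hat{p}*_ij = (A_ij / rho) * (gamma_j / gamma_i); rows sum to 1 by construction.
p_hat = A / rho * gamma[None, :] / gamma[:, None]
assert np.allclose(p_hat.sum(axis=1), 1.0)

# Recover Q* from the final linear system of the proof,
#   Q*_ij - p_hat_ij * sum_{j'} Q*_{ij'} = (1-kappa)/kappa * (mu~_i p_hat_ij - Q~_ij),
# combined with the stationarity and normalization constraints of the Lagrangian.
c = (1 - kappa) / kappa * (mu_tilde[:, None] * p_hat - Q_tilde)
d = c.sum(axis=0)                        # net "offset" inflow per state
S = p_hat.shape[0]
# Solve (I - p_hat)^T mu* = d together with sum_i mu*_i = 1 (least squares,
# since the first system alone is rank-deficient by one).
M = np.vstack([(np.eye(S) - p_hat).T, np.ones((1, S))])
b = np.concatenate([d, [1.0]])
mu_star, *_ = np.linalg.lstsq(M, b, rcond=None)
Q_star = mu_star[:, None] * p_hat + c

print("rho =", rho, " -> surrogate value log(rho) =", np.log(rho))
print("Q* =\n", Q_star, "\nsum =", Q_star.sum())
```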
References

[1] E. M. Kurtas, Advanced Error Control Techniques for Data Storage Systems. Boca Raton, FL, USA: CRC Press, 2005.
[2] B. Vasić and E. M. Kurtas, Coding and Signal Processing for Magnetic Recording Systems. Abingdon: CRC Press, 2004.
[3] K. A. S. Immink, P. H. Siegel, and J. K. Wolf, "Codes for digital recorders," IEEE Trans. Inf. Theory, vol. 44, no. 6, pp. 2260–2299, Oct. 1998.
[4] G. G. Raleigh and J. M. Cioffi, "Spatio-temporal coding for wireless communication," IEEE Trans. Commun., vol. 46, no. 3, pp. 357–366, Mar. 1998.
[5] P. Golden, H. Dedieu, and K. Jacobsen, Fundamentals of DSL Technology. Boca Raton, FL, USA: CRC Press, 2005.
[6] R. G. Gallager, Information Theory and Reliable Communication. New York, NY, USA: John Wiley & Sons, 1968.
[7] A. Kavčić, X. Ma, and M. Mitzenmacher, "Binary intersymbol interference channels: Gallager codes, density evolution, and code performance bounds," IEEE Trans. Inf. Theory, vol. 49, no. 7, pp. 1636–1652, Jul. 2003.
[8] R. Blahut, "Computation of channel capacity and rate-distortion functions," IEEE Trans. Inf. Theory, vol. 18, no. 4, pp. 460–473, Jul. 1972.
[9] S. Arimoto, "An algorithm for computing the capacity of arbitrary discrete memoryless channels," IEEE Trans. Inf. Theory, vol. 18, no. 1, pp. 14–20, Jan. 1972.
[10] P. O. Vontobel, A. Kavčić, D. M. Arnold, and H.-A. Loeliger, "A generalization of the Blahut-Arimoto algorithm to finite-state channels," IEEE Trans. Inf. Theory, vol. 54, no. 5, pp. 1887–1918, May 2008.
[11] A. Kavčić, "On the capacity of Markov sources over noisy channels," in Proc. IEEE Global Communications Conference, vol. 5, San Antonio, TX, USA, Nov. 2001, pp. 2997–3001.
[12] S. Yang, A. Kavčić, and S. Tatikonda, "Feedback capacity of finite-state machine channels," IEEE Trans. Inf. Theory, vol. 51, no. 3, pp. 799–810, Mar. 2005.
[13] P. O. Vontobel and D. M. Arnold, "An upper bound on the capacity of channels with memory and constraint input," in Proc. IEEE Information Theory Workshop, Cairns, Queensland, Australia, Sep. 2001, pp. 147–149.
[14] J. Chen and P. H. Siegel, "Markov processes asymptotically achieve the capacity of finite-state intersymbol interference channels," IEEE Trans. Inf. Theory, vol. 54, no. 3, pp. 1295–1303, Mar. 2008.
[15] A. D. Wyner, "The wire-tap channel," The Bell System Technical Journal, vol. 54, no. 8, pp. 1355–1387, Oct. 1975.
[16] I. Csiszár and J. Körner, "Broadcast channels with confidential messages," IEEE Trans. Inf. Theory, vol. 24, no. 3, pp. 339–348, May 1978.
[17] M. Bloch and J. N. Laneman, "On the secrecy capacity of arbitrary wiretap channels," in Proc. 46th Annual Allerton Conf. Commun., Control, and Computing, Monticello, IL, USA, Sep. 2008, pp. 818–825.
[18] M. R. Bloch and J. N. Laneman, "Strong secrecy from channel resolvability," IEEE Trans. Inf. Theory, vol. 59, no. 12, pp. 8077–8098, Dec. 2013.
[19] T. Han, Information-Spectrum Methods in Information Theory. Berlin, Heidelberg: Springer, 2003.
[20] Y. Sankarasubramaniam, A. Thangaraj, and K. Viswanathan, "Finite-state wiretap channels: Secrecy under memory constraints," in Proc. IEEE Information Theory Workshop, Taormina, Italy, Oct. 2009, pp. 115–119.
[21] B. Dai, Z. Ma, and Y. Luo, "Finite state Markov wiretap channel with delayed feedback," IEEE Trans. Inf. Forensics Secur., vol. 12, no. 3, pp. 746–760, Mar. 2017.
[22] H. Zhang, L. Yu, C. Wei, and B. Dai, "A new feedback scheme for the state-dependent wiretap channel with noncausal state at the transmitter," IEEE Access, vol. 7, pp. 45594–45604, Apr. 2019.
[23] S. Hanoglu, S. R. Aghdam, and T. M. Duman, "Artificial-noise-aided secure transmission over finite-input intersymbol interference channels," in Proc. 25th Int. Conf. Telecommun., St. Malo, France, Jun. 2018, pp. 346–350.
[24] D. M. Arnold, H.-A. Loeliger, P. O. Vontobel, A. Kavčić, and W. Zeng, "Simulation-based computation of information rates for channels with memory," IEEE Trans. Inf. Theory, vol. 52, no. 8, pp. 3498–3508, Aug. 2006.
[25] P. Sadeghi, P. O. Vontobel, and R. Shams, "Optimization of information rate upper and lower bounds for channels with memory," IEEE Trans. Inf. Theory, vol. 55, no. 2, pp. 663–688, Feb. 2009.
[26] A. Dempster, N. Laird, and D. Rubin, "Maximum likelihood from incomplete data via the EM algorithm," Journal of the Royal Statistical Society, Series B (Methodological), pp. 1–38, 1977.
[27] C. F. J. Wu, "On the convergence properties of the EM algorithm," The Annals of Statistics, vol. 11, no. 1, pp. 95–103, 1983.
[28] W. Xiang and S. Pietrobon, "On the capacity and normalization of ISI channels," IEEE Trans. Inf. Theory, vol. 49, no. 9, pp. 2263–2268, Sep. 2003.
[29] K. L. Judd,