Covert Identification over Binary-Input Discrete Memoryless Channels
Qiaosheng Zhang and Vincent Y. F. Tan, Senior Member, IEEE
Abstract
This paper considers the covert identification problem in which a sender aims to reliably convey an identification (ID) message to a set of receivers via a binary-input memoryless channel (BMC), and simultaneously to guarantee that the communication is covert with respect to a warden who monitors the communication via another independent BMC. We prove a square-root law for the covert identification problem. This states that an ID message of size $\exp(\exp(\Theta(\sqrt{n})))$ can be transmitted over $n$ channel uses. We then characterize the exact pre-constant in the $\Theta(\cdot)$ notation. This constant is referred to as the covert identification capacity. We show that it equals the recently developed covert capacity in the standard covert communication problem, and somewhat surprisingly, the covert identification capacity can be achieved without any shared key between the sender and receivers. The achievability proof relies on a random coding argument with pulse-position modulation (PPM), coupled with a second stage which performs code refinements. The converse proof relies on an expurgation argument as well as results for channel resolvability with stringent input constraints.

Index Terms
Covert Communication, Identification via channels, Channel resolvability.
I. INTRODUCTION
In contrast to Shannon's classical channel coding problem [1] (also known as the transmission problem), in which a sender wishes to reliably send a message to a receiver through a noisy channel $W$, the problem of identification via channels [2] (or simply the identification problem) focuses on a different setting wherein a sender wishes to send an identification (ID) message $m \in \mathcal{M}$ via a noisy channel $W$ to a set of receivers $\{R_{m'}\}_{m' \in \mathcal{M}}$, each observing the (same) outputs of the channel, such that every receiver $R_{m'}$ only cares about its dedicated message $m'$ and should be able to reliably answer the following question: Is the ID message sent by the sender $m'$? Specifically, if the ID message sent by the sender is $m$,
• The receiver $R_{m'}$ should answer "YES" with high probability if $m' = m$;
• The receiver $R_{m'}$ should answer "NO" with high probability if $m' \neq m$.
It is well known that in the transmission problem, one can reliably transmit a message of size $\exp(\Theta(n))$ over $n$ channel uses, and the pre-constant is characterized by the celebrated channel capacity $C_W \triangleq \max_P I(P, W)$, i.e., the mutual information between the input and output of the channel $W$ maximized over the input distribution $P$. In the identification problem, Ahlswede and Dueck [2] showed that the size of the ID message can be as large as $\exp(\exp(\Theta(n)))$, i.e., doubly-exponentially large in the blocklength $n$. Somewhat surprisingly, the exact pre-constant in the $\Theta(\cdot)$ notation, which is referred to as the identification capacity, is again $C_W$ [2], [3]. That is, the identification capacity exactly equals the channel capacity.
Apart from reliability guarantees, recent years have witnessed increasing attention to security concerns, especially in networked communication systems such as the Internet of Things. From an information-theoretic perspective, the security of the classical transmission problem has been extensively studied since Wyner's seminal paper [4] on the wiretap channel (see [5], [6] for surveys), and the secure identification problem has been investigated as well [7]–[9]. While most security problems are concerned with hiding the content of information, in certain scenarios merely the fact that communication takes place could lead to serious consequences; thus, the sender is required to hide the fact that he/she is communicating when he/she does so. Said differently, the sender needs to communicate covertly with respect to the warden who is surreptitiously monitoring the communication. This motivates the recent studies of the covert communication problem. Following the pioneering work by Bash et al. [10], which demonstrates a square-root law (SRL) (i.e., one can only transmit $\Theta(\sqrt{n})$ bits over $n$ channel uses) for covert communication, subsequent works have built on [10] to establish information-theoretic limits for covert communication over binary symmetric channels [11]–[13], discrete memoryless channels (DMCs) and Gaussian channels [14]–[17], multiple-access channels [18], broadcast channels [19]–[21], compound channels [22], channels with states [23], [24], adversarial noise channels [25], relay channels [26], etc. In the literature, the covertness constraint requires that, at the warden's side, the output distribution when communication takes place is almost indistinguishable from the output distribution when no communication takes place, and the discrepancy between the two distributions is usually measured by the Kullback-Leibler (KL) divergence or the variational distance.
Qiaosheng Zhang is with the Department of Electrical and Computer Engineering, National University of Singapore (e-mail: [email protected]). Vincent Y. F. Tan is with the Department of Electrical and Computer Engineering and the Department of Mathematics, National University of Singapore (e-mail: [email protected]).
In addition to covert communication, which focuses on the transmission problem, there are also scenarios in which the sender wishes to reliably send an ID message to a set of receivers, and simultaneously to remain covert with respect to the warden. For instance, a commander would like to send a certain message to a set of subordinates $\mathcal{M}$. This message informs exactly one of them to be prepared to strike. Each of these subordinates $m' \in \mathcal{M}$ would like to know whether the message $m \in \mathcal{M}$ sent by the commander corresponds to his specific index $m'$; if so, he must be prepared, otherwise nothing needs to be done on his part. Each subordinate is only interested in whether or not he should be ready. The commander has to send his message in such a way that an enemy should not be able to infer that any communication is occurring. We refer to this problem as the covert identification problem. Given the similarities and differences between the transmission and identification problems without covertness constraints, it is then natural to ask the following questions: (i) What is the maximum size of the ID message with covertness constraints? (ii) Does the covert capacity characterized in [14], [15] play a role in the fundamental limits of the covert identification problem? (iii) Is a shared key required to ensure that the identification can take place reliably?
These questions precisely set the stage for this work, and our main contributions can be summarized as follows.
• Analogous to the SRL in the covert communication literature, a different form of the SRL is discovered in the covert identification problem. That is, one can send an ID message of size up to $\exp(\exp(\Theta(\sqrt{n})))$ reliably and covertly, in contrast to the standard identification problem wherein the scaling is $\exp(\exp(\Theta(n)))$.
• We then characterize the maximal pre-constant in the $\Theta(\cdot)$ notation in $\exp(\exp(\Theta(\sqrt{n})))$, which is referred to as the covert identification capacity. We do so by establishing matching achievability and converse results. It turns out that the covert identification capacity equals the covert capacity; however, a key difference is that the former is achieved without any shared key between the sender and receivers. This is in stark contrast to standard covert communication, wherein a shared key is necessary [14] for achieving the covert capacity in some regimes of the channel between the sender and receivers and the channel between the sender and the warden.
From the achievability perspective, the requirement of a keyless identification code prevents us from adopting the simplest and most classical construction of identification codes proposed by Ahlswede and Dueck [2], which relies on the existence of a capacity-achieving code for the transmission problem. This is because there does not exist a keyless covert-capacity-achieving transmission code for covert communication in general [14]. Therefore, we develop an identification code from first principles. Our construction is based on a random coding argument with pulse-position modulation (PPM) and a modified information density decoder. PPM, which can be viewed as a special sub-class of constant composition codes, was shown to be optimal for covert communication by Bloch and Guha [27]. PPM codes are also useful in this work. In addition, we highlight that the random coding argument does not directly ensure the existence of a good identification code with vanishing maximum error probabilities, due to the large message size, which is of order $\exp(\exp(\Theta(\sqrt{n})))$; this issue is resolved by a careful code refinement process, which is explained in Section IV-D. Our code refinement process is different from the conventional expurgation argument that is ubiquitous in the information theory literature, in the sense that our refinement procedure preserves the channel output distribution induced by the original code; this is critical for ensuring that the covertness constraint is satisfied.
The proof of the converse part for the covert identification problem is also non-standard. Roughly speaking, the converse for channel identification usually relies on the achievability of channel resolvability for general input distributions, as discovered in Han and Verdú's seminal work [3]. However, such general results have not been established under the stringent input constraints imposed by the covertness constraint. Instead, we circumvent this difficulty by expurgating a large number of codevectors of the original covert identification code such that the resultant expurgated code satisfies certain cost constraints, and one can then apply the idea of [3] to the new code to obtain the desired converse result.
It is also worth noting that the expurgation argument used in this work differs from some relevant works on covert communication [16], [25], [28], since the identification problem relies critically on the use of stochastic encoders (as detailed in Section III).
The rest of this paper is organized as follows. We provide some notational conventions and an important technical lemma in Section II. In Section III, we formally introduce the covert identification problem and also present the main results. Sections IV and V respectively provide the detailed proofs of the achievability and converse parts of the main results. In Section VI, we conclude this work and propose several promising directions for future work.

II. PRELIMINARIES
For non-negative integers $a, b \in \mathbb{N}$, we use $[a:b]$ to denote the set of integers $\{a, a+1, \ldots, b\}$. Random variables and their realizations are respectively denoted by uppercase and lowercase letters, e.g., $X$ and $x$. Sets are denoted by calligraphic letters, e.g., $\mathcal{X}$. Vectors of length $n$ are denoted by boldface letters, e.g., $\mathbf{X}$ or $\mathbf{x}$, while vectors of shorter length (which should be clear from the context) are denoted by underlined boldface letters, e.g., $\underline{\mathbf{X}}$ or $\underline{\mathbf{x}}$. We use $X_i$ or $x_i$ to denote the $i$-th element of a vector, and $X_a^b$ or $x_a^b$ to denote the vector $(X_a, X_{a+1}, \ldots, X_b)$ or $(x_a, x_{a+1}, \ldots, x_b)$. Throughout this paper, logarithms $\log$ and exponentials $\exp$ are base $e$. For two probability distributions $P$ and $Q$ over the same finite set $\mathcal{X}$, we respectively define their KL-divergence, variational distance, and $\chi_2$-distance as
$$D(P \| Q) \triangleq \sum_{x \in \mathcal{X}} P(x) \log \frac{P(x)}{Q(x)}, \quad V(P, Q) \triangleq \frac{1}{2}\sum_{x \in \mathcal{X}} |P(x) - Q(x)|, \quad \chi_2(P \| Q) \triangleq \sum_{x \in \mathcal{X}} \frac{(P(x) - Q(x))^2}{Q(x)}.$$
We say $P$ is absolutely continuous with respect to $Q$ (denoted by $P \ll Q$) if the support of $P$ is a subset of the support of $Q$ (i.e., for all $x \in \mathcal{X}$, $P(x) = 0$ if $Q(x) = 0$). Moreover, we introduce a concentration inequality that is widely used in this work: Hoeffding's inequality.

Lemma 1 (Hoeffding's inequality [29]). Suppose $\{X_i\}_{i=1}^n$ is a set of independent random variables such that $a_i \le X_i \le b_i$ almost surely, and let $X \triangleq \sum_{i=1}^n X_i$. For any $v > 0$,
$$\mathbb{P}\big( |X - \mathbb{E}(X)| \ge v \big) \le 2\exp\left( -\frac{2v^2}{\sum_{i=1}^n (b_i - a_i)^2} \right).$$
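To make the three discrepancy measures above concrete, the following Python sketch (ours, not part of the original paper; all function names are illustrative) computes them for distributions represented as probability vectors, under the assumption that $Q(x) > 0$ wherever $P(x) > 0$.

```python
import numpy as np

def kl_divergence(P, Q):
    # D(P||Q) = sum_x P(x) log(P(x)/Q(x)), with the convention 0 log 0 = 0.
    mask = P > 0
    return float(np.sum(P[mask] * np.log(P[mask] / Q[mask])))

def variational_distance(P, Q):
    # V(P,Q) = (1/2) sum_x |P(x) - Q(x)|
    return 0.5 * float(np.sum(np.abs(P - Q)))

def chi2_distance(P, Q):
    # chi_2(P||Q) = sum_x (P(x) - Q(x))^2 / Q(x)
    return float(np.sum((P - Q) ** 2 / Q))

P = np.array([0.6, 0.3, 0.1])
Q = np.array([0.5, 0.25, 0.25])
print(kl_divergence(P, Q), variational_distance(P, Q), chi2_distance(P, Q))
```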
III. PROBLEM SETTING AND MAIN RESULTS
The channel between the sender and receivers is a binary-input memoryless channel (BMC) $(\mathcal{X}, W_{Y|X}, \mathcal{Y})$, and the channel between the sender and warden is another independent BMC $(\mathcal{X}, W_{Z|X}, \mathcal{Z})$. It is assumed that $\mathcal{Y}$ and $\mathcal{Z}$ are finite alphabets, and $\mathcal{X} = \{0, 1\}$ with '0' being the innocent symbol and '1' being the symbol that carries information.¹ The channel transition probabilities corresponding to $n$ channel uses are denoted by $W_{Y|X}^{\otimes n}(\mathbf{y}|\mathbf{x}) \triangleq \prod_{i=1}^n W_{Y|X}(y_i|x_i)$ and $W_{Z|X}^{\otimes n}(\mathbf{z}|\mathbf{x}) \triangleq \prod_{i=1}^n W_{Z|X}(z_i|x_i)$. Moreover, we define
$$P_0 \triangleq W_{Y|X=0}, \quad P_1 \triangleq W_{Y|X=1}, \quad Q_0 \triangleq W_{Z|X=0}, \quad Q_1 \triangleq W_{Z|X=1}.$$
As is common in the covert communication literature, it is assumed that (i) $Q_1 \neq Q_0$, (ii) $Q_1$ is absolutely continuous with respect to $Q_0$ (i.e., $Q_1 \ll Q_0$), and (iii) $P_1$ is absolutely continuous with respect to $P_0$ (i.e., $P_1 \ll P_0$). The first two assumptions preclude the scenarios in which covertness is always guaranteed or would never be guaranteed, while the last assumption precludes the possibility that the receivers enjoy an unfair advantage over the warden (as detailed in [14, Appendix G]). Let $\mu_0 \triangleq \min_{z: Q_0(z) > 0} Q_0(z)$, $\mu_1 \triangleq \min_{z: Q_1(z) > 0} Q_1(z)$, and $\tilde{\mu} \triangleq \min\{\mu_0, \mu_1\}$.

Definition 1 (Identification codes). An identification code $\mathcal{C}$ with message set $\mathcal{M}$ is a collection of codewords $\{U_m\}_{m \in \mathcal{M}}$ and decoding regions $\{\mathcal{D}_m\}_{m \in \mathcal{M}}$, where $U_m \in \mathcal{P}(\mathcal{X}^n)$ and $\mathcal{D}_m \subseteq \mathcal{Y}^n$.
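The following sketch (ours; the numerical channel values are purely illustrative) instantiates the setting: each binary-input channel is specified by its output distributions under inputs 0 and 1, and assumptions (i)-(iii) above are checked programmatically.

```python
import numpy as np

P0 = np.array([0.9, 0.1])   # W_{Y|X=0}
P1 = np.array([0.2, 0.8])   # W_{Y|X=1}
Q0 = np.array([0.8, 0.2])   # W_{Z|X=0}
Q1 = np.array([0.6, 0.4])   # W_{Z|X=1}

def absolutely_continuous(P, Q):
    # P << Q iff Q(z) = 0 implies P(z) = 0.
    return bool(np.all((Q > 0) | (P == 0)))

assert not np.allclose(Q0, Q1)          # (i)   Q1 != Q0
assert absolutely_continuous(Q1, Q0)    # (ii)  Q1 << Q0
assert absolutely_continuous(P1, P0)    # (iii) P1 << P0

mu0 = Q0[Q0 > 0].min()                  # smallest positive mass of Q0
mu1 = Q1[Q1 > 0].min()                  # smallest positive mass of Q1
mu_tilde = min(mu0, mu1)
print(mu0, mu1, mu_tilde)
```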
Remark 1. In contrast to most communication problems, wherein each message $m$ is deterministically mapped to a fixed sequence (the codeword) $\mathbf{x} \in \mathcal{X}^n$, the identification problem uses stochastic encoders such that message $m$ is stochastically mapped to a random sequence $\mathbf{X}$ according to the probability distribution $U_m \in \mathcal{P}(\mathcal{X}^n)$. Moreover, the decoding regions $\{\mathcal{D}_m\}_{m \in \mathcal{M}}$ in the identification problem are not necessarily disjoint. The use of stochastic encoders and the fact that the decoding regions are not disjoint are critical for communicating $\omega(n)$ bits of message over $n$ channel uses. With a slight abuse of terminology, we refer to the distribution $U_m$ as the codeword for the message $m$.

The transmission status of the sender is denoted by $T \in \{0, 1\}$. Communication takes place if $T = 1$, while no communication takes place if $T = 0$. When $T = 1$, the sender selects a message $m$ uniformly at random from $\mathcal{M}$. The encoder then chooses a length-$n$ sequence $\mathbf{X} \in \mathcal{X}^n$ according to the distribution $U_m$. When $T = 0$, the channel input is the length-$n$ zero sequence $\mathbf{0}$. For the receiver $R_{m'}$ ($m' \in \mathcal{M}$), upon receiving the channel output $\mathbf{Y} \in \mathcal{Y}^n$ through the BMC $W_{Y|X}^{\otimes n}$, it declares that the message sent by the sender is $m'$ if and only if $\mathbf{Y} \in \mathcal{D}_{m'}$.

The standard identification problem usually focuses on two types of error: the error probability of the first kind, which corresponds to the probability that the true message is not identified by its designated receiver, and the error probability of the second kind, which corresponds to the probability that the message is wrongly identified by some other receiver. For the covert identification problem, we introduce one more type of error, the error probability of the third kind, which corresponds to the probability that the length-$n$ zero sequence (when no communication takes place) is wrongly identified as a certain message by some receiver. We formalize these notions in the following definition.

¹It is also possible to consider a more general setting with multiple non-zero input symbols (by following the lead of [15]); however, for simplicity and ease of presentation, we focus on the binary-input setting in this work.
Definition 2 (Error probabilities). When $T = 1$ and $m \in \mathcal{M}$ is sent, the error probability of the first kind is defined as
$$P_{\mathrm{err}}^{(1)}(m) \triangleq \sum_{\mathbf{x} \in \mathcal{X}^n} U_m(\mathbf{x})\, W_{Y|X}^{\otimes n}(\mathcal{D}_m^c \,|\, \mathbf{x}).$$
When $T = 1$ and $m' \in \mathcal{M}$ is sent, the error probability of the second kind corresponding to the receiver $R_m$ is defined as
$$P_{\mathrm{err}}^{(2)}(m, m') \triangleq \sum_{\mathbf{x} \in \mathcal{X}^n} U_{m'}(\mathbf{x})\, W_{Y|X}^{\otimes n}(\mathcal{D}_m \,|\, \mathbf{x}).$$
When $T = 0$ and the length-$n$ zero sequence is sent through the channel, the error probability of the third kind corresponding to the receiver $R_m$ is defined as
$$P_{\mathrm{err}}^{(3)}(m) \triangleq P_0^{\otimes n}(\mathcal{D}_m).$$
Furthermore, let the corresponding maximum error probabilities (over all the messages or all pairs of distinct messages) be
$$P_{\mathrm{err}}^{(1)} \triangleq \max_{m \in \mathcal{M}} P_{\mathrm{err}}^{(1)}(m), \quad P_{\mathrm{err}}^{(2)} \triangleq \max_{(m,m') \in \mathcal{M}^2 : m \neq m'} P_{\mathrm{err}}^{(2)}(m, m'), \quad P_{\mathrm{err}}^{(3)} \triangleq \max_{m \in \mathcal{M}} P_{\mathrm{err}}^{(3)}(m).$$
Let $\widehat{Q}_{\mathcal{C}}^n(\mathbf{z})$ be the output distribution on $\mathcal{Z}^n$ for the warden induced by the identification code, which takes the form
$$\widehat{Q}_{\mathcal{C}}^n(\mathbf{z}) \triangleq \frac{1}{|\mathcal{M}|}\sum_{m \in \mathcal{M}}\sum_{\mathbf{x} \in \mathcal{X}^n} U_m(\mathbf{x})\, W_{Z|X}^{\otimes n}(\mathbf{z}|\mathbf{x}), \quad \forall \mathbf{z} \in \mathcal{Z}^n. \quad (1)$$
We adopt the widely used KL-divergence metric $D(\widehat{Q}_{\mathcal{C}}^n \| Q_0^{\otimes n})$ to measure covertness with respect to the warden.

Definition 3 (Covertness). The communication is $\delta$-covert if the KL-divergence between the distribution $\widehat{Q}_{\mathcal{C}}^n$ (when $T = 1$) and $Q_0^{\otimes n}$ (when $T = 0$) is bounded from above by $\delta$, i.e.,
$$D\big(\widehat{Q}_{\mathcal{C}}^n \,\big\|\, Q_0^{\otimes n}\big) \le \delta.$$
Let $\pi_{1|0}$ and $\pi_{0|1}$ respectively be the probabilities of false alarm (i.e., making an error when $T = 0$) and missed detection (i.e., making an error when $T = 1$) of the warden's hypothesis test. By using the definition of the variational distance and Pinsker's inequality, we see that the optimal test satisfies
$$\pi_{1|0} + \pi_{0|1} = 1 - V\big(\widehat{Q}_{\mathcal{C}}^n, Q_0^{\otimes n}\big) \ge 1 - \sqrt{\tfrac{1}{2} D\big(\widehat{Q}_{\mathcal{C}}^n \,\big\|\, Q_0^{\otimes n}\big)}.$$
Thus, a small $D(\widehat{Q}_{\mathcal{C}}^n \| Q_0^{\otimes n})$ implies a large sum-error $\pi_{1|0} + \pi_{0|1}$. This provides an operational meaning of the covertness metric in Definition 3. As discussed in prior works such as [15], [16], [30], the variational distance metric $V(\widehat{Q}_{\mathcal{C}}^n, Q_0^{\otimes n})$ is perhaps a better metric under the specific assumption that $T = 0$ and $T = 1$ occur with equal probabilities, since it directly connects to the average error probability of detection; however, the above assumption does not hold in general, thus both KL-divergence and variational distance are deemed to be appropriate metrics in the literature.
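To illustrate Definition 2, the following brute-force sketch (ours; the toy code, channels, and decoding regions are invented for illustration only) evaluates the three error kinds for a tiny identification code with blocklength $n = 3$ and two messages, with overlapping decoding regions as permitted in the identification problem.

```python
import numpy as np
from itertools import product

n = 3
P0 = np.array([0.9, 0.1]); P1 = np.array([0.2, 0.8])  # W_{Y|X=0}, W_{Y|X=1}

def W_n(y, x):
    # Product channel probability W^{(n)}_{Y|X}(y|x) for bit tuples y, x.
    p = 1.0
    for yi, xi in zip(y, x):
        p *= (P1 if xi else P0)[yi]
    return p

# Stochastic codewords U_m: uniform over a small list of sequences.
U = {0: [(1, 0, 0), (0, 1, 0)], 1: [(0, 0, 1), (1, 0, 0)]}
# Decoding regions D_m (possibly overlapping), chosen by hand.
D = {0: {y for y in product([0, 1], repeat=n) if y[0] + y[1] >= 1},
     1: {y for y in product([0, 1], repeat=n) if y[2] == 1}}

def p_err1(m):        # first kind: true message m missed by R_m
    return np.mean([sum(W_n(y, x) for y in product([0, 1], repeat=n)
                        if y not in D[m]) for x in U[m]])

def p_err2(m, m2):    # second kind: m2 sent, wrongly identified as m
    return np.mean([sum(W_n(y, x) for y in D[m]) for x in U[m2]])

def p_err3(m):        # third kind: the all-zero input lands in D_m
    return sum(W_n(y, (0,) * n) for y in D[m])

print(p_err1(0), p_err2(0, 1), p_err3(0))
```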
Definition 4. A rate $R$ is said to be $\delta$-achievable if there exists a sequence of identification codes with increasing blocklength $n$ such that
$$\liminf_{n\to\infty} \frac{\log\log|\mathcal{M}|}{\sqrt{n}} \ge R, \quad D\big(\widehat{Q}_{\mathcal{C}}^n \,\big\|\, Q_0^{\otimes n}\big) \le \delta, \quad \lim_{n\to\infty} P_{\mathrm{err}}^{(1)} = \lim_{n\to\infty} P_{\mathrm{err}}^{(2)} = \lim_{n\to\infty} P_{\mathrm{err}}^{(3)} = 0.$$
The $\delta$-covert identification capacity $C_\delta$ is defined as the supremum of all $\delta$-achievable rates.

Note that the coding rate $R$ in the covert identification problem is defined as the iterated logarithm of the size of the message set $|\mathcal{M}|$ normalized by $\sqrt{n}$, which implies that the message size (if $R > 0$) is of order $\exp(\exp(\Theta(\sqrt{n})))$. This intuitively makes sense because the channel identification problem usually allows the message size to be as large as $\exp(\exp(\Theta(n)))$, but the stringent covertness constraint reduces the exponent from $\Theta(n)$ to $\Theta(\sqrt{n})$. In the following, we present the main result that characterizes the $\delta$-covert identification capacity of BMCs.

Main result: The covert identification capacity
Theorem 1.
For any BMCs $W_{Y|X}$ and $W_{Z|X}$ satisfying $Q_1 \neq Q_0$, $Q_1 \ll Q_0$, and $P_1 \ll P_0$, the $\delta$-covert identification capacity is given by
$$C_\delta = \sqrt{\frac{2\delta}{\chi_2(Q_1 \| Q_0)}}\; D(P_1 \| P_0).$$
Some remarks are in order.
1) Analogous to the canonical covert communication problem, we notice that the SRL also holds for the covert identification problem, albeit with message size $\exp(\exp(\Theta(\sqrt{n})))$. Furthermore, the $\delta$-covert identification capacity is exactly the same as the $\delta$-covert capacity derived in [14], [15].
2) In stark contrast to the standard covert communication problem [14], in which a shared key is needed to achieve the covert capacity when the channels $W_{Y|X}$ and $W_{Z|X}$ satisfy $D(P_1\|P_0) \le D(Q_1\|Q_0)$, Theorem 1 above shows that regardless of the values of $D(P_1\|P_0)$ and $D(Q_1\|Q_0)$, the $\delta$-covert identification capacity is always achievable without any shared key. Intuitively, this is because the message size in our setting scales as $\exp(\omega(n))$, which automatically allows us to satisfy the requirements on the shared key via proof techniques from channel resolvability [31], since it is well known that an exponential message size (of a suitably large exponent) suffices to drive the approximation error (between the target and synthesized distributions) to zero. This is reflected in Lemma 5 in our achievability proof.
We prove the achievability part of Theorem 1 in Section IV, and the converse part in Section V.
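As a numerical illustration (ours, not from the paper), the following self-contained sketch evaluates the capacity expression of Theorem 1, $C_\delta = \sqrt{2\delta/\chi_2(Q_1\|Q_0)}\, D(P_1\|P_0)$, for the toy channels used earlier.

```python
import numpy as np

P0 = np.array([0.9, 0.1]); P1 = np.array([0.2, 0.8])
Q0 = np.array([0.8, 0.2]); Q1 = np.array([0.6, 0.4])
delta = 0.1

D_P1_P0 = float(np.sum(P1 * np.log(P1 / P0)))       # D(P1 || P0)
chi2_Q1_Q0 = float(np.sum((Q1 - Q0) ** 2 / Q0))     # chi_2(Q1 || Q0)
C_delta = np.sqrt(2 * delta / chi2_Q1_Q0) * D_P1_P0
print("delta-covert identification capacity:", C_delta)
```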
IV. ACHIEVABILITY

The achievability proof is partitioned into two stages. In the first stage, we use a random coding argument with a PPM input distribution and a modified information density decoder to show the existence of a "weak" covert identification code. By "weak" we mean that this stage only guarantees that the average (rather than maximum) error probability of the third kind vanishes. In the second stage, we apply a code refinement process to the "weak" covert identification code, such that the refined code satisfies all the criteria for the three error probabilities and covertness in Definition 4.
We first provide a detailed introduction to PPM in Subsection IV-A. The first stage of the achievability is described in Subsection IV-B and proved in Subsection IV-C, while the second stage is presented in Subsection IV-D.
A. Pulse Position Modulation (PPM)
Let
$$l \triangleq \left\lfloor \sqrt{\frac{(2\delta - n^{-1/3})\, n}{\chi_2(Q_1 \| Q_0)}} \right\rfloor$$
be the weight parameter, and $(w, s)$ be non-negative integers such that $w \triangleq \lfloor n/l \rfloor$ and $s \triangleq n - wl$. We use $\underline{\mathbf{x}} \in \mathcal{X}^w$, $\underline{\mathbf{y}} \in \mathcal{Y}^w$, $\underline{\mathbf{z}} \in \mathcal{Z}^w$ to denote vectors of length $w$. We also let $\mathrm{wt}_H(\mathbf{x})$ denote the number of ones, or the weight, of the vector $\mathbf{x}$. Let
$$P_X^w(\underline{\mathbf{x}}) \triangleq \begin{cases} 1/w, & \text{if } \mathrm{wt}_H(\underline{\mathbf{x}}) = 1, \\ 0, & \text{otherwise}, \end{cases}$$
be the distribution on $\mathcal{X}^w$ such that $P_X^w(\underline{\mathbf{x}})$ is non-zero if and only if $\underline{\mathbf{x}}$ has Hamming weight one. The corresponding output distributions $P_Y^w$ and $P_Z^w$ are respectively given by
$$P_Y^w(\underline{\mathbf{y}}) \triangleq \sum_{\underline{\mathbf{x}} \in \mathcal{X}^w} P_X^w(\underline{\mathbf{x}})\, W_{Y|X}^{\otimes w}(\underline{\mathbf{y}}|\underline{\mathbf{x}}), \quad (2)$$
$$P_Z^w(\underline{\mathbf{z}}) \triangleq \sum_{\underline{\mathbf{x}} \in \mathcal{X}^w} P_X^w(\underline{\mathbf{x}})\, W_{Z|X}^{\otimes w}(\underline{\mathbf{z}}|\underline{\mathbf{x}}). \quad (3)$$
For each $i \in [1:l]$, we define the length-$w$ vector $\mathbf{x}^{(i)} \triangleq x_{(i-1)w+1}^{iw}$. Thus, every length-$n$ vector $\mathbf{x}$ can be represented as $\mathbf{x} = [\mathbf{x}^{(1)}, \ldots, \mathbf{x}^{(l)}, x_{wl+1}^n]$, where $x_{wl+1}^n$ is of length $s$. The PPM input distribution is thus defined as
$$P_X^{n,l}(\mathbf{x}) \triangleq \prod_{i=1}^l P_X^w\big(\mathbf{x}^{(i)}\big) \cdot \mathbb{1}\big\{\mathrm{wt}_H(x_{wl+1}^n) = 0\big\}.$$
That is, we require each PPM-generated vector, also called a PPM-sequence, $\mathbf{x}$, to contain exactly $l$ ones; in particular, each of the first $l$ intervals $[1:w], [w+1:2w], \ldots, [(l-1)w+1:lw]$ contains a single one, and the last interval $[wl+1:n]$ contains all zeros.
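The following sketch (ours; the numerical inputs are illustrative) computes the parameters $(l, w, s)$ from $(n, \delta, \chi_2)$ as defined above and samples one PPM-sequence from $P_X^{n,l}$: one pulse per length-$w$ window, zeros in the tail.

```python
import numpy as np

rng = np.random.default_rng(0)

def ppm_parameters(n, delta, chi2):
    l = int(np.floor(np.sqrt((2 * delta - n ** (-1 / 3)) * n / chi2)))
    w = n // l
    return l, w, n - w * l  # weight l, window width w, tail length s

def sample_ppm_sequence(n, l, w):
    x = np.zeros(n, dtype=int)
    for i in range(l):
        x[i * w + rng.integers(w)] = 1  # one pulse per length-w window
    return x  # the last s = n - w*l positions stay zero

n, delta, chi2 = 10000, 0.1, 0.25
l, w, s = ppm_parameters(n, delta, chi2)
x = sample_ppm_sequence(n, l, w)
print(l, w, s, x.sum())  # the weight of x equals l
```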
B. Existence of a "weak" covert identification code

1) Encoder and Decoder: Let $\eta \in (0,1)$ be arbitrary, the normalized weight parameter $t \triangleq l/\sqrt{n}$, $r = (1-\eta)\, t\, D(P_1\|P_0)$, and $r' = (1 - (\eta/2))\, t\, D(P_1\|P_0)$. The size of the message set is $|\mathcal{M}| = \exp(e^{r\sqrt{n}})$. For each message $m \in \mathcal{M}$, we generate $N \triangleq e^{r'\sqrt{n}}$ sequences $\{\mathbf{x}_{m,i}\}_{i=1}^N$ independently according to $P_X^{n,l}$, and the codeword $U_m$ is the uniform distribution over the set $\{\mathbf{x}_{m,i}\}_{i=1}^N$, i.e.,
$$U_m(\mathbf{x}) \triangleq \frac{1}{N}\sum_{i=1}^N \mathbb{1}\{\mathbf{x} = \mathbf{x}_{m,i}\}, \quad \forall \mathbf{x} \in \mathcal{X}^n.$$
That is, we send each of the sequences $\{\mathbf{x}_{m,i}\}_{i=1}^N$ with equal probability when $m$ is the true message.
Let $\gamma \triangleq (1-\epsilon)\, t\, D(P_1\|P_0)$, where $0 < \epsilon < \eta/2$. To specify the decoding region $\mathcal{D}_m$ for each message $m \in \mathcal{M}$, we first define the set $\mathcal{F}_{\mathbf{x}}$ for each $\mathbf{x} \in \mathcal{X}^n$ as
$$\mathcal{F}_{\mathbf{x}} \triangleq \left\{ \mathbf{y} \in \mathcal{Y}^n : \log \frac{W_{Y|X}^{\otimes n}(\mathbf{y}|\mathbf{x})}{P_0^{\otimes n}(\mathbf{y})} > \gamma\sqrt{n} \right\}.$$
The decoding region for each $m$ is $\mathcal{D}_m \triangleq \cup_{i \in [1:N]} \mathcal{F}_{\mathbf{x}_{m,i}}$.
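A sketch (ours; names are illustrative) of the information-density test defining $\mathcal{F}_{\mathbf{x}}$ follows: an output $\mathbf{y}$ belongs to $\mathcal{F}_{\mathbf{x}}$ iff $\log(W^{\otimes n}(\mathbf{y}|\mathbf{x})/P_0^{\otimes n}(\mathbf{y})) > \gamma\sqrt{n}$, and since $W_{Y|X=0} = P_0$, only the positions where $x_i = 1$ contribute.

```python
import numpy as np

P0 = np.array([0.9, 0.1]); P1 = np.array([0.2, 0.8])

def info_density(y, x):
    # log(W^n(y|x) / P0^n(y)); terms with x_i = 0 cancel exactly.
    return float(sum(np.log(P1[yi] / P0[yi]) for yi, xi in zip(y, x) if xi == 1))

def in_F_x(y, x, gamma):
    return info_density(y, x) > gamma * np.sqrt(len(x))

# A receiver declares message m iff y lies in the union of F_x over the N
# sequences x_{m,1}, ..., x_{m,N} associated with m.
def identifies(y, sequences_m, gamma):
    return any(in_F_x(y, x, gamma) for x in sequences_m)
```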
2) Error probabilities and distributions of interest:
Based on the encoding scheme described above, the error probabilities of the first and second kinds can be rewritten as
$$P_{\mathrm{err}}^{(1)}(m) = \sum_{\mathbf{x} \in \mathcal{X}^n} \frac{1}{N}\sum_{i=1}^N \mathbb{1}\{\mathbf{x} = \mathbf{x}_{m,i}\}\, W_{Y|X}^{\otimes n}(\mathcal{D}_m^c | \mathbf{x}) = \frac{1}{N}\sum_{i=1}^N W_{Y|X}^{\otimes n}(\mathcal{D}_m^c | \mathbf{x}_{m,i}), \quad (4)$$
$$P_{\mathrm{err}}^{(2)}(m, m') = \frac{1}{N}\sum_{i=1}^N W_{Y|X}^{\otimes n}(\mathcal{D}_m | \mathbf{x}_{m',i}). \quad (5)$$
Consider the PPM distribution $P_X^{n,l}$ on $\mathcal{X}^n$; the corresponding output distributions $P_Y^{n,l}$ on $\mathcal{Y}^n$ and $P_Z^{n,l}$ on $\mathcal{Z}^n$ are respectively given by
$$P_Y^{n,l}(\mathbf{y}) \triangleq \sum_{\mathbf{x} \in \mathcal{X}^n} P_X^{n,l}(\mathbf{x})\, W_{Y|X}^{\otimes n}(\mathbf{y}|\mathbf{x}) = \Bigg(\prod_{i=1}^l P_Y^w\big(\mathbf{y}^{(i)}\big)\Bigg) \cdot P_0^{\otimes s}(y_{wl+1}^n), \quad (6)$$
$$P_Z^{n,l}(\mathbf{z}) \triangleq \sum_{\mathbf{x} \in \mathcal{X}^n} P_X^{n,l}(\mathbf{x})\, W_{Z|X}^{\otimes n}(\mathbf{z}|\mathbf{x}) = \Bigg(\prod_{i=1}^l P_Z^w\big(\mathbf{z}^{(i)}\big)\Bigg) \cdot Q_0^{\otimes s}(z_{wl+1}^n),$$
where (6) follows from (2). Given the sequences $\{\mathbf{x}_{m,i}\}_{i=1}^N$ for each $m \in \mathcal{M}$, we can also rewrite the output distribution $\widehat{Q}_{\mathcal{C}}^n$ on $\mathcal{Z}^n$, which is first defined in (1), as
$$\widehat{Q}_{\mathcal{C}}^n(\mathbf{z}) = \frac{1}{|\mathcal{M}|}\sum_{m \in \mathcal{M}} \frac{1}{N}\sum_{i=1}^N W_{Z|X}^{\otimes n}(\mathbf{z} | \mathbf{x}_{m,i}).$$
3) Performance guarantees:
Lemma 2 below shows that, with high probability, the randomly generated identification code is a "weak" covert identification code, in the sense that it only has a vanishing average (and not maximum) error probability of the third kind.
Lemma 2.
There exist vanishing sequences $\kappa_n, \varepsilon_n^{(1)}, \varepsilon_n^{(2)}, \varepsilon_n^{(3)} > 0$ (depending on the channels $W_{Y|X}$, $W_{Z|X}$ and the covertness parameter $\delta$) such that with probability at least $1 - \kappa_n$ over the code generation process, the randomly generated code satisfies
$$\max_{m \in \mathcal{M}} P_{\mathrm{err}}^{(1)}(m) \le \varepsilon_n^{(1)}, \quad \max_{(m,m') \in \mathcal{M}^2 : m \neq m'} P_{\mathrm{err}}^{(2)}(m, m') \le \varepsilon_n^{(2)}, \quad \frac{1}{|\mathcal{M}|}\sum_{m \in \mathcal{M}} P_{\mathrm{err}}^{(3)}(m) \le \varepsilon_n^{(3)}, \quad D\big(\widehat{Q}_{\mathcal{C}}^n \,\big\|\, Q_0^{\otimes n}\big) \le \delta.$$

C. Proof of Lemma 2

1) Analysis of $P_{\mathrm{err}}^{(1)}$: Consider a fixed message $m \in \mathcal{M}$. By recalling Eqn. (4) and noting that $\mathcal{D}_m^c \subseteq \mathcal{F}_{\mathbf{x}_{m,i}}^c$, we have
$$P_{\mathrm{err}}^{(1)}(m) = \frac{1}{N}\sum_{i=1}^N W_{Y|X}^{\otimes n}(\mathcal{D}_m^c | \mathbf{x}_{m,i}) \le \frac{1}{N}\sum_{i=1}^N W_{Y|X}^{\otimes n}(\mathcal{F}_{\mathbf{x}_{m,i}}^c | \mathbf{x}_{m,i}). \quad (7)$$
Note that each $\mathbf{x}_{m,i}$ is generated according to $P_X^{n,l}$, and
$$\mathbb{E}_{P_X^{n,l}}\big(W_{Y|X}^{\otimes n}(\mathcal{F}_{\mathbf{X}}^c | \mathbf{X})\big) = \sum_{\mathbf{x}} P_X^{n,l}(\mathbf{x}) \sum_{\mathbf{y}} W_{Y|X}^{\otimes n}(\mathbf{y}|\mathbf{x})\, \mathbb{1}\Bigg\{ \log \frac{W_{Y|X}^{\otimes n}(\mathbf{y}|\mathbf{x})}{P_0^{\otimes n}(\mathbf{y})} \le \gamma\sqrt{n} \Bigg\} = \sum_{\mathbf{x}} P_X^{n,l}(\mathbf{x}) \sum_{\mathbf{y}} W_{Y|X}^{\otimes n}(\mathbf{y}|\mathbf{x})\, \mathbb{1}\Bigg\{ \sum_{j : x_j = 1} \log \frac{P_1(y_j)}{P_0(y_j)} \le \gamma\sqrt{n} \Bigg\}, \quad (8)$$
where (8) holds since $\log \frac{W_{Y|X}(y_j|x_j)}{P_0(y_j)} = \log \frac{P_0(y_j)}{P_0(y_j)} = 0$ for all $j$ such that $x_j = 0$. Without loss of generality, we define $\mathbf{x}^* \in \mathcal{X}^n$ as the weight-$l$ vector such that $x^*_{(j-1)w+1} = 1$ for $j \in [1:l]$; thus (8) also equals
$$\mathbb{P}_{P_1^{\otimes l}}\Bigg( \sum_{j=1}^l \log \frac{P_1(Y_{(j-1)w+1})}{P_0(Y_{(j-1)w+1})} \le \gamma\sqrt{n} \Bigg).$$
Note that the random variables $\big\{\log \frac{P_1(Y_{(j-1)w+1})}{P_0(Y_{(j-1)w+1})}\big\}_{j \in [1:l]}$ are independent and bounded, the expectation of their sum is $l\, D(P_1\|P_0)$, and $\gamma\sqrt{n} = (1-\epsilon)\, l\, D(P_1\|P_0)$. By applying Hoeffding's inequality (Lemma 1), we have
$$\mathbb{P}_{P_1^{\otimes l}}\Bigg( \sum_{j=1}^l \log \frac{P_1(Y_{(j-1)w+1})}{P_0(Y_{(j-1)w+1})} \le \gamma\sqrt{n} \Bigg) \le e^{-c_1\sqrt{n}}, \quad (9)$$
for some constant $c_1 > 0$. Let $\mu$ be a constant satisfying $0 < \mu < \min\{r' - r, \gamma - r'\}$, $\beta_n \triangleq e^{-c_1\sqrt{n}}$, and $\alpha_n \triangleq \max\{2\beta_n, e^{-\mu\sqrt{n}/2}\}$. Consider the $N$ independent and identically distributed (i.i.d.) random variables $\{W_{Y|X}^{\otimes n}(\mathcal{F}_{\mathbf{X}_{m,i}}^c | \mathbf{X}_{m,i})\}_{i \in [1:N]}$, which correspond to the right-hand side (RHS) of (7). Note that each random variable belongs to $[0,1]$, and the expectation is at most $\beta_n$ according to (9). By applying Hoeffding's inequality again and noting that $\alpha_n - \beta_n \ge e^{-\mu\sqrt{n}/2}/2$, we have
$$\mathbb{P}\Bigg( \frac{1}{N}\sum_{i=1}^N W_{Y|X}^{\otimes n}(\mathcal{F}_{\mathbf{X}_{m,i}}^c | \mathbf{X}_{m,i}) \ge \alpha_n \Bigg) \le \exp\big\{-2N(\alpha_n - \beta_n)^2\big\} \le \exp\Big\{ -\tfrac{1}{2} e^{(r'-\mu)\sqrt{n}} \Big\}.$$
Therefore, a union bound over all the messages $m \in \mathcal{M}$ yields
$$\mathbb{P}\Big( \max_{m \in \mathcal{M}} P_{\mathrm{err}}^{(1)}(m) \ge \alpha_n \Big) \le \sum_{m \in \mathcal{M}} \mathbb{P}\Bigg( \frac{1}{N}\sum_{i=1}^N W_{Y|X}^{\otimes n}(\mathcal{F}_{\mathbf{X}_{m,i}}^c | \mathbf{X}_{m,i}) \ge \alpha_n \Bigg) = \exp\Big\{ -\tfrac{1}{2} e^{(r'-\mu)\sqrt{n}} + e^{r\sqrt{n}} \Big\},$$
which vanishes since the choice of $\mu$ ensures $r' - \mu > r$.
2) Analysis of $P_{\mathrm{err}}^{(2)}$: Consider a fixed message pair $(m, m') \in \mathcal{M}^2$ with $m \neq m'$. Recall from Eqn. (5) that
$$P_{\mathrm{err}}^{(2)}(m, m') = \frac{1}{N}\sum_{i=1}^N W_{Y|X}^{\otimes n}(\mathcal{D}_m | \mathbf{x}_{m',i}). \quad (10)$$
Suppose the set of PPM-sequences $\{\mathbf{x}_{m,j}\}_{j \in [1:N]}$ (i.e., $P_X^{n,l}(\mathbf{x}_{m,j}) \neq 0$) for message $m$ is fixed; thus the decoding region $\mathcal{D}_m$ is also fixed. Since $\mathcal{D}_m = \cup_{j \in [1:N]} \mathcal{F}_{\mathbf{x}_{m,j}}$, we have
$$\mathbb{E}_{P_X^{n,l}}\big( W_{Y|X}^{\otimes n}(\mathcal{D}_m | \mathbf{X}) \big) \le \sum_{j=1}^N \mathbb{E}_{P_X^{n,l}}\big( W_{Y|X}^{\otimes n}(\mathcal{F}_{\mathbf{x}_{m,j}} | \mathbf{X}) \big). \quad (11)$$

Lemma 3.
Let $\xi \triangleq \sum_{y \in \mathcal{Y}} \frac{P_1(y)^2}{P_0(y)}$. For any PPM-sequence $\tilde{\mathbf{x}} \in \mathcal{X}^n$, we have
$$\mathbb{E}_{P_X^{n,l}}\big( W_{Y|X}^{\otimes n}(\mathcal{F}_{\tilde{\mathbf{x}}} | \mathbf{X}) \big) \le \exp\big\{ -\gamma\sqrt{n} + l(\xi - 1)/w \big\}. \quad (12)$$

Proof of Lemma 3.
For any PPM-sequence $\tilde{\mathbf{x}} \in \mathcal{X}^n$, we have
$$\mathbb{E}_{P_X^{n,l}}\big( W_{Y|X}^{\otimes n}(\mathcal{F}_{\tilde{\mathbf{x}}} | \mathbf{X}) \big) = \sum_{\mathbf{x}} P_X^{n,l}(\mathbf{x}) \sum_{\mathbf{y}} W_{Y|X}^{\otimes n}(\mathbf{y}|\mathbf{x})\, \mathbb{1}\Bigg\{ \log \frac{W_{Y|X}^{\otimes n}(\mathbf{y}|\tilde{\mathbf{x}})}{P_0^{\otimes n}(\mathbf{y})} > \gamma\sqrt{n} \Bigg\} = \sum_{\mathbf{y}} \frac{P_Y^{n,l}(\mathbf{y})}{P_0^{\otimes n}(\mathbf{y})}\, P_0^{\otimes n}(\mathbf{y})\, \mathbb{1}\Bigg\{ \log \frac{W_{Y|X}^{\otimes n}(\mathbf{y}|\tilde{\mathbf{x}})}{P_0^{\otimes n}(\mathbf{y})} > \gamma\sqrt{n} \Bigg\}$$
$$\le e^{-\gamma\sqrt{n}} \sum_{\mathbf{y}} \frac{P_Y^{n,l}(\mathbf{y})}{P_0^{\otimes n}(\mathbf{y})}\, W_{Y|X}^{\otimes n}(\mathbf{y}|\tilde{\mathbf{x}}) \quad (13)$$
$$= e^{-\gamma\sqrt{n}} \sum_{\mathbf{y}} \frac{\big(\prod_{i=1}^l P_Y^w(\mathbf{y}^{(i)})\big) \cdot P_0^{\otimes s}(y_{wl+1}^n)}{P_0^{\otimes n}(\mathbf{y})}\, W_{Y|X}^{\otimes n}(\mathbf{y}|\tilde{\mathbf{x}}) \quad (14)$$
$$= e^{-\gamma\sqrt{n}} \prod_{i=1}^l \Bigg( \sum_{\mathbf{y}^{(i)}} \frac{P_Y^w(\mathbf{y}^{(i)})}{P_0^{\otimes w}(\mathbf{y}^{(i)})}\, W_{Y|X}^{\otimes w}(\mathbf{y}^{(i)}|\tilde{\mathbf{x}}^{(i)}) \Bigg) \cdot \sum_{y_{wl+1}^n} \frac{P_0^{\otimes s}(y_{wl+1}^n)}{P_0^{\otimes s}(y_{wl+1}^n)}\, P_0^{\otimes s}(y_{wl+1}^n), \quad (15)$$
where (13) holds since we only consider $\mathbf{y}$ that satisfies $\log\big( W_{Y|X}^{\otimes n}(\mathbf{y}|\tilde{\mathbf{x}}) / P_0^{\otimes n}(\mathbf{y}) \big) > \gamma\sqrt{n}$, and (14) follows from (6). Without loss of generality, we consider the first interval $[1:w]$ such that $\mathbf{y}^{(1)} = [y_1, \ldots, y_w]$ and $\tilde{\mathbf{x}}^{(1)} = [\tilde{x}_1, \ldots, \tilde{x}_w]$, and by symmetry we further assume $\tilde{x}_1 = 1$ and $\tilde{x}_j = 0$ for $j \in [2:w]$. Thus,
$$\sum_{\mathbf{y}^{(1)}} \frac{P_Y^w(\mathbf{y}^{(1)})}{P_0^{\otimes w}(\mathbf{y}^{(1)})}\, W_{Y|X}^{\otimes w}(\mathbf{y}^{(1)}|\tilde{\mathbf{x}}^{(1)}) = \sum_{\mathbf{y}^{(1)}} \frac{P_Y^w(\mathbf{y}^{(1)})}{P_0^{\otimes w}(\mathbf{y}^{(1)})}\, P_1(y_1) \prod_{j=2}^w P_0(y_j) = \sum_{y_1} \frac{(P_Y^w)_1(y_1)}{P_0(y_1)}\, P_1(y_1) \quad (16)$$
$$= \sum_{y_1} \frac{\tfrac{1}{w}P_1(y_1) + \tfrac{w-1}{w}P_0(y_1)}{P_0(y_1)}\, P_1(y_1) = 1 + \frac{1}{w}(\xi - 1), \quad (17)$$
where $(P_Y^w)_1$ in (16) stands for the first marginal distribution of $P_Y^w$, which takes the form $\tfrac{1}{w}P_1 + \tfrac{w-1}{w}P_0$. Combining (15) and (17) and applying the inequality $\log(1+x) \le x$, we have
$$\mathbb{E}_{P_X^{n,l}}\big( W_{Y|X}^{\otimes n}(\mathcal{F}_{\tilde{\mathbf{x}}} | \mathbf{X}) \big) \le e^{-\gamma\sqrt{n}}\Big( 1 + \frac{\xi-1}{w} \Big)^l \le e^{-\gamma\sqrt{n}}\, e^{l(\xi-1)/w},$$
which completes the proof.

Combining (11) and Lemma 3, we obtain that the expectation of the random variable $W_{Y|X}^{\otimes n}(\mathcal{D}_m|\mathbf{X})$ is bounded from above as
$$\mathbb{E}_{P_X^{n,l}}\big( W_{Y|X}^{\otimes n}(\mathcal{D}_m | \mathbf{X}) \big) \le \exp\Big\{ -(\gamma - r')\sqrt{n} + \frac{l(\xi-1)}{w} \Big\} \triangleq \beta'_n, \quad (18)$$
which vanishes since $r' < \gamma$. Let $\alpha'_n \triangleq \max\{2\beta'_n, e^{-\mu\sqrt{n}/2}\}$, and note that $\alpha'_n - \beta'_n \ge e^{-\mu\sqrt{n}/2}/2$. Consider the $N$ i.i.d. random variables $\{W_{Y|X}^{\otimes n}(\mathcal{D}_m | \mathbf{X}_{m',i})\}_{i \in [1:N]}$, which are present in the RHS of (10). Note that each random variable belongs to $[0,1]$, and the expectation is at most $\beta'_n$. By applying Hoeffding's inequality, we have that for fixed $\mathcal{D}_m$,
$$\mathbb{P}_{\{\mathbf{X}_{m',i}\}}\Bigg( \frac{1}{N}\sum_{i=1}^N W_{Y|X}^{\otimes n}(\mathcal{D}_m | \mathbf{X}_{m',i}) \ge \alpha'_n \Bigg) \le \exp\big\{ -2N(\alpha'_n - \beta'_n)^2 \big\} \le \exp\Big\{ -\tfrac{1}{2} e^{(r'-\mu)\sqrt{n}} \Big\}. \quad (19)$$
Note that (19) is true for any fixed $\mathcal{D}_m$ (or equivalently, any fixed $\{\mathbf{x}_{m,j}\}_{j \in [1:N]}$) that corresponds to message $m$. Next, we also take the randomness of $\{\mathbf{X}_{m,j}\}_{j \in [1:N]}$ into consideration. Let $\mathbf{D}_m$ be the chance variable corresponding to $\mathcal{D}_m$, and we have
$$\mathbb{P}_{\{\mathbf{X}_{m,i}\}, \{\mathbf{X}_{m',i}\}}\Bigg( \frac{1}{N}\sum_{i=1}^N W_{Y|X}^{\otimes n}(\mathbf{D}_m | \mathbf{X}_{m',i}) \ge \alpha'_n \Bigg) = \sum_{\mathcal{D}_m} \mathbb{P}_{\{\mathbf{X}_{m,i}\}}(\mathbf{D}_m = \mathcal{D}_m)\, \mathbb{P}_{\{\mathbf{X}_{m',i}\}}\Bigg( \frac{1}{N}\sum_{i=1}^N W_{Y|X}^{\otimes n}(\mathcal{D}_m | \mathbf{X}_{m',i}) \ge \alpha'_n \Bigg) \quad (20)$$
$$\le \exp\Big\{ -\tfrac{1}{2} e^{(r'-\mu)\sqrt{n}} \Big\}. \quad (21)$$
Finally, a union bound over all the message pairs $(m, m') \in \mathcal{M}^2$ with $m \neq m'$ yields
$$\mathbb{P}\Big( \max_{(m,m') : m \neq m'} P_{\mathrm{err}}^{(2)}(m, m') \ge \alpha'_n \Big) \le \sum_{(m,m') : m \neq m'} \mathbb{P}\Bigg( \frac{1}{N}\sum_{i=1}^N W_{Y|X}^{\otimes n}(\mathbf{D}_m | \mathbf{X}_{m',i}) \ge \alpha'_n \Bigg) = \exp\Big\{ -\tfrac{1}{2} e^{(r'-\mu)\sqrt{n}} + 2 e^{r\sqrt{n}} \Big\},$$
which vanishes since the choice of $\mu$ ensures $r' - \mu > r$.

Remark 2.
The analysis of $P_{\mathrm{err}}^{(2)}$ relies on the fact that every PPM-sequence has the same weight. Specifically, the crux of our proof is to first bound the error term $\frac{1}{N}\sum_{i=1}^N W_{Y|X}^{\otimes n}(\mathcal{D}_m|\mathbf{X}_{m',i})$ with respect to a fixed realization $\mathcal{D}_m$ of the chance variable $\mathbf{D}_m$ (or equivalently, a fixed realization $\{\mathbf{x}_{m,j}\}_{j=1}^N$ for message $m$). Next, we take an expectation over $\mathbf{D}_m$, as reflected in (20). Using PPM ensures that for every realization $\{\mathbf{x}_{m,j}\}_{j=1}^N$, each element $\mathbf{x}_{m,j}$ is a weight-$l$ PPM-sequence that satisfies inequality (12) of Lemma 3. Following Lemma 3 and the analysis starting from (18), it is shown that for every realization $\mathcal{D}_m$, one can derive the same upper bound on the error probability as presented in (19); thus it becomes straightforward to take an expectation over $\mathbf{D}_m$ to obtain (21) from (20). In contrast, if a coding scheme in which the weight of each $\mathbf{x}_{m,j}$ is random were used (which would be the case if each component of $\mathbf{x}_{m,j}$ were generated in an i.i.d. manner), it would require more effort to analyze the chance variable $\mathbf{D}_m$, since the upper bound on the error probability in (19) would depend on each realization of $\mathcal{D}_m$. In fact, the proof technique to upper bound $P_{\mathrm{err}}^{(2)}$ is also applicable to any constant composition code, i.e., it is not restricted to PPM codes. The reason why we adopt PPM is that it makes the proof of covertness easier since, as shown in Lemma 4 to follow, the PPM-induced output distribution $P_Z^{n,l}$ possesses favorable covertness properties.

3) Analysis of $P_{\mathrm{err}}^{(3)}$: For a fixed message $m \in \mathcal{M}$, the error probability of the third kind is bounded from above as
$$P_{\mathrm{err}}^{(3)}(m) = P_0^{\otimes n}(\mathcal{D}_m) \le \sum_{i=1}^N P_0^{\otimes n}(\mathcal{F}_{\mathbf{x}_{m,i}}),$$
and the expected value of this error probability (averaged over the generation of $\{\mathbf{X}_{m,i}\}_{i \in [1:N]}$) is bounded from above as
$$\mathbb{E}_{\{\mathbf{X}_{m,i}\}}\big( P_{\mathrm{err}}^{(3)}(m) \big) \le \sum_{i=1}^N \mathbb{E}_{\mathbf{X}_{m,i}}\big( P_0^{\otimes n}(\mathcal{F}_{\mathbf{X}_{m,i}}) \big) = e^{r'\sqrt{n}} \sum_{\mathbf{x}} P_X^{n,l}(\mathbf{x}) \sum_{\mathbf{y}} P_0^{\otimes n}(\mathbf{y})\, \mathbb{1}\Bigg\{ \log \frac{W_{Y|X}^{\otimes n}(\mathbf{y}|\mathbf{x})}{P_0^{\otimes n}(\mathbf{y})} > \gamma\sqrt{n} \Bigg\} \le e^{r'\sqrt{n}} \cdot e^{-\gamma\sqrt{n}} \sum_{\mathbf{x}}\sum_{\mathbf{y}} P_X^{n,l}(\mathbf{x})\, W_{Y|X}^{\otimes n}(\mathbf{y}|\mathbf{x}) \le e^{-(\gamma - r')\sqrt{n}}.$$
Thus, the expected value of the average error probability of the third kind satisfies
$$\mathbb{E}\Bigg( \frac{1}{|\mathcal{M}|}\sum_{m \in \mathcal{M}} P_{\mathrm{err}}^{(3)}(m) \Bigg) \le e^{-(\gamma - r')\sqrt{n}}.$$
By applying Markov's inequality, we have
$$\mathbb{P}\Bigg( \frac{1}{|\mathcal{M}|}\sum_{m \in \mathcal{M}} P_{\mathrm{err}}^{(3)}(m) \ge e^{-(\gamma - r' - \mu)\sqrt{n}} \Bigg) \le e^{-\mu\sqrt{n}}.$$
That is, with probability at least $1 - e^{-\mu\sqrt{n}}$ over the random code selection, the average error probability of the third kind satisfies $\frac{1}{|\mathcal{M}|}\sum_{m \in \mathcal{M}} P_{\mathrm{err}}^{(3)}(m) \le e^{-(\gamma - r' - \mu)\sqrt{n}}$, which tends to zero as $n$ tends to infinity since the choice of $\mu$ ensures that $\gamma - r' > \mu$.
4) Analysis of covertness:
First note that the KL-divergence
$$D\big(\widehat{Q}_{\mathcal{C}}^n \,\big\|\, Q_0^{\otimes n}\big) = D\big(P_Z^{n,l} \,\big\|\, Q_0^{\otimes n}\big) + D\big(\widehat{Q}_{\mathcal{C}}^n \,\big\|\, P_Z^{n,l}\big) + \sum_{\mathbf{z}} \big( \widehat{Q}_{\mathcal{C}}^n(\mathbf{z}) - P_Z^{n,l}(\mathbf{z}) \big) \log \frac{P_Z^{n,l}(\mathbf{z})}{Q_0^{\otimes n}(\mathbf{z})}. \quad (22)$$
In the following, we upper bound the three terms on the RHS of (22) in Lemmas 4 and 5.

Lemma 4.
For sufficiently large $n$, the KL-divergence $D\big(P_Z^{n,l} \,\big\|\, Q_0^{\otimes n}\big) \le \delta - n^{-1/2}$.

Proof of Lemma 4.
The proof is essentially due to [27, Lemma 1] and [16, Lemma 8], which analyze the output statistics of the PPM distribution and state that
$$D\big(P_Z^{n,l} \,\big\|\, Q_0^{\otimes n}\big) \le \frac{l^2}{2n}\, \chi_2(Q_1 \| Q_0) + O\Big(\frac{1}{\sqrt{n}}\Big).$$
Substituting $l = \big\lfloor \sqrt{(2\delta - n^{-1/3})\, n / \chi_2(Q_1\|Q_0)} \big\rfloor$, we complete the proof.

Lemma 5.
There exist constants $c_2, c_3 > 0$ such that with probability at least $1 - \exp(-c_2\sqrt{n})$ over the random code design, the output distribution $\widehat{Q}_{\mathcal{C}}^n$ induced by $\mathcal{C}$ ensures
$$D\big(\widehat{Q}_{\mathcal{C}}^n \,\big\|\, P_Z^{n,l}\big) \le \exp\{-c_3\sqrt{n}\}, \quad \text{and} \quad \sum_{\mathbf{z}} \big( \widehat{Q}_{\mathcal{C}}^n(\mathbf{z}) - P_Z^{n,l}(\mathbf{z}) \big) \log \frac{P_Z^{n,l}(\mathbf{z})}{Q_0^{\otimes n}(\mathbf{z})} \le 2n\Big( \log\frac{1}{\tilde{\mu}} \Big)\exp\{-c_3\sqrt{n}/2\}.$$

Proof of Lemma 5.
Recall that $\widehat{Q}_{\mathcal{C}}^n$ is the output distribution induced by the set $\cup_{m\in\mathcal{M}}\{\mathbf{X}_{m,i}\}_{i\in[1:N]}$, with each sequence being generated i.i.d. according to $P_X^{n,l}$. We first borrow a result from [32, Eq. (10)] and [16, Eq. (81)], which states that
$$\mathbb{E}\big( D\big(\widehat{Q}_{\mathcal{C}}^n \,\big\|\, P_Z^{n,l}\big) \big) \le \mathbb{E}_{P_X^{n,l} W_{Z|X}^{\otimes n}}\Bigg( \log\bigg( 1 + \frac{W_{Z|X}^{\otimes n}(\mathbf{Z}|\mathbf{X})}{|\mathcal{M}|N\, P_Z^{n,l}(\mathbf{Z})} \bigg) \Bigg). \quad (23)$$
Let $\tau \triangleq 2t\, D(Q_1\|Q_0)$ and $\mathcal{B}_\tau \triangleq \big\{ (\mathbf{x},\mathbf{z}) : \log\big( W_{Z|X}^{\otimes n}(\mathbf{z}|\mathbf{x}) / Q_0^{\otimes n}(\mathbf{z}) \big) < \tau\sqrt{n} \big\}$. Then, by partitioning into $(\mathbf{x},\mathbf{z})\in\mathcal{B}_\tau$ and $(\mathbf{x},\mathbf{z})\notin\mathcal{B}_\tau$, the term in (23) can be expressed as
$$\sum_{(\mathbf{x},\mathbf{z})\in\mathcal{B}_\tau} P_X^{n,l}(\mathbf{x})\, W_{Z|X}^{\otimes n}(\mathbf{z}|\mathbf{x}) \log\bigg( 1 + \frac{W_{Z|X}^{\otimes n}(\mathbf{z}|\mathbf{x})}{|\mathcal{M}|N\, Q_0^{\otimes n}(\mathbf{z})}\cdot\frac{Q_0^{\otimes n}(\mathbf{z})}{P_Z^{n,l}(\mathbf{z})} \bigg) + \sum_{(\mathbf{x},\mathbf{z})\notin\mathcal{B}_\tau} P_X^{n,l}(\mathbf{x})\, W_{Z|X}^{\otimes n}(\mathbf{z}|\mathbf{x}) \log\bigg( 1 + \frac{W_{Z|X}^{\otimes n}(\mathbf{z}|\mathbf{x})}{|\mathcal{M}|N\, P_Z^{n,l}(\mathbf{z})} \bigg). \quad (24)$$
Using $\log(1+x) \le x$, the first term of (24) is bounded from above by
$$\frac{e^{\tau\sqrt{n}}}{|\mathcal{M}|N} \sum_{(\mathbf{x},\mathbf{z})\in\mathcal{B}_\tau} P_X^{n,l}(\mathbf{x})\, W_{Z|X}^{\otimes n}(\mathbf{z}|\mathbf{x})\, \frac{Q_0^{\otimes n}(\mathbf{z})}{P_Z^{n,l}(\mathbf{z})} \le \frac{e^{\tau\sqrt{n}}}{|\mathcal{M}|N}, \quad (25)$$
and the second term of (24) is bounded from above by
$$\log\bigg( 1 + \frac{1}{(|\mathcal{M}|N)\min_{\mathbf{z}: P_Z^{n,l}(\mathbf{z})>0} P_Z^{n,l}(\mathbf{z})} \bigg) \times \mathbb{P}_{P_X^{n,l} W_{Z|X}^{\otimes n}}\Bigg( \log\frac{W_{Z|X}^{\otimes n}(\mathbf{Z}|\mathbf{X})}{Q_0^{\otimes n}(\mathbf{Z})} \ge \tau\sqrt{n} \Bigg). \quad (26)$$
Before we state the next lemma, we recall that $\mu_0 = \min_{z: Q_0(z)>0} Q_0(z)$, $\mu_1 = \min_{z: Q_1(z)>0} Q_1(z)$, and $\tilde{\mu} = \min\{\mu_0, \mu_1\}$.

Lemma 6.
We have
$$\log\bigg( 1 + \frac{1}{(|\mathcal{M}|N)\min_{\mathbf{z}: P_Z^{n,l}(\mathbf{z})>0} P_Z^{n,l}(\mathbf{z})} \bigg) \le n\log\big( 1 + \tilde{\mu}^{-1} \big).$$
We defer the proof of Lemma 6 to Appendix A. It then remains to consider the other term in (26):
$$\mathbb{P}_{P_X^{n,l} W_{Z|X}^{\otimes n}}\Bigg( \log\frac{W_{Z|X}^{\otimes n}(\mathbf{Z}|\mathbf{X})}{Q_0^{\otimes n}(\mathbf{Z})} \ge \tau\sqrt{n} \Bigg) = \sum_{\mathbf{x}} P_X^{n,l}(\mathbf{x}) \sum_{\mathbf{z}} W_{Z|X}^{\otimes n}(\mathbf{z}|\mathbf{x})\, \mathbb{1}\Bigg\{ \log\frac{W_{Z|X}^{\otimes n}(\mathbf{z}|\mathbf{x})}{Q_0^{\otimes n}(\mathbf{z})} \ge \tau\sqrt{n} \Bigg\}$$
$$= \sum_{\mathbf{z}} W_{Z|X}^{\otimes n}(\mathbf{z}|\mathbf{x}^*)\, \mathbb{1}\Bigg\{ \sum_{j=1}^n \log\frac{W_{Z|X}(z_j|x^*_j)}{Q_0(z_j)} \ge \tau\sqrt{n} \Bigg\} \quad (27)$$
$$= \mathbb{P}_{Q_1^{\otimes l}}\Bigg( \sum_{j=1}^l \log\frac{Q_1(Z_{(j-1)w+1})}{Q_0(Z_{(j-1)w+1})} \ge \tau\sqrt{n} \Bigg), \quad (28)$$
where (27) is due to symmetry, and recall that $\mathbf{x}^*$ is the weight-$l$ vector such that $x^*_{(j-1)w+1} = 1$ for $j \in [1:l]$. By noting that $\mathbb{E}\big( \sum_{j=1}^l \log\frac{Q_1(Z_{(j-1)w+1})}{Q_0(Z_{(j-1)w+1})} \big) = l\, D(Q_1\|Q_0)$ and $\tau\sqrt{n} = 2l\, D(Q_1\|Q_0)$, applying Hoeffding's inequality yields
$$\mathbb{P}_{Q_1^{\otimes l}}\Bigg( \sum_{j=1}^l \log\frac{Q_1(Z_{(j-1)w+1})}{Q_0(Z_{(j-1)w+1})} \ge \tau\sqrt{n} \Bigg) \le e^{-c_4\sqrt{n}} \quad (29)$$
for some constant $c_4 > 0$. Combining (23), (25), (26), (29), the fact that $|\mathcal{M}| = \exp\{e^{r\sqrt{n}}\}$, and applying Markov's inequality, we obtain that there exist constants $c_2, c_3 > 0$ such that with probability at least $1 - \exp(-c_2\sqrt{n})$ over the code design,
$$D\big(\widehat{Q}_{\mathcal{C}}^n \,\big\|\, P_Z^{n,l}\big) \le \exp\{-c_3\sqrt{n}\}.$$
Finally, by Pinsker's inequality, we know that $V\big(\widehat{Q}_{\mathcal{C}}^n, P_Z^{n,l}\big) \le \sqrt{D\big(\widehat{Q}_{\mathcal{C}}^n \,\|\, P_Z^{n,l}\big)/2} \le \exp\{-c_3\sqrt{n}/2\}$, thus
$$\sum_{\mathbf{z}} \big( \widehat{Q}_{\mathcal{C}}^n(\mathbf{z}) - P_Z^{n,l}(\mathbf{z}) \big) \log\frac{P_Z^{n,l}(\mathbf{z})}{Q_0^{\otimes n}(\mathbf{z})} \le 2n\Big( \log\frac{1}{\tilde{\mu}} \Big)\cdot V\big(\widehat{Q}_{\mathcal{C}}^n, P_Z^{n,l}\big) \le 2n\Big( \log\frac{1}{\tilde{\mu}} \Big)\exp\{-c_3\sqrt{n}/2\}.$$
This completes the proof of Lemma 5.

Combining (22) and Lemmas 4 and 5, we conclude that with probability at least $1 - \exp(-c_2\sqrt{n})$ over the random code $\mathcal{C}$, we have $D\big(\widehat{Q}_{\mathcal{C}}^n \,\|\, Q_0^{\otimes n}\big) \le \delta$ for sufficiently large $n$.

D. Code refinements
In the following, we refine a given "weak" covert identification code such that the refined code satisfies the error criteria and covertness property in Definition 4 and simultaneously retains the rate of the original code.
Lemma 7.
Let $\delta > 0$ and $\varepsilon_n^{(1)}, \varepsilon_n^{(2)}, \varepsilon_n^{(3)} > 0$ be vanishing sequences. Suppose there exists a sequence of codes $\mathcal{C}$ (of size $|\mathcal{M}|$) satisfying
$$\max_{m\in\mathcal{M}} P_{\mathrm{err}}^{(1)}(m) \le \varepsilon_n^{(1)}, \quad \max_{(m,m')\in\mathcal{M}^2 : m\neq m'} P_{\mathrm{err}}^{(2)}(m,m') \le \varepsilon_n^{(2)}, \quad \frac{1}{|\mathcal{M}|}\sum_{m\in\mathcal{M}} P_{\mathrm{err}}^{(3)}(m) \le \varepsilon_n^{(3)}, \quad D\big(\widehat{Q}_{\mathcal{C}}^n \,\big\|\, Q_0^{\otimes n}\big) \le \delta.$$
Then, there exist vanishing sequences $\widetilde{\varepsilon}_n^{(1)}, \widetilde{\varepsilon}_n^{(2)}, \widetilde{\varepsilon}_n^{(3)} > 0$ (depending on $\varepsilon_n^{(1)}, \varepsilon_n^{(2)}, \varepsilon_n^{(3)}$) and another sequence of codes $\widetilde{\mathcal{C}}$ of size $|\widetilde{\mathcal{M}}| \ge (1 - \widetilde{\varepsilon}_n^{(3)})|\mathcal{M}|$ such that
$$\max_{m\in\widetilde{\mathcal{M}}} P_{\mathrm{err}}^{(1)}(m) \le \widetilde{\varepsilon}_n^{(1)}, \quad \max_{(m,m')\in\widetilde{\mathcal{M}}^2 : m\neq m'} P_{\mathrm{err}}^{(2)}(m,m') \le \widetilde{\varepsilon}_n^{(2)}, \quad \max_{m\in\widetilde{\mathcal{M}}} P_{\mathrm{err}}^{(3)}(m) \le \widetilde{\varepsilon}_n^{(3)}, \quad D\big(\widehat{Q}_{\widetilde{\mathcal{C}}}^n \,\big\|\, Q_0^{\otimes n}\big) \le \delta.$$

Proof of Lemma 7.
We first partition the messages in C into two disjoint sets. Definition 5.
Consider a code $\mathcal{C}$ that satisfies $\frac{1}{|\mathcal{M}|}\sum_{m\in\mathcal{M}} P_{\mathrm{err}}^{(3)}(m) \le \varepsilon_n^{(3)}$. We say a message $m\in\mathcal{M}$ is a good message if $P_{\mathrm{err}}^{(3)}(m) \le (\varepsilon_n^{(3)})^{1/2}$, and a bad message otherwise.

Let $\widetilde{\mathcal{M}} \subset \mathcal{M}$ be the set that contains all the good messages, and $\widetilde{\mathcal{M}}^c$ be the set that contains all the bad messages. Without loss of generality, we assume $\widetilde{\mathcal{M}} = [1 : |\widetilde{\mathcal{M}}|]$ and $\widetilde{\mathcal{M}}^c = [|\widetilde{\mathcal{M}}| + 1 : |\mathcal{M}|]$. Since the code $\mathcal{C}$ satisfies $\sum_{m\in\mathcal{M}} P_{\mathrm{err}}^{(3)}(m) \le \varepsilon_n^{(3)}|\mathcal{M}|$, the number of bad messages is at most $(\varepsilon_n^{(3)})^{1/2}|\mathcal{M}|$, i.e., $|\widetilde{\mathcal{M}}^c| \le (\varepsilon_n^{(3)})^{1/2}|\mathcal{M}|$ and $|\widetilde{\mathcal{M}}| \ge (1 - (\varepsilon_n^{(3)})^{1/2})|\mathcal{M}|$. Recall that for each message $m\in\mathcal{M}$, the corresponding set of sequences is $\{\mathbf{x}_{m,i}\}_{i\in[1:N]}$. We then denote the set of sequences that correspond to all the bad messages by $\mathcal{B} \triangleq \cup_{m\in\widetilde{\mathcal{M}}^c}\{\mathbf{x}_{m,i}\}_{i\in[1:N]}$, and note that $|\mathcal{B}| \le N\cdot(\varepsilon_n^{(3)})^{1/2}|\mathcal{M}|$. In the following, we construct a new code $\widetilde{\mathcal{C}}$ that contains $|\widetilde{\mathcal{M}}|$ messages.
1) We partition the set $\mathcal{B}$ into $|\widetilde{\mathcal{M}}|$ equal-sized disjoint subsets $\mathcal{B}^{(1)}, \mathcal{B}^{(2)}, \ldots, \mathcal{B}^{(|\widetilde{\mathcal{M}}|)}$ such that the cardinality of each subset (for $m\in\widetilde{\mathcal{M}}$) satisfies
$$|\mathcal{B}^{(m)}| = \frac{|\mathcal{B}|}{|\widetilde{\mathcal{M}}|} \le \frac{N\cdot(\varepsilon_n^{(3)})^{1/2}|\mathcal{M}|}{(1 - (\varepsilon_n^{(3)})^{1/2})|\mathcal{M}|} \triangleq \nu_n N, \quad (30)$$
where $\nu_n$ also tends to 0 as $n$ tends to infinity.
2) For each $m\in\widetilde{\mathcal{M}}$, the corresponding set of sequences in the original code $\mathcal{C}$ is $\{\mathbf{x}_{m,i}\}_{i\in[1:N]}$. In the new code $\widetilde{\mathcal{C}}$, we enlarge this set by appending $\mathcal{B}^{(m)}$ to $\{\mathbf{x}_{m,i}\}_{i\in[1:N]}$. Thus, the codeword $U_m$ is the uniform distribution over the larger set of sequences $\{\mathbf{x}_{m,i}\}_{i\in[1:N]}\cup\mathcal{B}^{(m)}$.
3) For each $m\in\widetilde{\mathcal{M}}$, the decoding region of the new code $\widetilde{\mathcal{C}}$ remains $\mathcal{D}_m = \cup_{i\in[1:N]}\mathcal{F}_{\mathbf{x}_{m,i}}$. That is, the decoding regions of the new code $\widetilde{\mathcal{C}}$ and the original code $\mathcal{C}$ are exactly the same.
We now analyze the error probabilities of the new code $\widetilde{\mathcal{C}}$. For each $m\in\widetilde{\mathcal{M}}$, the error probability of the first kind is bounded from above as
$$P_{\mathrm{err}}^{(1)}(m) = \frac{\sum_{i=1}^N W_{Y|X}^{\otimes n}(\mathcal{D}_m^c|\mathbf{x}_{m,i}) + \sum_{\mathbf{x}\in\mathcal{B}^{(m)}} W_{Y|X}^{\otimes n}(\mathcal{D}_m^c|\mathbf{x})}{N + |\mathcal{B}^{(m)}|} \le \frac{N}{N+|\mathcal{B}^{(m)}|}\Bigg(\frac{1}{N}\sum_{i=1}^N W_{Y|X}^{\otimes n}(\mathcal{D}_m^c|\mathbf{x}_{m,i})\Bigg) + \frac{|\mathcal{B}^{(m)}|}{N+|\mathcal{B}^{(m)}|} \quad (31)$$
$$\le \varepsilon_n^{(1)} + \nu_n, \quad (32)$$
where (31) holds since $W_{Y|X}^{\otimes n}(\mathcal{D}_m^c|\mathbf{x}) \le 1$, and (32) is due to (30) and the assumption that the original code satisfies $\frac{1}{N}\sum_{i=1}^N W_{Y|X}^{\otimes n}(\mathcal{D}_m^c|\mathbf{x}_{m,i}) \le \varepsilon_n^{(1)}$. Similarly, for each message pair $(m,m')\in\widetilde{\mathcal{M}}^2$ with $m\neq m'$, the error probability of the second kind $P_{\mathrm{err}}^{(2)}(m,m')$ is bounded from above as
$$P_{\mathrm{err}}^{(2)}(m,m') = \frac{\sum_{i=1}^N W_{Y|X}^{\otimes n}(\mathcal{D}_m|\mathbf{x}_{m',i}) + \sum_{\mathbf{x}\in\mathcal{B}^{(m')}} W_{Y|X}^{\otimes n}(\mathcal{D}_m|\mathbf{x})}{N+|\mathcal{B}^{(m')}|} \le \frac{N}{N+|\mathcal{B}^{(m')}|}\Bigg(\frac{1}{N}\sum_{i=1}^N W_{Y|X}^{\otimes n}(\mathcal{D}_m|\mathbf{x}_{m',i})\Bigg) + \frac{|\mathcal{B}^{(m')}|}{N+|\mathcal{B}^{(m')}|} \le \varepsilon_n^{(2)} + \nu_n.$$
Since all the messages in $\widetilde{\mathcal{M}}$ are good messages, by Definition 5 we have that for each message $m\in\widetilde{\mathcal{M}}$, $P_{\mathrm{err}}^{(3)}(m) \le (\varepsilon_n^{(3)})^{1/2}$. Finally, note that when constructing $\widetilde{\mathcal{C}}$, we merely rearrange the sequences of $\mathcal{C}$ (rather than expurgate or add any sequences); thus, the output distribution induced by $\widetilde{\mathcal{C}}$ is exactly the same as that induced by $\mathcal{C}$, i.e.,
$$D\big(\widehat{Q}_{\widetilde{\mathcal{C}}}^n \,\big\|\, Q_0^{\otimes n}\big) = D\big(\widehat{Q}_{\mathcal{C}}^n \,\big\|\, Q_0^{\otimes n}\big) \le \delta.$$
Thus, the covertness constraint is satisfied. Finally, we note that
$$\liminf_{n\to\infty} \frac{\log\log|\widetilde{\mathcal{M}}|}{\sqrt{n}} = \liminf_{n\to\infty} \frac{\log\log|\mathcal{M}|}{\sqrt{n}} = (1-\eta)\, C_\delta,$$
and the proof is completed by taking $\eta \to 0^+$.
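The refinement step above admits a compact schematic rendering. The following sketch (ours; the function and variable names are illustrative, and the equal-sized partition is assumed exact for simplicity) redistributes the sequences of the bad messages evenly among the good messages, so the overall multiset of codeword sequences, and hence the induced output distribution, is unchanged.

```python
import numpy as np

def refine_code(codebook, p_err3, eps3):
    # codebook: dict message -> list of length-n sequences (support of U_m)
    # p_err3:   dict message -> error probability of the third kind
    good = [m for m in codebook if p_err3[m] <= np.sqrt(eps3)]
    bad = [m for m in codebook if p_err3[m] > np.sqrt(eps3)]
    pool = [x for m in bad for x in codebook[m]]       # the set B
    chunk = len(pool) // max(len(good), 1)             # |B| / |M_tilde|
    refined = {}
    for i, m in enumerate(good):
        extra = pool[i * chunk : (i + 1) * chunk]      # subset B^(m)
        refined[m] = codebook[m] + extra               # enlarged support of U_m
    return refined  # decoding regions D_m are kept exactly as before
```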
V. CONVERSE
In this section, we show that any sequence of identification codes with size $|\mathcal{M}|$ that simultaneously guarantees that $D(\widehat{Q}_{\mathcal{C}}^n \| Q_0^{\otimes n}) \le \delta$ and $P_{\mathrm{err}}^{(1)} = \lambda_n^{(1)}$, $P_{\mathrm{err}}^{(2)} = \lambda_n^{(2)}$, $P_{\mathrm{err}}^{(3)} = \lambda_n^{(3)}$ (where $\lim_{n\to\infty}\lambda_n^{(1)} = \lim_{n\to\infty}\lambda_n^{(2)} = \lim_{n\to\infty}\lambda_n^{(3)} = 0$) must satisfy $\limsup_{n\to\infty} \frac{\log\log|\mathcal{M}|}{\sqrt{n}} \le C_\delta$.

Lemma 8.
Consider any identification code $\mathcal{C}$ with message set $\mathcal{M}$, codewords $\{U_m\}_{m\in\mathcal{M}}$, and decoding regions $\{\mathcal{D}_m\}_{m\in\mathcal{M}}$ such that $D(\widehat{Q}_{\mathcal{C}}^n \| Q_0^{\otimes n}) \le \delta$. Let $f_H(m) \triangleq \frac{1}{n}\sum_{\mathbf{x}} U_m(\mathbf{x})\,\mathrm{wt}_H(\mathbf{x})$ be the fractional Hamming weight of each message $m \in \mathcal{M}$. Then, there exists a constant $c > 0$ such that the average fractional Hamming weight of $\mathcal{C}$ satisfies
$$\frac{1}{|\mathcal{M}|}\sum_{m\in\mathcal{M}} f_H(m) \le \sqrt{\frac{2\delta}{\chi_2(Q_1\|Q_0)}}\Big( \frac{1}{\sqrt{n}} + \frac{c}{n} \Big). \quad (33)$$

Proof of Lemma 8.
We denote the $i$-th marginal distribution of each codeword $U_m$ as $(U_m)_i$ for $i\in[1:n]$, and the $i$-th marginal distribution of $\widehat{Q}_{\mathcal{C}}^n$ as $(\widehat{Q}_{\mathcal{C}}^n)_i$, which takes the form
$$(\widehat{Q}_{\mathcal{C}}^n)_i(z) = \frac{1}{|\mathcal{M}|}\sum_{m\in\mathcal{M}}\sum_{x\in\mathcal{X}} (U_m)_i(x)\, W_{Z|X}(z|x), \quad \forall z\in\mathcal{Z}.$$
Let $\bar{Q}_{\mathcal{C}}(z) \triangleq \frac{1}{n}\sum_{i=1}^n (\widehat{Q}_{\mathcal{C}}^n)_i(z)$. By taking the covertness constraint into account and following the analysis in [15, Eqn. (13)], we have
$$\delta \ge D\big(\widehat{Q}_{\mathcal{C}}^n \,\big\|\, Q_0^{\otimes n}\big) \ge n\, D\big(\bar{Q}_{\mathcal{C}} \,\big\|\, Q_0\big), \quad (34)$$
and thus $\lim_{n\to\infty} D(\bar{Q}_{\mathcal{C}} \| Q_0) = 0$. By applying Pinsker's inequality $V(\bar{Q}_{\mathcal{C}}, Q_0) \le \sqrt{D(\bar{Q}_{\mathcal{C}} \| Q_0)/2}$, we also have $\lim_{n\to\infty} V(\bar{Q}_{\mathcal{C}}, Q_0) = 0$. Let $\psi = \psi_n \triangleq \frac{1}{n}\sum_{i=1}^n \frac{1}{|\mathcal{M}|}\sum_{m\in\mathcal{M}} (U_m)_i(1)$ be the fraction of 1's in the codebook; one can express $\bar{Q}_{\mathcal{C}}(z)$ as
$$\bar{Q}_{\mathcal{C}}(z) = \frac{1}{n}\sum_{i=1}^n \frac{1}{|\mathcal{M}|}\sum_{m\in\mathcal{M}}\sum_{x\in\mathcal{X}} (U_m)_i(x)\, W_{Z|X}(z|x) = \psi\, Q_1(z) + (1-\psi)\, Q_0(z).$$
Note that the requirement on variational distance $\lim_{n\to\infty} V(\bar{Q}_{\mathcal{C}}, Q_0) = 0$ implies that $\lim_{n\to\infty}\psi = 0$. Furthermore, we know from [14, Eqn. (11)] that
$$D\big(\bar{Q}_{\mathcal{C}} \,\big\|\, Q_0\big) \ge \frac{\psi^2}{2}\,\chi_2(Q_1\|Q_0) - O(\psi^3). \quad (35)$$
Combining (34) and (35), one can bound $\psi$ from above as
$$\psi \le \sqrt{\frac{2\delta}{\chi_2(Q_1\|Q_0)}}\Big(\frac{1}{\sqrt{n}} + \frac{c}{n}\Big), \quad (36)$$
for some constant $c > 0$. At the same time, one can also interpret $\psi$ as the average fractional Hamming weight of the code, since
$$\psi = \frac{1}{n}\sum_{i=1}^n \frac{1}{|\mathcal{M}|}\sum_{m\in\mathcal{M}} (U_m)_i(1) = \frac{1}{|\mathcal{M}|}\sum_{m\in\mathcal{M}} \frac{1}{n}\sum_{i=1}^n \sum_{\mathbf{x}} U_m(\mathbf{x})\,\mathbb{1}\{x_i = 1\} = \frac{1}{|\mathcal{M}|}\sum_{m\in\mathcal{M}} \frac{1}{n}\sum_{\mathbf{x}} U_m(\mathbf{x})\,\mathrm{wt}_H(\mathbf{x}) = \frac{1}{|\mathcal{M}|}\sum_{m\in\mathcal{M}} f_H(m). \quad (37)$$
This completes the proof of Lemma 8.

For notational convenience, let
$$k \triangleq \sqrt{\frac{2\delta}{\chi_2(Q_1\|Q_0)}}\Big(\frac{1}{\sqrt{n}} + \frac{c}{n}\Big).$$

Lemma 9 (Expurgation Lemma). Suppose there exists a sequence of identification codes $\mathcal{C}$ with message set $\mathcal{M}$, codewords $\{U_m\}_{m\in\mathcal{M}}$, and decoding regions $\{\mathcal{D}_m\}_{m\in\mathcal{M}}$ such that $D(\widehat{Q}_{\mathcal{C}}^n \| Q_0^{\otimes n}) \le \delta$, $P_{\mathrm{err}}^{(1)} = \lambda_n^{(1)}$, $P_{\mathrm{err}}^{(2)} = \lambda_n^{(2)}$, and $P_{\mathrm{err}}^{(3)} = \lambda_n^{(3)}$, where $\lim_{n\to\infty}\lambda_n^{(1)} = \lim_{n\to\infty}\lambda_n^{(2)} = \lim_{n\to\infty}\lambda_n^{(3)} = 0$. Then, there exist a sequence $\kappa_n > 0$ (which depends on $\lambda_n^{(1)}, \lambda_n^{(2)}$) which satisfies $\lim_{n\to\infty}\kappa_n = 0$ and a sequence of identification codes $\mathcal{C}'$ with message set $\mathcal{M}'$, codewords $\{U'_m\}_{m\in\mathcal{M}'}$, and decoding regions $\{\mathcal{D}'_m\}_{m\in\mathcal{M}'}$ such that
1) $|\mathcal{M}'| \ge |\mathcal{M}|/(n+1)$;
2) For every $m\in\mathcal{M}'$, $U'_m(\mathbf{x}) = 0$ for all $\mathbf{x}$ such that $\mathrm{wt}_H(\mathbf{x}) > (1+\kappa_n)kn$;
3) $P_{\mathrm{err}}^{(1)} \le (\lambda_n^{(1)})^{1/2}$, $P_{\mathrm{err}}^{(2)} \le (\lambda_n^{(2)})^{1/2}$, and $P_{\mathrm{err}}^{(3)} \le \lambda_n^{(3)}$.

Proof of Lemma 9. Since the identification code $\mathcal{C}$ satisfies $D(\widehat{Q}_{\mathcal{C}}^n \| Q_0^{\otimes n}) \le \delta$, Lemma 8 above ensures that its average fractional Hamming weight satisfies $\frac{1}{|\mathcal{M}|}\sum_{m\in\mathcal{M}} f_H(m) \le k$. We define $\mathcal{G}$ as the subset of messages with small fractional Hamming weight, i.e.,
$$\mathcal{G} \triangleq \Big\{ m\in\mathcal{M} : f_H(m) \le \Big(1 + \frac{1}{n}\Big)k \Big\}. \quad (38)$$
From (36) and (37), we have
$$k \ge \psi = \frac{1}{|\mathcal{M}|}\sum_{m\in\mathcal{G}} f_H(m) + \frac{1}{|\mathcal{M}|}\sum_{m\in\mathcal{M}\setminus\mathcal{G}} f_H(m) \ge \frac{|\mathcal{M}\setminus\mathcal{G}|}{|\mathcal{M}|}\Big(1 + \frac{1}{n}\Big)k,$$
which further implies that $|\mathcal{G}| \ge |\mathcal{M}|/(n+1)$, i.e., the number of messages with small fractional Hamming weight is not small. Let $\lambda_n \triangleq \max\{\lambda_n^{(1)}, \lambda_n^{(2)}\}$ and $\epsilon_n \triangleq \frac{\sqrt{\lambda_n}}{1 - \sqrt{\lambda_n}}$.
We partition $\mathcal{X}^n$ into two disjoint sets: the low-weight set $\mathcal{X}^n_l \triangleq \{\mathbf{x}\in\mathcal{X}^n : \mathrm{wt}_H(\mathbf{x}) \le (1+\epsilon_n)(1+\frac{1}{n})kn\}$ and the high-weight set $\mathcal{X}^n_h \triangleq \mathcal{X}^n\setminus\mathcal{X}^n_l$. In the following, we describe the procedure of constructing the new code $\mathcal{C}'$.
1) First, the message set of the new code is $\mathcal{M}' = \mathcal{G}$. Thus, $f_H(m) \le (1+\frac{1}{n})k$ for all $m\in\mathcal{M}'$.
2) For each $m\in\mathcal{M}'$, we define $g_m \triangleq \sum_{\mathbf{x}\in\mathcal{X}^n_l} U_m(\mathbf{x})$, and we set the codeword $U'_m$ of the new code $\mathcal{C}'$ to be
$$U'_m(\mathbf{x}) = \begin{cases} U_m(\mathbf{x})/g_m, & \text{if } \mathbf{x}\in\mathcal{X}^n_l, \\ 0, & \text{otherwise}. \end{cases}$$
One can check that $\sum_{\mathbf{x}} U'_m(\mathbf{x}) = 1$.
3) The decoding regions of the new code $\mathcal{C}'$ are the same as those of $\mathcal{C}$, i.e., $\mathcal{D}'_m = \mathcal{D}_m$ for all $m\in\mathcal{M}'$.
From (38) we have that for each $m\in\mathcal{M}'$,
$$\Big(1+\frac{1}{n}\Big)k \ge f_H(m) = \frac{1}{n}\sum_{\mathbf{x}} U_m(\mathbf{x})\,\mathrm{wt}_H(\mathbf{x}) \ge \frac{1}{n}\sum_{\mathbf{x}\in\mathcal{X}^n_h} U_m(\mathbf{x})\,\mathrm{wt}_H(\mathbf{x}) \ge \frac{1}{n}\sum_{\mathbf{x}\in\mathcal{X}^n_h} U_m(\mathbf{x})\cdot(1+\epsilon_n)\Big(1+\frac{1}{n}\Big)kn,$$
which yields a lower bound on $g_m$, i.e.,
$$g_m = \sum_{\mathbf{x}\in\mathcal{X}^n_l} U_m(\mathbf{x}) = 1 - \sum_{\mathbf{x}\in\mathcal{X}^n_h} U_m(\mathbf{x}) \ge \frac{\epsilon_n}{1+\epsilon_n}. \quad (39)$$
We now analyze the error probabilities of the new code $\mathcal{C}'$, which only consists of low-weight sequences. For each $m\in\mathcal{M}'$, the error probability of the first kind $P_{\mathrm{err}}^{(1)}(m)$ can be bounded from above as
$$P_{\mathrm{err}}^{(1)}(m) = \sum_{\mathbf{x}\in\mathcal{X}^n_l} U'_m(\mathbf{x})\, W_{Y|X}^{\otimes n}(\mathcal{D}^c_m|\mathbf{x}) = \sum_{\mathbf{x}\in\mathcal{X}^n_l} \frac{U_m(\mathbf{x})}{g_m}\, W_{Y|X}^{\otimes n}(\mathcal{D}^c_m|\mathbf{x}) \le \frac{1}{g_m}\sum_{\mathbf{x}} U_m(\mathbf{x})\, W_{Y|X}^{\otimes n}(\mathcal{D}^c_m|\mathbf{x}) \le \Big(\frac{1+\epsilon_n}{\epsilon_n}\Big)\lambda_n^{(1)} \quad (40)$$
$$\le \big(\lambda_n^{(1)}\big)^{1/2}, \quad (41)$$
where (40) follows from (39) and the fact that the original code $\mathcal{C}$ satisfies $\sum_{\mathbf{x}} U_m(\mathbf{x})\, W_{Y|X}^{\otimes n}(\mathcal{D}^c_m|\mathbf{x}) \le \lambda_n^{(1)}$, and (41) is due to the choice of $\epsilon_n$. Furthermore, for each message pair $(m,m')\in\mathcal{M}'\times\mathcal{M}'$ such that $m\neq m'$, the error probability of the second kind $P_{\mathrm{err}}^{(2)}(m,m')$ can be similarly bounded from above as
$$P_{\mathrm{err}}^{(2)}(m,m') = \sum_{\mathbf{x}\in\mathcal{X}^n_l} U'_{m'}(\mathbf{x})\, W_{Y|X}^{\otimes n}(\mathcal{D}_m|\mathbf{x}) = \frac{1}{g_{m'}}\sum_{\mathbf{x}\in\mathcal{X}^n_l} U_{m'}(\mathbf{x})\, W_{Y|X}^{\otimes n}(\mathcal{D}_m|\mathbf{x}) \le \Big(\frac{1+\epsilon_n}{\epsilon_n}\Big)\lambda_n^{(2)} \le \big(\lambda_n^{(2)}\big)^{1/2}.$$
Finally, we note that the error probability of the third kind $P_{\mathrm{err}}^{(3)}(m) = P_0^{\otimes n}(\mathcal{D}'_m)$ is still bounded from above by $\lambda_n^{(3)}$, since the decoding regions are unchanged, i.e., $\mathcal{D}'_m = \mathcal{D}_m$ for $m\in\mathcal{M}'$. We complete the proof of Lemma 9 by setting $\kappa_n = (1+\epsilon_n)(1+\frac{1}{n}) - 1$, which vanishes as $n$ tends to infinity.

Proving the converse of identification problems usually relies on the achievability results for the channel resolvability problem. In the following, we first introduce the definition of $K$-type distributions, and then state a modified version of the channel resolvability result in Lemma 10. Lemma 10 is modified from the so-called soft-covering lemma presented by Cuff [33, Corollary VII.2].

Definition 6.
For any positive integer $K$, a probability distribution $P \in \mathcal{P}(\mathcal{X})$ is said to be a $K$-type distribution if
$$P(x) \in \Big\{ 0, \frac{1}{K}, \frac{2}{K}, \ldots, 1 \Big\}, \quad \forall x\in\mathcal{X}.$$

Lemma 10.
Let $P_{\mathbf{X}} \in \mathcal{P}(\mathcal{X}^n)$ and $P_{\mathbf{Y}}(\mathbf{y}) = \sum_{\mathbf{x}} P_{\mathbf{X}}(\mathbf{x})\, W_{Y|X}^{\otimes n}(\mathbf{y}|\mathbf{x})$. We randomly sample $K$ i.i.d. sequences $\mathbf{x}_1, \ldots, \mathbf{x}_K$ according to $P_{\mathbf{X}}$. Let
$$\widetilde{P}_{\mathbf{X}}(\mathbf{x}) = \frac{1}{K}\sum_{i=1}^K \mathbb{1}\{\mathbf{x} = \mathbf{x}_i\}, \quad \forall\mathbf{x}\in\mathcal{X}^n,$$
be a $K$-type distribution and $\widetilde{P}_{\mathbf{Y}}(\mathbf{y}) = \sum_{\mathbf{x}} \widetilde{P}_{\mathbf{X}}(\mathbf{x})\, W_{Y|X}^{\otimes n}(\mathbf{y}|\mathbf{x})$ be the corresponding output distribution. Then, for any $\zeta > 0$ and any $P'_{\mathbf{Y}} \in \mathcal{P}(\mathcal{Y}^n)$,
$$\mathbb{E}\big( V\big( P_{\mathbf{Y}}, \widetilde{P}_{\mathbf{Y}} \big) \big) \le \mathbb{P}_{P_{\mathbf{X}} W_{Y|X}^{\otimes n}}\Bigg( \log\frac{W_{Y|X}^{\otimes n}(\mathbf{Y}|\mathbf{X})}{P'_{\mathbf{Y}}(\mathbf{Y})} > \zeta \Bigg) + \frac{1}{2}\sqrt{\frac{e^{\zeta}}{K}},$$
where the expectation on the left-hand side of the above inequality is over the random generation of $\mathbf{x}_1, \ldots, \mathbf{x}_K$.

Proof. The proof is presented in Appendix B, and is adapted from [33, Section VII-C] with proper modifications.

Note that Lemma 10 above holds for any $P'_{\mathbf{Y}} \in \mathcal{P}(\mathcal{Y}^n)$, which differs from an analogous (but more restrictive) result in [33, Corollary VII.2] wherein $P'_{\mathbf{Y}}$ is set to be $P_{\mathbf{Y}}$. This flexibility of choosing $P'_{\mathbf{Y}}$ arbitrarily is important for proving the converse, because we need to set it to $P_0^{\otimes n}$ later for the analysis of the covert identification problem. We now consider the identification code $\mathcal{C}'$ constructed in Lemma 9.
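The soft-covering phenomenon behind Lemma 10 can be illustrated numerically. The following sketch (ours; a single-letter toy rather than the $n$-letter setting, with invented numerical values) samples $K$ symbols i.i.d. from $P_X$, forms the $K$-type empirical distribution, and shows that the induced output distribution approaches $P_Y$ as $K$ grows.

```python
import numpy as np

rng = np.random.default_rng(1)

def ktype_output_distribution(P_X, W, K):
    # P_X: input distribution over a finite alphabet; W: channel matrix
    # with W[x, y] = W_{Y|X}(y|x). Returns the output distribution induced
    # by the empirical (K-type) distribution of K i.i.d. samples from P_X.
    samples = rng.choice(len(P_X), size=K, p=P_X)
    P_tilde = np.bincount(samples, minlength=len(P_X)) / K
    return P_tilde @ W

P_X = np.array([0.7, 0.3])
W = np.array([[0.9, 0.1], [0.2, 0.8]])
P_Y = P_X @ W
for K in [10, 100, 10000]:
    P_Y_tilde = ktype_output_distribution(P_X, W, K)
    print(K, 0.5 * np.abs(P_Y - P_Y_tilde).sum())  # variational distance
```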
Lemma 11.

Let $K \triangleq \big\lceil \exp\{(1+2n^{-1/8})(1+\kappa_n)\, kn\, D(P_1\|P_0)\} \big\rceil$. For every message $m\in\mathcal{M}'$ with codeword $U'_m$, there exists a $K$-type distribution $\widetilde{U}_m$ such that
$$V\big( U'_m W_{Y|X}^{\otimes n},\, \widetilde{U}_m W_{Y|X}^{\otimes n} \big) \le \exp\big(-c_6 n^{1/4}\big)$$
for some constant $c_6 > 0$, where $U'_m W_{Y|X}^{\otimes n}$ and $\widetilde{U}_m W_{Y|X}^{\otimes n}$ respectively denote the distributions on $\mathcal{Y}^n$ induced by $U'_m$ and $\widetilde{U}_m$ through the channel $W_{Y|X}^{\otimes n}$.

Proof of Lemma 11. Consider a specific $m\in\mathcal{M}'$ with codeword $U'_m$. Substituting $P_{\mathbf{X}}$ with $U'_m$ and $P'_{\mathbf{Y}}$ with $P_0^{\otimes n}$, and setting $\zeta \triangleq (1+n^{-1/8})(1+\kappa_n)\, kn\, D(P_1\|P_0)$ in Lemma 10, we have
$$\mathbb{P}_{U'_m W_{Y|X}^{\otimes n}}\Bigg( \log\frac{W_{Y|X}^{\otimes n}(\mathbf{Y}|\mathbf{X})}{P_0^{\otimes n}(\mathbf{Y})} > \zeta \Bigg) = \sum_{\mathbf{x}}\sum_{\mathbf{y}} U'_m(\mathbf{x})\, W_{Y|X}^{\otimes n}(\mathbf{y}|\mathbf{x})\, \mathbb{1}\Bigg\{ \log\frac{W_{Y|X}^{\otimes n}(\mathbf{y}|\mathbf{x})}{P_0^{\otimes n}(\mathbf{y})} > \zeta \Bigg\}$$
$$= \sum_{q=0}^{(1+\kappa_n)kn} \sum_{\mathbf{x}:\mathrm{wt}_H(\mathbf{x})=q} U'_m(\mathbf{x}) \sum_{\mathbf{y}} W_{Y|X}^{\otimes n}(\mathbf{y}|\mathbf{x})\, \mathbb{1}\Bigg\{ \log\frac{W_{Y|X}^{\otimes n}(\mathbf{y}|\mathbf{x})}{P_0^{\otimes n}(\mathbf{y})} > \zeta \Bigg\} \quad (42)$$
$$= \sum_{q=0}^{(1+\kappa_n)kn} \sum_{\mathbf{x}:\mathrm{wt}_H(\mathbf{x})=q} U'_m(\mathbf{x})\, \mathbb{P}_{P_1^{\otimes q}}\Bigg( \sum_{i=1}^q \log\frac{P_1(Y_i)}{P_0(Y_i)} > \zeta \Bigg), \quad (43)$$
where in (42) we partition $\mathbf{x}$ into different type classes characterized by their Hamming weights, and (43) is obtained by assuming $x_i = 1$ for $i\in[1:q]$ and $x_i = 0$ for $i\in[q+1:n]$ without loss of generality. Also note that
$$\zeta - q\, D(P_1\|P_0) \ge \zeta - (1+\kappa_n)kn\, D(P_1\|P_0) = n^{-1/8}(1+\kappa_n)kn\, D(P_1\|P_0) \triangleq \Upsilon. \quad (44)$$
Thus, we have
$$\mathbb{P}_{P_1^{\otimes q}}\Bigg( \sum_{i=1}^q \log\frac{P_1(Y_i)}{P_0(Y_i)} > \zeta \Bigg) \le \mathbb{P}_{P_1^{\otimes q}}\Bigg( \sum_{i=1}^q \log\frac{P_1(Y_i)}{P_0(Y_i)} - q\, D(P_1\|P_0) > \Upsilon \Bigg) \quad (45)$$
$$\le \exp\big(-c_5 n^{1/4}\big), \quad (46)$$
where (45) is obtained by subtracting $q\, D(P_1\|P_0)$ from both sides and by the inequality in (44), and (46) holds for some constant $c_5 > 0$ and is obtained by applying Hoeffding's inequality (note that $\Upsilon = \Theta(n^{3/8})$ while $q = O(\sqrt{n})$). Hence, the term in (43) is bounded from above by $\exp(-c_5 n^{1/4})$. Furthermore, one can also check that $\sqrt{e^{\zeta}/K} \le \exp\{-n^{-1/8}(1+\kappa_n)kn\, D(P_1\|P_0)/2\}$, which is also at most $\exp(-c_5 n^{1/4})$ for all $n$ large enough. Therefore, by Lemma 10, for every message $m\in\mathcal{M}'$, there exists a $K$-type distribution $\widetilde{U}_m$ such that
$$V\big( U'_m W_{Y|X}^{\otimes n},\, \widetilde{U}_m W_{Y|X}^{\otimes n} \big) \le 2\exp\big(-c_5 n^{1/4}\big) \le \exp\big(-c_6 n^{1/4}\big),$$
for some constant $c_6 > 0$ and all $n$ large enough.

In the following, we apply standard channel identification converse techniques to the code $\mathcal{C}'$. For any $m, m'\in\mathcal{M}'$ such that $m\neq m'$, we have
$$V\big( U'_m W_{Y|X}^{\otimes n},\, U'_{m'} W_{Y|X}^{\otimes n} \big) \ge U'_m W_{Y|X}^{\otimes n}(\mathcal{D}_m) - U'_{m'} W_{Y|X}^{\otimes n}(\mathcal{D}_m) \ge 1 - (\lambda_n^{(1)})^{1/2} - (\lambda_n^{(2)})^{1/2}, \quad (47)$$
where the last inequality is due to Lemma 9, which states that the error probabilities of $\mathcal{C}'$ satisfy $P_{\mathrm{err}}^{(1)} \le (\lambda_n^{(1)})^{1/2}$ and $P_{\mathrm{err}}^{(2)} \le (\lambda_n^{(2)})^{1/2}$. Meanwhile, from Lemma 11 we know that there exists a set of $K$-type distributions $\{\widetilde{U}_m\}_{m\in\mathcal{M}'}$ such that
$$V\big( U'_m W_{Y|X}^{\otimes n},\, \widetilde{U}_m W_{Y|X}^{\otimes n} \big) \le \exp\big(-c_6 n^{1/4}\big), \quad \forall m\in\mathcal{M}'. \quad (48)$$
Combining (47) and (48), we have the following claim.

Lemma 12.
For sufficiently large $n$, the distributions in $\{\widetilde{U}_m\}_{m\in\mathcal{M}'}$ are distinct, i.e., there does not exist $(m, m')$ with $m\neq m'$ such that $\widetilde{U}_m = \widetilde{U}_{m'}$.

Proof of Lemma 12. Suppose $\widetilde{U}_m = \widetilde{U}_{m'}$ for some $m\neq m'$. By the triangle inequality, we have
$$V\big( U'_m W_{Y|X}^{\otimes n},\, U'_{m'} W_{Y|X}^{\otimes n} \big) \le V\big( U'_m W_{Y|X}^{\otimes n},\, \widetilde{U}_m W_{Y|X}^{\otimes n} \big) + V\big( \widetilde{U}_m W_{Y|X}^{\otimes n},\, U'_{m'} W_{Y|X}^{\otimes n} \big) = V\big( U'_m W_{Y|X}^{\otimes n},\, \widetilde{U}_m W_{Y|X}^{\otimes n} \big) + V\big( \widetilde{U}_{m'} W_{Y|X}^{\otimes n},\, U'_{m'} W_{Y|X}^{\otimes n} \big) \le 2\exp\big(-c_6 n^{1/4}\big),$$
which contradicts (47) for sufficiently large $n$.

It is worth noting that the number of distinct $K$-type distributions on $\mathcal{X}^n$ is at most $|\mathcal{X}|^{nK}$. Thus, combining Lemma 11 and Lemma 12, we have $|\mathcal{M}'| \le |\mathcal{X}|^{nK}$, and by taking iterated logarithms on both sides, we have
$$\log\log|\mathcal{M}'| \le \log K + \log n + \log\log|\mathcal{X}|.$$
Therefore, by recalling that $|\mathcal{M}'| \ge |\mathcal{M}|/(n+1)$, $K = \lceil \exp\{(1+2n^{-1/8})(1+\kappa_n)kn\, D(P_1\|P_0)\} \rceil$, and $k = \sqrt{\frac{2\delta}{\chi_2(Q_1\|Q_0)}}(\frac{1}{\sqrt{n}} + \frac{c}{n})$, we eventually obtain that
$$\limsup_{n\to\infty} \frac{\log\log|\mathcal{M}|}{\sqrt{n}} \le \limsup_{n\to\infty} \frac{\log\log|\mathcal{M}'|}{\sqrt{n}} \le \limsup_{n\to\infty}\Big( \frac{\log K}{\sqrt{n}} + \frac{\log n}{\sqrt{n}} + \frac{\log\log|\mathcal{X}|}{\sqrt{n}} \Big) = \sqrt{\frac{2\delta}{\chi_2(Q_1\|Q_0)}}\, D(P_1\|P_0) = C_\delta.$$
This completes the proof of the converse part.
VI. CONCLUDING REMARKS
This work investigates the covert identification problem, showing that an ID message of size $\exp(\exp(\Theta(\sqrt{n})))$ can be reliably and covertly transmitted over $n$ channel uses. We also characterize the covert identification capacity and show that it equals the covert capacity in the standard covert communication problem. The covert identification capacity can be achieved without any shared key.
Finally, we put forth several directions that we believe are fertile avenues for future research.
• Strictly speaking, the converse result established in Section V is commonly known as a weak converse, because all three error probabilities are allowed to vanish as $n$ grows. One would then expect that a strong converse for the covert identification problem can be shown. This can perhaps be achieved following the lead of [3] and [34, Chapter 6] for the standard channel identification problem. The key limitation of our converse technique that prevents us from deriving the strong converse is the use of Lemma 9 (Expurgation Lemma), wherein we expurgate many high-weight sequences such that the error probabilities of the expurgated code increase significantly. Thus, a promising way to circumvent this issue might be to develop a more general result for channel resolvability with stringent input constraints (i.e., extending the applicability of Lemma 11), instead of applying the Expurgation Lemma.
• Having established the (first-order) fundamental limits, it is then natural to derive the error exponent of the covert identification problem. One may follow the lead of the error exponent analysis for the standard identification problem by Ahlswede and Dueck [2]. However, due to the stringent input constraints mandated by the covertness constraints, this strategy requires special care and new analytical techniques to obtain closed-form expressions.
• In addition to the KL-divergence metric studied in this work, it is also worth considering alternative covertness metrics such as the variational distance and the probability of missed detection [16].

APPENDIX A
PROOF OF LEMMA 6

Recall that $Q_1 \ll Q_0$, and without loss of generality, we assume there does not exist a symbol $z$ such that $Q_0(z) = Q_1(z) = 0$. Let $\mathcal{Z}' \triangleq \{z\in\mathcal{Z} : Q_1(z) = 0, Q_0(z) > 0\}$ be the subset of symbols that are impossible to be induced by the input symbol $X = 1$. Let $\mathcal{I}(\mathbf{z}) \triangleq \{j\in[1:n] : z_j\in\mathcal{Z}'\}$ be the set of locations such that the corresponding elements belong to $\mathcal{Z}'$. Note that if $\mathbf{z}$ satisfies $P_Z^{n,l}(\mathbf{z}) > 0$, the cardinality of $\mathcal{I}(\mathbf{z})$ must satisfy $|\mathcal{I}(\mathbf{z})| \le n - l$. For any $\mathbf{z}$ such that $P_Z^{n,l}(\mathbf{z}) > 0$, one can always find an $\tilde{\mathbf{x}}$ such that $P_X^{n,l}(\tilde{\mathbf{x}}) > 0$ and $\mathcal{I}(\mathbf{z}) \cap \mathrm{supp}(\tilde{\mathbf{x}}) = \emptyset$; thus
$$W_{Z|X}^{\otimes n}(\mathbf{z}|\tilde{\mathbf{x}}) = \prod_{j : \tilde{x}_j = 1} Q_1(z_j) \prod_{j : \tilde{x}_j = 0} Q_0(z_j) \ge (\mu_1)^l (\mu_0)^{n-l} \ge \tilde{\mu}^n, \quad (49)$$
where $\mu_0 = \min_{z : Q_0(z)>0} Q_0(z)$, $\mu_1 = \min_{z : Q_1(z)>0} Q_1(z)$, and $\tilde{\mu} = \min\{\mu_0, \mu_1\}$. Then, we have
$$(|\mathcal{M}|N)\min_{\mathbf{z} : P_Z^{n,l}(\mathbf{z})>0} P_Z^{n,l}(\mathbf{z}) = (|\mathcal{M}|N)\min_{\mathbf{z} : P_Z^{n,l}(\mathbf{z})>0} \sum_{\mathbf{x}} P_X^{n,l}(\mathbf{x})\, W_{Z|X}^{\otimes n}(\mathbf{z}|\mathbf{x}) \ge \frac{|\mathcal{M}|N}{w^l}\min_{\mathbf{z} : P_Z^{n,l}(\mathbf{z})>0} \sum_{\mathbf{x} : P_X^{n,l}(\mathbf{x})>0} W_{Z|X}^{\otimes n}(\mathbf{z}|\mathbf{x}) \quad (50)$$
$$\ge \min_{\mathbf{z} : P_Z^{n,l}(\mathbf{z})>0} \sum_{\mathbf{x} : P_X^{n,l}(\mathbf{x})>0} W_{Z|X}^{\otimes n}(\mathbf{z}|\mathbf{x}) \quad (51)$$
$$\ge \tilde{\mu}^n, \quad (52)$$
where (51) holds since $|\mathcal{M}| = \exp\{e^{r\sqrt{n}}\}$ and $w^l = \exp\{\Theta(\sqrt{n}\log n)\}$, and (52) is true since we know from (49) that for every $\mathbf{z}$ such that $P_Z^{n,l}(\mathbf{z}) > 0$, one can find an $\tilde{\mathbf{x}}$ with $P_X^{n,l}(\tilde{\mathbf{x}}) > 0$ to ensure $\sum_{\mathbf{x} : P_X^{n,l}(\mathbf{x})>0} W_{Z|X}^{\otimes n}(\mathbf{z}|\mathbf{x}) \ge W_{Z|X}^{\otimes n}(\mathbf{z}|\tilde{\mathbf{x}}) \ge \tilde{\mu}^n$.
Thus, we have
$$\log \frac{1}{\big(|\mathcal{M}| N\big) \min_{\mathbf{z}: P_Z^{n,l}(\mathbf{z}) > 0} P_Z^{n,l}(\mathbf{z})} \le \log\big(1 + \widetilde{\mu}^{-n}\big) \le \log\big((1 + \widetilde{\mu}^{-1})^n\big) = n \log\big(1 + \widetilde{\mu}^{-1}\big). \quad (53)$$

APPENDIX B
PROOF OF LEMMA

Fix any $\zeta > 0$, and decompose $\widetilde{P}_Y$ into two sub-distributions $\widetilde{P}_Y^{(1)}$ and $\widetilde{P}_Y^{(2)}$ such that
$$\widetilde{P}_Y^{(1)}(\mathbf{y}) \triangleq \frac{1}{K} \sum_{i=1}^{K} W_{Y|X}^{\otimes n}(\mathbf{y}|\mathbf{x}_i)\, \mathbb{1}\left\{\log \frac{W_{Y|X}^{\otimes n}(\mathbf{y}|\mathbf{x}_i)}{P'_Y(\mathbf{y})} > \zeta\right\}, \qquad \widetilde{P}_Y^{(2)}(\mathbf{y}) \triangleq \frac{1}{K} \sum_{i=1}^{K} W_{Y|X}^{\otimes n}(\mathbf{y}|\mathbf{x}_i)\, \mathbb{1}\left\{\log \frac{W_{Y|X}^{\otimes n}(\mathbf{y}|\mathbf{x}_i)}{P'_Y(\mathbf{y})} \le \zeta\right\}.$$
By noting that $P_Y(\mathbf{y}) = \mathbb{E}(\widetilde{P}_Y(\mathbf{y}))$, where the expectation is over the random generation of $\{\mathbf{x}_1, \ldots, \mathbf{x}_K\}$, we have
$$\mathbb{E}\big(\mathbb{V}(P_Y, \widetilde{P}_Y)\big) = \frac{1}{2}\, \mathbb{E}\left(\sum_{\mathbf{y}} \Big|\mathbb{E}\big(\widetilde{P}_Y(\mathbf{y})\big) - \widetilde{P}_Y(\mathbf{y})\Big|\right) \le \frac{1}{2}\, \mathbb{E}\left(\sum_{\mathbf{y}} \Big|\mathbb{E}\big(\widetilde{P}_Y^{(1)}(\mathbf{y})\big) - \widetilde{P}_Y^{(1)}(\mathbf{y})\Big|\right) + \frac{1}{2}\, \mathbb{E}\left(\sum_{\mathbf{y}} \Big|\mathbb{E}\big(\widetilde{P}_Y^{(2)}(\mathbf{y})\big) - \widetilde{P}_Y^{(2)}(\mathbf{y})\Big|\right). \quad (54)$$
The first term of (54) is bounded from above by
$$\sum_{\mathbf{y}} \mathbb{E}\big(\widetilde{P}_Y^{(1)}(\mathbf{y})\big) = \frac{1}{K} \sum_{i=1}^{K} \sum_{\mathbf{y}} \sum_{\mathbf{x}_i} P_X(\mathbf{x}_i)\, W_{Y|X}^{\otimes n}(\mathbf{y}|\mathbf{x}_i)\, \mathbb{1}\left\{\log \frac{W_{Y|X}^{\otimes n}(\mathbf{y}|\mathbf{x}_i)}{P'_Y(\mathbf{y})} > \zeta\right\} = \mathbb{P}_{P_X W_{Y|X}^{\otimes n}}\left(\log \frac{W_{Y|X}^{\otimes n}(\mathbf{Y}|\mathbf{X})}{P'_Y(\mathbf{Y})} > \zeta\right).$$
By applying Jensen's inequality, the second term of (54) is bounded from above by
$$\frac{1}{2} \sum_{\mathbf{y}} \mathbb{E}\left(\sqrt{\Big(\mathbb{E}\big(\widetilde{P}_Y^{(2)}(\mathbf{y})\big) - \widetilde{P}_Y^{(2)}(\mathbf{y})\Big)^2}\right) \le \frac{1}{2} \sum_{\mathbf{y}} \sqrt{\mathbb{E}\left[\Big(\mathbb{E}\big(\widetilde{P}_Y^{(2)}(\mathbf{y})\big) - \widetilde{P}_Y^{(2)}(\mathbf{y})\Big)^2\right]} = \frac{1}{2} \sum_{\mathbf{y}} \sqrt{\mathrm{Var}\big(\widetilde{P}_Y^{(2)}(\mathbf{y})\big)}, \quad (55)$$
and one can further show that
$$\mathrm{Var}\big(\widetilde{P}_Y^{(2)}(\mathbf{y})\big) = \frac{1}{K^2} \sum_{i=1}^{K} \mathrm{Var}\left(W_{Y|X}^{\otimes n}(\mathbf{y}|\mathbf{X}_i)\, \mathbb{1}\left\{\log \frac{W_{Y|X}^{\otimes n}(\mathbf{y}|\mathbf{X}_i)}{P'_Y(\mathbf{y})} \le \zeta\right\}\right) \le \frac{1}{K^2} \sum_{i=1}^{K} \mathbb{E}\left(W_{Y|X}^{\otimes n}(\mathbf{y}|\mathbf{X}_i)^2\, \mathbb{1}\left\{\log \frac{W_{Y|X}^{\otimes n}(\mathbf{y}|\mathbf{X}_i)}{P'_Y(\mathbf{y})} \le \zeta\right\}\right) \le \frac{1}{K^2} \sum_{i=1}^{K} \mathbb{E}\left(W_{Y|X}^{\otimes n}(\mathbf{y}|\mathbf{X}_i)\, e^{\zeta} P'_Y(\mathbf{y})\right) = \frac{e^{\zeta}}{K}\, P'_Y(\mathbf{y})\, P_Y(\mathbf{y}).$$
By using the arithmetic-geometric mean inequality, we see that (55) is further bounded from above as
$$\frac{1}{2} \sum_{\mathbf{y}} \sqrt{\mathrm{Var}\big(\widetilde{P}_Y^{(2)}(\mathbf{y})\big)} \le \frac{1}{2} \sum_{\mathbf{y}} \sqrt{\frac{e^{\zeta}}{K}\, P'_Y(\mathbf{y})\, P_Y(\mathbf{y})} \le \frac{1}{2} \sqrt{\frac{e^{\zeta}}{K}} \sum_{\mathbf{y}} \frac{P'_Y(\mathbf{y}) + P_Y(\mathbf{y})}{2} = \frac{1}{2} \sqrt{\frac{e^{\zeta}}{K}}.$$
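The $K^{-1/2}$ decay driving this bound can be observed directly. The following Monte Carlo sketch (illustrative parameters; a short-blocklength BSC so that $P_Y$ can be computed exactly by enumeration) estimates $\mathbb{E}[\mathbb{V}(P_Y, \widetilde{P}_Y)]$ for i.i.d. codebooks of increasing size $K$; the estimate shrinks roughly like $1/\sqrt{K}$, consistent with the $\frac{1}{2}\sqrt{e^{\zeta}/K}$ term above.

```python
import itertools, math, random

# Monte Carlo sketch (illustrative parameters, not from the paper) of the
# resolvability phenomenon behind Appendix B: the expected variational
# distance between the true output law P_Y and the K-codeword simulated law
# P~_Y decays roughly like 1/sqrt(K).

random.seed(1)
n, eps, p1 = 6, 0.1, 0.25          # blocklength, BSC(eps), P(X_j = 1)

def w_cond(y, x):
    """W^{(n)}(y|x) for a memoryless BSC(eps)."""
    d = sum(a != b for a, b in zip(x, y))
    return (eps ** d) * ((1 - eps) ** (n - d))

ys = list(itertools.product([0, 1], repeat=n))
xs = ys  # input and output sequences share the same alphabet {0,1}^n
px = {x: math.prod(p1 if b else 1 - p1 for b in x) for x in xs}
p_y = {y: sum(px[x] * w_cond(y, x) for x in xs) for y in ys}  # true output law

for K in (8, 64, 512):
    avg = 0.0
    for _ in range(20):  # trials: codewords X_1,...,X_K drawn i.i.d. from P_X
        cw = random.choices(xs, weights=[px[x] for x in xs], k=K)
        v = 0.5 * sum(abs(p_y[y] - sum(w_cond(y, x) for x in cw) / K) for y in ys)
        avg += v / 20
    print(f"K = {K:4d}:  E[V(P_Y, P~_Y)] is approximately {avg:.4f}")
```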
REFERENCES

[1] C. E. Shannon, “A mathematical theory of communication,” Bell System Technical Journal, vol. 27, no. 3, pp. 379–423, 1948.
[2] R. Ahlswede and G. Dueck, “Identification via channels,” IEEE Trans. Inf. Theory, vol. 35, no. 1, pp. 15–29, 1989.
[3] T. S. Han and S. Verdú, “New results in the theory of identification via channels,” IEEE Trans. Inf. Theory, vol. 38, no. 1, pp. 14–25, 1992.
[4] A. D. Wyner, “The wire-tap channel,” Bell System Technical Journal, vol. 54, no. 8, pp. 1355–1387, 1975.
[5] Y. Liang, H. V. Poor, and S. Shamai, Information Theoretic Security. Now Publishers Inc., 2009.
[6] M. Bloch and J. Barros, Physical-Layer Security: From Information Theory to Security Engineering. Cambridge University Press, 2011.
[7] R. Ahlswede and N. Cai, “Transmission, identification and common randomness capacities for wire-tape channels with secure feedback from the decoder,” in General Theory of Information Transfer and Combinatorics. Springer, 2006, pp. 258–275.
[8] H. Boche and C. Deppe, “Secure identification for wiretap channels; robustness, super-additivity and continuity,” IEEE Trans. Inf. Forensics Secur., vol. 13, no. 7, pp. 1641–1655, 2018.
[9] ——, “Secure identification under passive eavesdroppers and active jamming attacks,” IEEE Trans. Inf. Forensics Secur., vol. 14, no. 2, pp. 472–485, 2019.
[10] B. A. Bash, D. Goeckel, and D. Towsley, “Limits of reliable communication with low probability of detection on AWGN channels,” IEEE J. Sel. Areas Commun., vol. 31, no. 9, pp. 1921–1930, 2013.
[11] P. H. Che, M. Bakshi, and S. Jaggi, “Reliable deniable communication: Hiding messages in noise,” in Proc. IEEE Int. Symp. Inf. Theory, 2013, pp. 2945–2949.
[12] P. H. Che, M. Bakshi, C. Chan, and S. Jaggi, “Reliable deniable communication with channel uncertainty,” in Proc. IEEE Inform. Th. Workshop, 2014, pp. 30–34.
[13] ——, “Reliable, deniable and hidable communication,” in Proc. Inform. Th. Applic. Workshop, 2014, pp. 1–10.
[14] M. R. Bloch, “Covert communication over noisy channels: A resolvability perspective,” IEEE Trans. Inf. Theory, vol. 62, no. 5, pp. 2334–2354, 2016.
[15] L. Wang, G. W. Wornell, and L. Zheng, “Fundamental limits of communication with low probability of detection,” IEEE Trans. Inf. Theory, vol. 62, no. 6, pp. 3493–3503, 2016.
[16] M. Tahmasbi and M. R. Bloch, “First- and second-order asymptotics in covert communication,” IEEE Trans. Inf. Theory, vol. 65, no. 4, pp. 2190–2212, 2019.
[17] M. Tahmasbi, M. R. Bloch, and V. Y. F. Tan, “Error exponent for covert communications over discrete memoryless channels,” in Proc. IEEE Inform. Th. Workshop, 2017, pp. 304–308.
[18] K. S. K. Arumugam and M. R. Bloch, “Covert communication over a K-user multiple-access channel,” IEEE Trans. Inf. Theory, vol. 65, no. 11, pp. 7020–7044, 2019.
[19] ——, “Embedding covert information in broadcast communications,” IEEE Trans. Inf. Forensics Secur., vol. 14, no. 10, pp. 2787–2801, 2019.
[20] V. Y. F. Tan and S.-H. Lee, “Time-division is optimal for covert communication over some broadcast channels,” IEEE Trans. Inf. Forensics Secur., vol. 14, no. 5, pp. 1377–1389, 2019.
[21] D. Kibloff, S. M. Perlaza, and L. Wang, “Embedding covert information on a given broadcast code,” in Proc. IEEE Int. Symp. Inf. Theory, 2019, pp. 2169–2173.
[22] M. Ahmadipour, S. Salehkalaibar, M. H. Yassaee, and V. Y. F. Tan, “Covert communication over a compound discrete memoryless channel,” in Proc. IEEE Int. Symp. Inf. Theory, 2019, pp. 982–986.
[23] S.-H. Lee, L. Wang, A. Khisti, and G. W. Wornell, “Covert communication with channel-state information at the transmitter,” IEEE Trans. Inf. Forensics Secur., vol. 13, no. 9, pp. 2310–2319, 2018.
[24] H. ZivariFard, M. Bloch, and A. Nosratinia, “Keyless covert communication in the presence of non-causal channel state information,” in Proc. IEEE Inform. Th. Workshop, 2019, pp. 1–5.
[25] Q. Zhang, M. Bakshi, and S. Jaggi, “Covert communication over adversarially jammed channels,” in Proc. IEEE Inform. Th. Workshop, 2018, pp. 1–5.
[26] K. S. K. Arumugam, M. R. Bloch, and L. Wang, “Covert communication over a physically degraded relay channel with non-colluding wardens,” in Proc. IEEE Int. Symp. Inf. Theory, 2018, pp. 766–770.
[27] M. R. Bloch and S. Guha, “Optimal covert communications using pulse-position modulation,” in Proc. IEEE Int. Symp. Inf. Theory, 2017, pp. 2825–2829.
[28] Q. Zhang, M. R. Bloch, M. Bakshi, and S. Jaggi, “Undetectable radios: Covert communication under spectral mask constraints,” in Proc. IEEE Int. Symp. Inf. Theory, 2019, pp. 992–996.
[29] W. Hoeffding, “Probability inequalities for sums of bounded random variables,” in The Collected Works of Wassily Hoeffding. Springer, 1994, pp. 409–426.
[30] J. Hou and G. Kramer, “Effective secrecy: Reliability, confusion and stealth,” in Proc. IEEE Int. Symp. Inf. Theory, 2014, pp. 601–605.
[31] T. S. Han and S. Verdú, “Approximation theory of output statistics,” IEEE Trans. Inf. Theory, vol. 39, no. 3, pp. 752–772, 1993.
[32] J. Hou and G. Kramer, “Informational divergence approximations to product distributions,” in Proc. Canadian Workshop Inform. Theory, 2013, pp. 76–81.
[33] P. Cuff, “Distributed channel synthesis,” IEEE Trans. Inf. Theory, vol. 59, no. 11, pp. 7071–7096, 2013.
[34] T. S. Han, Information-Spectrum Methods in Information Theory. Springer, 2003.