Covert Identification over Binary-Input Discrete Memoryless Channels
Qiaosheng Zhang and Vincent Y. F. Tan, Senior Member, IEEE
Abstract
This paper considers the covert identification problem in which a sender aims to reliably convey an identification (ID) message to a set of receivers via a binary-input memoryless channel (BMC), and simultaneously to guarantee that the communication is covert with respect to a warden who monitors the communication via another independent BMC. We prove a square-root law for the covert identification problem. This states that an ID message of size $\exp(\exp(\Theta(\sqrt{n})))$ can be transmitted over $n$ channel uses. We then characterize the exact pre-constant in the $\Theta(\cdot)$ notation. This constant is referred to as the covert identification capacity. We show that it equals the recently developed covert capacity in the standard covert communication problem, and somewhat surprisingly, the covert identification capacity can be achieved without any shared key between the sender and receivers. The achievability proof relies on a random coding argument with pulse-position modulation (PPM), coupled with a second stage which performs code refinements. The converse proof relies on an expurgation argument as well as results for channel resolvability with stringent input constraints.

Index Terms
Covert Communication, Identification via channels, Channel resolvability.
I. INTRODUCTION
In contrast to Shannon's classical channel coding problem [1] (also known as the transmission problem), in which a sender wishes to reliably send a message to a receiver through a noisy channel $W$, the problem of identification via channels [2] (or simply the identification problem) focuses on a different setting wherein a sender wishes to send an identification (ID) message $m \in \mathcal{M}$ via a noisy channel $W$ to a set of receivers $\{R_{m'}\}_{m' \in \mathcal{M}}$, each observing the (same) outputs of the channel, such that every receiver $R_{m'}$ only cares about its dedicated message $m'$ and should be able to reliably answer the following question: Is the ID message sent by the sender $m'$? Specifically, if the ID message sent by the sender is $m$,
• The receiver $R_{m'}$ should answer "YES" with high probability if $m' = m$;
• The receiver $R_{m'}$ should answer "NO" with high probability if $m' \neq m$.
It is well known that in the transmission problem, one can reliably transmit a message of size $\exp(\Theta(n))$ over $n$ channel uses, and the pre-constant is characterized by the celebrated channel capacity $C_W \triangleq \max_P I(P, W)$, i.e., the mutual information between the input and output of the channel $W$ maximized over the input distribution $P$. In the identification problem, Ahlswede and Dueck [2] showed that the size of the ID message can be as large as $\exp(\exp(\Theta(n)))$, i.e., doubly-exponentially large in the blocklength $n$. Somewhat surprisingly, the exact pre-constant in the $\Theta(\cdot)$ notation, which is referred to as the identification capacity, is again $C_W$ [2], [3]. That is, the identification capacity exactly equals the channel capacity.
Apart from reliability guarantees, recent years have witnessed increasing attention to security concerns, especially in networked communication systems such as the Internet of Things. From an information-theoretic perspective, the security of the classical transmission problem has been extensively studied since Wyner's seminal paper [4] on the wiretap channel (see [5], [6] for surveys), and the secure identification problem has been investigated as well [7]–[9]. While most security problems are concerned with hiding the content of information, in certain scenarios merely the fact that communication takes place could lead to serious consequences; thus, the sender is required to hide the fact that he/she is communicating when he/she does so. Said differently, the sender needs to communicate covertly with respect to the warden who is surreptitiously monitoring the communication. This motivates the recent studies of the covert communication problem. Following the pioneering work by Bash et al. [10], which demonstrates a square-root law (SRL) (i.e., one can only transmit $\Theta(\sqrt{n})$ bits over $n$ channel uses) for covert communication, subsequent works have built on [10] to establish information-theoretic limits for covert communication over binary symmetric channels [11]–[13], discrete memoryless channels (DMCs) and Gaussian channels [14]–[17], multiple-access channels [18], broadcast channels [19]–[21], compound channels [22], channels with states [23], [24], adversarial noise channels [25], relay channels [26], etc. In the literature, the covertness constraint requires that, at the warden's side, the output distribution when communication takes place is almost indistinguishable from the output distribution when no communication takes place, and the discrepancy between the two distributions is usually measured by the Kullback-Leibler (KL) divergence or the variational distance.
Qiaosheng Zhang is with the Department of Electrical and Computer Engineering, National University of Singapore (e-mail: [email protected]). Vincent Y. F. Tan is with the Department of Electrical and Computer Engineering and the Department of Mathematics, National University of Singapore (e-mail: [email protected]).
In addition to covert communication, which focuses on the transmission problem, there are also scenarios in which the sender wishes to reliably send an ID message to a set of receivers, and simultaneously to remain covert with respect to the warden. For instance, a commander would like to send a certain message to a set of subordinates $\mathcal{M}$. This message informs exactly one of them to be prepared to strike. Each of these subordinates $m' \in \mathcal{M}$ would like to know whether the message $m \in \mathcal{M}$ sent by the commander corresponds to his specific index $m'$; if so, he must be prepared, otherwise nothing needs to be done on his part. Each subordinate is only interested in whether or not he should be ready. The commander has to send his message in such a way that an enemy should not be able to infer that any communication is occurring. We refer to this problem as the covert identification problem. Given the similarities and differences between the transmission and identification problems without covertness constraints, it is then natural to ask the following questions: (i) What is the maximum size of the ID message with covertness constraints? (ii) Does the covert capacity characterized in [14], [15] play a role in the fundamental limits of the covert identification problem? (iii) Is a shared key required to ensure that the identification can take place reliably?
These questions precisely set the stage for this work, and our main contributions can be summarized as follows.
• Analogous to the SRL in the covert communication literature, a different form of the SRL is discovered in the covert identification problem. That is, one can send an ID message of size up to $\exp(\exp(\Theta(\sqrt{n})))$ reliably and covertly, in contrast to the standard identification problem wherein the scaling is $\exp(\exp(\Theta(n)))$.
• We then characterize the maximal pre-constant in the $\Theta(\cdot)$ notation in $\exp(\exp(\Theta(\sqrt{n})))$, which is referred to as the covert identification capacity. We do so by establishing matching achievability and converse results. It turns out that the covert identification capacity equals the covert capacity; however, a key difference is that the former is achieved without any shared key between the sender and receivers. This is in stark contrast to standard covert communication, wherein a shared key is necessary [14] for achieving the covert capacity in some regimes of the channel between the sender and receivers and the channel between the sender and the warden.
From the achievability perspective, the requirement of a keyless identification code prevents us from adopting the simplest and most classical construction of identification codes proposed by Ahlswede and Dueck [2], which relies on the existence of a capacity-achieving code for the transmission problem. This is because there does not exist a keyless covert-capacity-achieving transmission code for covert communication in general [14]. Therefore, we develop an identification code from first principles. Our construction is based on a random coding argument with pulse-position modulation (PPM) and a modified information density decoder. PPM, which can be viewed as a special sub-class of constant composition codes, was shown to be optimal for covert communication by Bloch and Guha [27]. PPM codes are also useful in this work. In addition, we highlight that the random coding argument does not directly ensure the existence of a good identification code with vanishing maximum error probabilities, due to the large message size, which is of order $\exp(\exp(\Theta(\sqrt{n})))$; this issue is resolved by a careful code refinement process, which is explained in Section IV-D. Our code refinement process is different from the conventional expurgation argument that is ubiquitous in the information theory literature, in the sense that our refinement procedure preserves the channel output distribution induced by the original code; this is critical for ensuring that the covertness constraint is satisfied.
The proof of the converse part for the covert identification problem is also non-standard. Roughly speaking, the converse for channel identification usually relies on the achievability of channel resolvability for general input distributions, as discovered in Han and Verdú's seminal work [3]. However, such general results have not been established under the stringent input constraints imposed by the covertness constraint. Instead, we circumvent this difficulty by expurgating a large number of codevectors of the original covert identification code such that the resultant expurgated code satisfies certain cost constraints, and one can then apply the idea of [3] to the new code to obtain the desired converse result.
It is also worth noting that the expurgation argument used in this work differs from some relevant works on covert communication [16], [25], [28], since the identification problem relies critically on the use of stochastic encoders (as detailed in Section III).
The rest of this paper is organized as follows. We provide some notational conventions and an important technical lemma in Section II. In Section III, we formally introduce the covert identification problem and also present the main results. Sections IV and V respectively provide the detailed proofs of the achievability and converse parts of the main results. In Section VI, we conclude this work and propose several promising directions for future work.

II. PRELIMINARIES
For non-negative integers $a, b \in \mathbb{N}$, we use $[a:b]$ to denote the set of integers $\{a, a+1, \ldots, b\}$. Random variables and their realizations are respectively denoted by uppercase and lowercase letters, e.g., $X$ and $x$. Sets are denoted by calligraphic letters, e.g., $\mathcal{X}$. Vectors of length $n$ are denoted by boldface letters, e.g., $\mathbf{X}$ or $\mathbf{x}$, while vectors of shorter length (which should be clear from the context) are denoted by underlined boldface letters, e.g., $\underline{\mathbf{X}}$ or $\underline{\mathbf{x}}$. We use $X_i$ or $x_i$ to denote the $i$-th element of a vector, and $X_a^b$ or $x_a^b$ to denote the vector $(X_a, X_{a+1}, \ldots, X_b)$ or $(x_a, x_{a+1}, \ldots, x_b)$. Throughout this paper, logarithms $\log$ and exponentials $\exp$ are base $e$. For two probability distributions $P$ and $Q$ over the same finite set $\mathcal{X}$, we respectively define their KL-divergence, variational distance, and $\chi_2$-distance as
$$D(P \| Q) \triangleq \sum_{x \in \mathcal{X}} P(x) \log \frac{P(x)}{Q(x)}, \quad V(P, Q) \triangleq \frac{1}{2}\sum_{x \in \mathcal{X}} |P(x) - Q(x)|, \quad \chi_2(P \| Q) \triangleq \sum_{x \in \mathcal{X}} \frac{(P(x) - Q(x))^2}{Q(x)}.$$
We say $P$ is absolutely continuous with respect to $Q$ (denoted by $P \ll Q$) if the support of $P$ is a subset of the support of $Q$ (i.e., for all $x \in \mathcal{X}$, $P(x) = 0$ if $Q(x) = 0$). Moreover, we introduce a concentration inequality that is widely used in this work: Hoeffding's inequality.

Lemma 1 (Hoeffding's inequality [29]). Suppose $\{X_i\}_{i=1}^n$ is a set of independent random variables such that $a_i \le X_i \le b_i$ almost surely, and let $X \triangleq \sum_{i=1}^n X_i$. For any $v > 0$,
$$\mathbb{P}\big( |X - \mathbb{E}(X)| \ge v \big) \le 2\exp\left( -\frac{2v^2}{\sum_{i=1}^n (b_i - a_i)^2} \right).$$
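To make the three discrepancy measures above concrete, the following Python sketch (ours, not part of the original paper; all function names are illustrative) computes them for distributions represented as probability vectors, under the assumption that $Q(x) > 0$ wherever $P(x) > 0$.

```python
import numpy as np

def kl_divergence(P, Q):
    # D(P||Q) = sum_x P(x) log(P(x)/Q(x)), with the convention 0 log 0 = 0.
    mask = P > 0
    return float(np.sum(P[mask] * np.log(P[mask] / Q[mask])))

def variational_distance(P, Q):
    # V(P,Q) = (1/2) sum_x |P(x) - Q(x)|
    return 0.5 * float(np.sum(np.abs(P - Q)))

def chi2_distance(P, Q):
    # chi_2(P||Q) = sum_x (P(x) - Q(x))^2 / Q(x)
    return float(np.sum((P - Q) ** 2 / Q))

P = np.array([0.6, 0.3, 0.1])
Q = np.array([0.5, 0.25, 0.25])
print(kl_divergence(P, Q), variational_distance(P, Q), chi2_distance(P, Q))
```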
III. PROBLEM SETTING AND MAIN RESULTS
The channel between the sender and receivers is a binary-input memoryless channel (BMC) $(\mathcal{X}, W_{Y|X}, \mathcal{Y})$, and the channel between the sender and warden is another independent BMC $(\mathcal{X}, W_{Z|X}, \mathcal{Z})$. It is assumed that $\mathcal{Y}$ and $\mathcal{Z}$ are finite alphabets, and $\mathcal{X} = \{0, 1\}$ with '0' being the innocent symbol and '1' being the symbol that carries information.¹ The channel transition probabilities corresponding to $n$ channel uses are denoted by $W_{Y|X}^{\otimes n}(\mathbf{y}|\mathbf{x}) \triangleq \prod_{i=1}^n W_{Y|X}(y_i|x_i)$ and $W_{Z|X}^{\otimes n}(\mathbf{z}|\mathbf{x}) \triangleq \prod_{i=1}^n W_{Z|X}(z_i|x_i)$. Moreover, we define
$$P_0 \triangleq W_{Y|X=0}, \quad P_1 \triangleq W_{Y|X=1}, \quad Q_0 \triangleq W_{Z|X=0}, \quad Q_1 \triangleq W_{Z|X=1}.$$
As is common in the covert communication literature, it is assumed that (i) $Q_1 \neq Q_0$, (ii) $Q_1$ is absolutely continuous with respect to $Q_0$ (i.e., $Q_1 \ll Q_0$), and (iii) $P_1$ is absolutely continuous with respect to $P_0$ (i.e., $P_1 \ll P_0$). The first two assumptions preclude the scenarios in which covertness is always guaranteed or would never be guaranteed, while the last assumption precludes the possibility that the receivers enjoy an unfair advantage over the warden (as detailed in [14, Appendix G]). Let $\mu_0 \triangleq \min_{z: Q_0(z) > 0} Q_0(z)$, $\mu_1 \triangleq \min_{z: Q_1(z) > 0} Q_1(z)$, and $\tilde{\mu} \triangleq \min\{\mu_0, \mu_1\}$.

Definition 1 (Identification codes). An identification code $\mathcal{C}$ with message set $\mathcal{M}$ is a collection of codewords $\{U_m\}_{m \in \mathcal{M}}$ and decoding regions $\{\mathcal{D}_m\}_{m \in \mathcal{M}}$, where $U_m \in \mathcal{P}(\mathcal{X}^n)$ and $\mathcal{D}_m \subseteq \mathcal{Y}^n$.
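The following sketch (ours; the numerical channel values are purely illustrative) instantiates the setting: each binary-input channel is specified by its output distributions under inputs 0 and 1, and assumptions (i)-(iii) above are checked programmatically.

```python
import numpy as np

P0 = np.array([0.9, 0.1])   # W_{Y|X=0}
P1 = np.array([0.2, 0.8])   # W_{Y|X=1}
Q0 = np.array([0.8, 0.2])   # W_{Z|X=0}
Q1 = np.array([0.6, 0.4])   # W_{Z|X=1}

def absolutely_continuous(P, Q):
    # P << Q iff Q(z) = 0 implies P(z) = 0.
    return bool(np.all((Q > 0) | (P == 0)))

assert not np.allclose(Q0, Q1)          # (i)   Q1 != Q0
assert absolutely_continuous(Q1, Q0)    # (ii)  Q1 << Q0
assert absolutely_continuous(P1, P0)    # (iii) P1 << P0

mu0 = Q0[Q0 > 0].min()                  # smallest positive mass of Q0
mu1 = Q1[Q1 > 0].min()                  # smallest positive mass of Q1
mu_tilde = min(mu0, mu1)
print(mu0, mu1, mu_tilde)
```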
Remark 1. In contrast to most communication problems, wherein each message $m$ is deterministically mapped to a fixed sequence (the codeword) $\mathbf{x} \in \mathcal{X}^n$, the identification problem uses stochastic encoders such that message $m$ is stochastically mapped to a random sequence $\mathbf{X}$ according to the probability distribution $U_m \in \mathcal{P}(\mathcal{X}^n)$. Moreover, the decoding regions $\{\mathcal{D}_m\}_{m \in \mathcal{M}}$ in the identification problem are not necessarily disjoint. The use of stochastic encoders and the fact that the decoding regions are not disjoint are critical for communicating $\omega(n)$ bits of message over $n$ channel uses. With a slight abuse of terminology, we refer to the distribution $U_m$ as the codeword for the message $m$.

The transmission status of the sender is denoted by $T \in \{0, 1\}$. Communication takes place if $T = 1$, while no communication takes place if $T = 0$. When $T = 1$, the sender selects a message $m$ uniformly at random from $\mathcal{M}$. The encoder then chooses a length-$n$ sequence $\mathbf{X} \in \mathcal{X}^n$ according to the distribution $U_m$. When $T = 0$, the channel input is the length-$n$ zero sequence $\mathbf{0}$. For the receiver $R_{m'}$ ($m' \in \mathcal{M}$), upon receiving the channel output $\mathbf{Y} \in \mathcal{Y}^n$ through the BMC $W_{Y|X}^{\otimes n}$, it declares that the message sent by the sender is $m'$ if and only if $\mathbf{Y} \in \mathcal{D}_{m'}$.

The standard identification problem usually focuses on two types of error: the error probability of the first kind, which corresponds to the probability that the true message is not identified by its designated receiver, and the error probability of the second kind, which corresponds to the probability that the message is wrongly identified by some other receiver. For the covert identification problem, we introduce one more type of error, the error probability of the third kind, which corresponds to the probability that the length-$n$ zero sequence (when no communication takes place) is wrongly identified as a certain message by some receiver. We formalize these notions in the following definition.

¹It is also possible to consider a more general setting with multiple non-zero input symbols (by following the lead of [15]); however, for simplicity and ease of presentation, we focus on the binary-input setting in this work.
Definition 2 (Error probabilities). When $T = 1$ and $m \in \mathcal{M}$ is sent, the error probability of the first kind is defined as
$$P_{\mathrm{err}}^{(1)}(m) \triangleq \sum_{\mathbf{x} \in \mathcal{X}^n} U_m(\mathbf{x})\, W_{Y|X}^{\otimes n}(\mathcal{D}_m^c \,|\, \mathbf{x}).$$
When $T = 1$ and $m' \in \mathcal{M}$ is sent, the error probability of the second kind corresponding to the receiver $R_m$ is defined as
$$P_{\mathrm{err}}^{(2)}(m, m') \triangleq \sum_{\mathbf{x} \in \mathcal{X}^n} U_{m'}(\mathbf{x})\, W_{Y|X}^{\otimes n}(\mathcal{D}_m \,|\, \mathbf{x}).$$
When $T = 0$ and the length-$n$ zero sequence is sent through the channel, the error probability of the third kind corresponding to the receiver $R_m$ is defined as
$$P_{\mathrm{err}}^{(3)}(m) \triangleq P_0^{\otimes n}(\mathcal{D}_m).$$
Furthermore, let the corresponding maximum error probabilities (over all the messages or all pairs of distinct messages) be
$$P_{\mathrm{err}}^{(1)} \triangleq \max_{m \in \mathcal{M}} P_{\mathrm{err}}^{(1)}(m), \quad P_{\mathrm{err}}^{(2)} \triangleq \max_{(m,m') \in \mathcal{M}^2 : m \neq m'} P_{\mathrm{err}}^{(2)}(m, m'), \quad P_{\mathrm{err}}^{(3)} \triangleq \max_{m \in \mathcal{M}} P_{\mathrm{err}}^{(3)}(m).$$
Let $\widehat{Q}_{\mathcal{C}}^n(\mathbf{z})$ be the output distribution on $\mathcal{Z}^n$ for the warden induced by the identification code, which takes the form
$$\widehat{Q}_{\mathcal{C}}^n(\mathbf{z}) \triangleq \frac{1}{|\mathcal{M}|}\sum_{m \in \mathcal{M}}\sum_{\mathbf{x} \in \mathcal{X}^n} U_m(\mathbf{x})\, W_{Z|X}^{\otimes n}(\mathbf{z}|\mathbf{x}), \quad \forall \mathbf{z} \in \mathcal{Z}^n. \quad (1)$$
We adopt the widely used KL-divergence metric $D(\widehat{Q}_{\mathcal{C}}^n \| Q_0^{\otimes n})$ to measure covertness with respect to the warden.

Definition 3 (Covertness). The communication is $\delta$-covert if the KL-divergence between the distribution $\widehat{Q}_{\mathcal{C}}^n$ (when $T = 1$) and $Q_0^{\otimes n}$ (when $T = 0$) is bounded from above by $\delta$, i.e.,
$$D\big(\widehat{Q}_{\mathcal{C}}^n \,\big\|\, Q_0^{\otimes n}\big) \le \delta.$$
Let $\pi_{1|0}$ and $\pi_{0|1}$ respectively be the probabilities of false alarm (i.e., making an error when $T = 0$) and missed detection (i.e., making an error when $T = 1$) of the warden's hypothesis test. By using the definition of the variational distance and Pinsker's inequality, we see that the optimal test satisfies
$$\pi_{1|0} + \pi_{0|1} = 1 - V\big(\widehat{Q}_{\mathcal{C}}^n, Q_0^{\otimes n}\big) \ge 1 - \sqrt{\tfrac{1}{2} D\big(\widehat{Q}_{\mathcal{C}}^n \,\big\|\, Q_0^{\otimes n}\big)}.$$
Thus, a small $D(\widehat{Q}_{\mathcal{C}}^n \| Q_0^{\otimes n})$ implies a large sum-error $\pi_{1|0} + \pi_{0|1}$. This provides an operational meaning of the covertness metric in Definition 3. As discussed in prior works such as [15], [16], [30], the variational distance metric $V(\widehat{Q}_{\mathcal{C}}^n, Q_0^{\otimes n})$ is perhaps a better metric under the specific assumption that $T = 0$ and $T = 1$ occur with equal probabilities, since it directly connects to the average error probability of detection; however, the above assumption does not hold in general, thus both KL-divergence and variational distance are deemed to be appropriate metrics in the literature.
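To illustrate Definition 2, the following brute-force sketch (ours; the toy code, channels, and decoding regions are invented for illustration only) evaluates the three error kinds for a tiny identification code with blocklength $n = 3$ and two messages, with overlapping decoding regions as permitted in the identification problem.

```python
import numpy as np
from itertools import product

n = 3
P0 = np.array([0.9, 0.1]); P1 = np.array([0.2, 0.8])  # W_{Y|X=0}, W_{Y|X=1}

def W_n(y, x):
    # Product channel probability W^{(n)}_{Y|X}(y|x) for bit tuples y, x.
    p = 1.0
    for yi, xi in zip(y, x):
        p *= (P1 if xi else P0)[yi]
    return p

# Stochastic codewords U_m: uniform over a small list of sequences.
U = {0: [(1, 0, 0), (0, 1, 0)], 1: [(0, 0, 1), (1, 0, 0)]}
# Decoding regions D_m (possibly overlapping), chosen by hand.
D = {0: {y for y in product([0, 1], repeat=n) if y[0] + y[1] >= 1},
     1: {y for y in product([0, 1], repeat=n) if y[2] == 1}}

def p_err1(m):        # first kind: true message m missed by R_m
    return np.mean([sum(W_n(y, x) for y in product([0, 1], repeat=n)
                        if y not in D[m]) for x in U[m]])

def p_err2(m, m2):    # second kind: m2 sent, wrongly identified as m
    return np.mean([sum(W_n(y, x) for y in D[m]) for x in U[m2]])

def p_err3(m):        # third kind: the all-zero input lands in D_m
    return sum(W_n(y, (0,) * n) for y in D[m])

print(p_err1(0), p_err2(0, 1), p_err3(0))
```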
Definition 4. A rate $R$ is said to be $\delta$-achievable if there exists a sequence of identification codes with increasing blocklength $n$ such that
$$\liminf_{n\to\infty} \frac{\log\log|\mathcal{M}|}{\sqrt{n}} \ge R, \quad D\big(\widehat{Q}_{\mathcal{C}}^n \,\big\|\, Q_0^{\otimes n}\big) \le \delta, \quad \lim_{n\to\infty} P_{\mathrm{err}}^{(1)} = \lim_{n\to\infty} P_{\mathrm{err}}^{(2)} = \lim_{n\to\infty} P_{\mathrm{err}}^{(3)} = 0.$$
The $\delta$-covert identification capacity $C_\delta$ is defined as the supremum of all $\delta$-achievable rates.

Note that the coding rate $R$ in the covert identification problem is defined as the iterated logarithm of the size of the message set $|\mathcal{M}|$ normalized by $\sqrt{n}$, which implies that the message size (if $R > 0$) is of order $\exp(\exp(\Theta(\sqrt{n})))$. This intuitively makes sense because the channel identification problem usually allows the message size to be as large as $\exp(\exp(\Theta(n)))$, but the stringent covertness constraint reduces the exponent from $\Theta(n)$ to $\Theta(\sqrt{n})$. In the following, we present the main result that characterizes the $\delta$-covert identification capacity of BMCs.

Main result: The covert identification capacity
Theorem 1.
For any BMCs $W_{Y|X}$ and $W_{Z|X}$ satisfying $Q_1 \neq Q_0$, $Q_1 \ll Q_0$, and $P_1 \ll P_0$, the $\delta$-covert identification capacity is given by
$$C_\delta = \sqrt{\frac{2\delta}{\chi_2(Q_1 \| Q_0)}}\; D(P_1 \| P_0).$$
Some remarks are in order.
1) Analogous to the canonical covert communication problem, we notice that the SRL also holds for the covert identification problem, albeit with message size $\exp(\exp(\Theta(\sqrt{n})))$. Furthermore, the $\delta$-covert identification capacity is exactly the same as the $\delta$-covert capacity derived in [14], [15].
2) In stark contrast to the standard covert communication problem [14], in which a shared key is needed to achieve the covert capacity when the channels $W_{Y|X}$ and $W_{Z|X}$ satisfy $D(P_1\|P_0) \le D(Q_1\|Q_0)$, Theorem 1 above shows that regardless of the values of $D(P_1\|P_0)$ and $D(Q_1\|Q_0)$, the $\delta$-covert identification capacity is always achievable without any shared key. Intuitively, this is because the message size in our setting scales as $\exp(\omega(n))$, which automatically allows us to satisfy the requirements on the shared key via proof techniques from channel resolvability [31], since it is well known that an exponential message size (of a suitably large exponent) suffices to drive the approximation error (between the target and synthesized distributions) to zero. This is reflected in Lemma 5 in our achievability proof.
We prove the achievability part of Theorem 1 in Section IV, and the converse part in Section V.
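As a numerical illustration (ours, not from the paper), the following self-contained sketch evaluates the capacity expression of Theorem 1, $C_\delta = \sqrt{2\delta/\chi_2(Q_1\|Q_0)}\, D(P_1\|P_0)$, for the toy channels used earlier.

```python
import numpy as np

P0 = np.array([0.9, 0.1]); P1 = np.array([0.2, 0.8])
Q0 = np.array([0.8, 0.2]); Q1 = np.array([0.6, 0.4])
delta = 0.1

D_P1_P0 = float(np.sum(P1 * np.log(P1 / P0)))       # D(P1 || P0)
chi2_Q1_Q0 = float(np.sum((Q1 - Q0) ** 2 / Q0))     # chi_2(Q1 || Q0)
C_delta = np.sqrt(2 * delta / chi2_Q1_Q0) * D_P1_P0
print("delta-covert identification capacity:", C_delta)
```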
IV. ACHIEVABILITY

The achievability proof is partitioned into two stages. In the first stage, we use a random coding argument with a PPM input distribution and a modified information density decoder to show the existence of a "weak" covert identification code. By "weak" we mean that this stage only guarantees that the average (rather than maximum) error probability of the third kind vanishes. In the second stage, we apply a code refinement process to the "weak" covert identification code, such that the refined code satisfies all the criteria for the three error probabilities and covertness in Definition 4.
We first provide a detailed introduction to PPM in Subsection IV-A. The first stage of the achievability is described in Subsection IV-B and proved in Subsection IV-C, while the second stage is presented in Subsection IV-D.
A. Pulse Position Modulation (PPM)
Let
$$l \triangleq \left\lfloor \sqrt{\frac{(2\delta - n^{-1/3})\, n}{\chi_2(Q_1 \| Q_0)}} \right\rfloor$$
be the weight parameter, and $(w, s)$ be non-negative integers such that $w \triangleq \lfloor n/l \rfloor$ and $s \triangleq n - wl$. We use $\underline{\mathbf{x}} \in \mathcal{X}^w$, $\underline{\mathbf{y}} \in \mathcal{Y}^w$, $\underline{\mathbf{z}} \in \mathcal{Z}^w$ to denote vectors of length $w$. We also let $\mathrm{wt}_H(\mathbf{x})$ denote the number of ones, or the weight, of the vector $\mathbf{x}$. Let
$$P_X^w(\underline{\mathbf{x}}) \triangleq \begin{cases} 1/w, & \text{if } \mathrm{wt}_H(\underline{\mathbf{x}}) = 1, \\ 0, & \text{otherwise}, \end{cases}$$
be the distribution on $\mathcal{X}^w$ such that $P_X^w(\underline{\mathbf{x}})$ is non-zero if and only if $\underline{\mathbf{x}}$ has Hamming weight one. The corresponding output distributions $P_Y^w$ and $P_Z^w$ are respectively given by
$$P_Y^w(\underline{\mathbf{y}}) \triangleq \sum_{\underline{\mathbf{x}} \in \mathcal{X}^w} P_X^w(\underline{\mathbf{x}})\, W_{Y|X}^{\otimes w}(\underline{\mathbf{y}}|\underline{\mathbf{x}}), \quad (2)$$
$$P_Z^w(\underline{\mathbf{z}}) \triangleq \sum_{\underline{\mathbf{x}} \in \mathcal{X}^w} P_X^w(\underline{\mathbf{x}})\, W_{Z|X}^{\otimes w}(\underline{\mathbf{z}}|\underline{\mathbf{x}}). \quad (3)$$
For each $i \in [1:l]$, we define the length-$w$ vector $\mathbf{x}^{(i)} \triangleq x_{(i-1)w+1}^{iw}$. Thus, every length-$n$ vector $\mathbf{x}$ can be represented as $\mathbf{x} = [\mathbf{x}^{(1)}, \ldots, \mathbf{x}^{(l)}, x_{wl+1}^n]$, where $x_{wl+1}^n$ is of length $s$. The PPM input distribution is thus defined as
$$P_X^{n,l}(\mathbf{x}) \triangleq \prod_{i=1}^l P_X^w\big(\mathbf{x}^{(i)}\big) \cdot \mathbb{1}\big\{\mathrm{wt}_H(x_{wl+1}^n) = 0\big\}.$$
That is, we require each PPM-generated vector, also called a PPM-sequence, $\mathbf{x}$, to contain exactly $l$ ones; in particular, each of the first $l$ intervals $[1:w], [w+1:2w], \ldots, [(l-1)w+1:lw]$ contains a single one, and the last interval $[wl+1:n]$ contains all zeros.
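The following sketch (ours; the numerical inputs are illustrative) computes the parameters $(l, w, s)$ from $(n, \delta, \chi_2)$ as defined above and samples one PPM-sequence from $P_X^{n,l}$: one pulse per length-$w$ window, zeros in the tail.

```python
import numpy as np

rng = np.random.default_rng(0)

def ppm_parameters(n, delta, chi2):
    l = int(np.floor(np.sqrt((2 * delta - n ** (-1 / 3)) * n / chi2)))
    w = n // l
    return l, w, n - w * l  # weight l, window width w, tail length s

def sample_ppm_sequence(n, l, w):
    x = np.zeros(n, dtype=int)
    for i in range(l):
        x[i * w + rng.integers(w)] = 1  # one pulse per length-w window
    return x  # the last s = n - w*l positions stay zero

n, delta, chi2 = 10000, 0.1, 0.25
l, w, s = ppm_parameters(n, delta, chi2)
x = sample_ppm_sequence(n, l, w)
print(l, w, s, x.sum())  # the weight of x equals l
```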
B. Existence of a "weak" covert identification code

1) Encoder and Decoder: Let $\eta \in (0,1)$ be arbitrary, the normalized weight parameter $t \triangleq l/\sqrt{n}$, $r = (1-\eta)\, t\, D(P_1\|P_0)$, and $r' = (1 - (\eta/2))\, t\, D(P_1\|P_0)$. The size of the message set is $|\mathcal{M}| = \exp(e^{r\sqrt{n}})$. For each message $m \in \mathcal{M}$, we generate $N \triangleq e^{r'\sqrt{n}}$ sequences $\{\mathbf{x}_{m,i}\}_{i=1}^N$ independently according to $P_X^{n,l}$, and the codeword $U_m$ is the uniform distribution over the set $\{\mathbf{x}_{m,i}\}_{i=1}^N$, i.e.,
$$U_m(\mathbf{x}) \triangleq \frac{1}{N}\sum_{i=1}^N \mathbb{1}\{\mathbf{x} = \mathbf{x}_{m,i}\}, \quad \forall \mathbf{x} \in \mathcal{X}^n.$$
That is, we send each of the sequences $\{\mathbf{x}_{m,i}\}_{i=1}^N$ with equal probability when $m$ is the true message.
Let $\gamma \triangleq (1-\epsilon)\, t\, D(P_1\|P_0)$, where $0 < \epsilon < \eta/2$. To specify the decoding region $\mathcal{D}_m$ for each message $m \in \mathcal{M}$, we first define the set $\mathcal{F}_{\mathbf{x}}$ for each $\mathbf{x} \in \mathcal{X}^n$ as
$$\mathcal{F}_{\mathbf{x}} \triangleq \left\{ \mathbf{y} \in \mathcal{Y}^n : \log \frac{W_{Y|X}^{\otimes n}(\mathbf{y}|\mathbf{x})}{P_0^{\otimes n}(\mathbf{y})} > \gamma\sqrt{n} \right\}.$$
The decoding region for each $m$ is $\mathcal{D}_m \triangleq \cup_{i \in [1:N]} \mathcal{F}_{\mathbf{x}_{m,i}}$.
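A sketch (ours; names are illustrative) of the information-density test defining $\mathcal{F}_{\mathbf{x}}$ follows: an output $\mathbf{y}$ belongs to $\mathcal{F}_{\mathbf{x}}$ iff $\log(W^{\otimes n}(\mathbf{y}|\mathbf{x})/P_0^{\otimes n}(\mathbf{y})) > \gamma\sqrt{n}$, and since $W_{Y|X=0} = P_0$, only the positions where $x_i = 1$ contribute.

```python
import numpy as np

P0 = np.array([0.9, 0.1]); P1 = np.array([0.2, 0.8])

def info_density(y, x):
    # log(W^n(y|x) / P0^n(y)); terms with x_i = 0 cancel exactly.
    return float(sum(np.log(P1[yi] / P0[yi]) for yi, xi in zip(y, x) if xi == 1))

def in_F_x(y, x, gamma):
    return info_density(y, x) > gamma * np.sqrt(len(x))

# A receiver declares message m iff y lies in the union of F_x over the N
# sequences x_{m,1}, ..., x_{m,N} associated with m.
def identifies(y, sequences_m, gamma):
    return any(in_F_x(y, x, gamma) for x in sequences_m)
```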
2) Error probabilities and distributions of interest:
Based on the encoding scheme described above, the error probabilities of the first and second kinds can be rewritten as
$$P_{\mathrm{err}}^{(1)}(m) = \sum_{\mathbf{x} \in \mathcal{X}^n} \frac{1}{N}\sum_{i=1}^N \mathbb{1}\{\mathbf{x} = \mathbf{x}_{m,i}\}\, W_{Y|X}^{\otimes n}(\mathcal{D}_m^c | \mathbf{x}) = \frac{1}{N}\sum_{i=1}^N W_{Y|X}^{\otimes n}(\mathcal{D}_m^c | \mathbf{x}_{m,i}), \quad (4)$$
$$P_{\mathrm{err}}^{(2)}(m, m') = \frac{1}{N}\sum_{i=1}^N W_{Y|X}^{\otimes n}(\mathcal{D}_m | \mathbf{x}_{m',i}). \quad (5)$$
Consider the PPM distribution $P_X^{n,l}$ on $\mathcal{X}^n$; the corresponding output distributions $P_Y^{n,l}$ on $\mathcal{Y}^n$ and $P_Z^{n,l}$ on $\mathcal{Z}^n$ are respectively given by
$$P_Y^{n,l}(\mathbf{y}) \triangleq \sum_{\mathbf{x} \in \mathcal{X}^n} P_X^{n,l}(\mathbf{x})\, W_{Y|X}^{\otimes n}(\mathbf{y}|\mathbf{x}) = \Bigg(\prod_{i=1}^l P_Y^w\big(\mathbf{y}^{(i)}\big)\Bigg) \cdot P_0^{\otimes s}(y_{wl+1}^n), \quad (6)$$
$$P_Z^{n,l}(\mathbf{z}) \triangleq \sum_{\mathbf{x} \in \mathcal{X}^n} P_X^{n,l}(\mathbf{x})\, W_{Z|X}^{\otimes n}(\mathbf{z}|\mathbf{x}) = \Bigg(\prod_{i=1}^l P_Z^w\big(\mathbf{z}^{(i)}\big)\Bigg) \cdot Q_0^{\otimes s}(z_{wl+1}^n),$$
where (6) follows from (2). Given the sequences $\{\mathbf{x}_{m,i}\}_{i=1}^N$ for each $m \in \mathcal{M}$, we can also rewrite the output distribution $\widehat{Q}_{\mathcal{C}}^n$ on $\mathcal{Z}^n$, which is first defined in (1), as
$$\widehat{Q}_{\mathcal{C}}^n(\mathbf{z}) = \frac{1}{|\mathcal{M}|}\sum_{m \in \mathcal{M}} \frac{1}{N}\sum_{i=1}^N W_{Z|X}^{\otimes n}(\mathbf{z} | \mathbf{x}_{m,i}).$$
3) Performance guarantees:
Lemma 2 below shows that, with high probability, the randomly generated identification code is a "weak" covert identification code, in the sense that it only has a vanishing average (and not maximum) error probability of the third kind.
Lemma 2.
There exist vanishing sequences $\kappa_n, \varepsilon_n^{(1)}, \varepsilon_n^{(2)}, \varepsilon_n^{(3)} > 0$ (depending on the channels $W_{Y|X}$, $W_{Z|X}$ and the covertness parameter $\delta$) such that with probability at least $1 - \kappa_n$ over the code generation process, the randomly generated code satisfies
$$\max_{m \in \mathcal{M}} P_{\mathrm{err}}^{(1)}(m) \le \varepsilon_n^{(1)}, \quad \max_{(m,m') \in \mathcal{M}^2 : m \neq m'} P_{\mathrm{err}}^{(2)}(m, m') \le \varepsilon_n^{(2)}, \quad \frac{1}{|\mathcal{M}|}\sum_{m \in \mathcal{M}} P_{\mathrm{err}}^{(3)}(m) \le \varepsilon_n^{(3)}, \quad D\big(\widehat{Q}_{\mathcal{C}}^n \,\big\|\, Q_0^{\otimes n}\big) \le \delta.$$

C. Proof of Lemma 2

1) Analysis of $P_{\mathrm{err}}^{(1)}$: Consider a fixed message $m \in \mathcal{M}$. By recalling Eqn. (4) and noting that $\mathcal{D}_m^c \subseteq \mathcal{F}_{\mathbf{x}_{m,i}}^c$, we have
$$P_{\mathrm{err}}^{(1)}(m) = \frac{1}{N}\sum_{i=1}^N W_{Y|X}^{\otimes n}(\mathcal{D}_m^c | \mathbf{x}_{m,i}) \le \frac{1}{N}\sum_{i=1}^N W_{Y|X}^{\otimes n}(\mathcal{F}_{\mathbf{x}_{m,i}}^c | \mathbf{x}_{m,i}). \quad (7)$$
Note that each $\mathbf{x}_{m,i}$ is generated according to $P_X^{n,l}$, and
$$\mathbb{E}_{P_X^{n,l}}\big(W_{Y|X}^{\otimes n}(\mathcal{F}_{\mathbf{X}}^c | \mathbf{X})\big) = \sum_{\mathbf{x}} P_X^{n,l}(\mathbf{x}) \sum_{\mathbf{y}} W_{Y|X}^{\otimes n}(\mathbf{y}|\mathbf{x})\, \mathbb{1}\Bigg\{ \log \frac{W_{Y|X}^{\otimes n}(\mathbf{y}|\mathbf{x})}{P_0^{\otimes n}(\mathbf{y})} \le \gamma\sqrt{n} \Bigg\} = \sum_{\mathbf{x}} P_X^{n,l}(\mathbf{x}) \sum_{\mathbf{y}} W_{Y|X}^{\otimes n}(\mathbf{y}|\mathbf{x})\, \mathbb{1}\Bigg\{ \sum_{j : x_j = 1} \log \frac{P_1(y_j)}{P_0(y_j)} \le \gamma\sqrt{n} \Bigg\}, \quad (8)$$
where (8) holds since $\log \frac{W_{Y|X}(y_j|x_j)}{P_0(y_j)} = \log \frac{P_0(y_j)}{P_0(y_j)} = 0$ for all $j$ such that $x_j = 0$. Without loss of generality, we define $\mathbf{x}^* \in \mathcal{X}^n$ as the weight-$l$ vector such that $x^*_{(j-1)w+1} = 1$ for $j \in [1:l]$; thus (8) also equals
$$\mathbb{P}_{P_1^{\otimes l}}\Bigg( \sum_{j=1}^l \log \frac{P_1(Y_{(j-1)w+1})}{P_0(Y_{(j-1)w+1})} \le \gamma\sqrt{n} \Bigg).$$
Note that the random variables $\big\{\log \frac{P_1(Y_{(j-1)w+1})}{P_0(Y_{(j-1)w+1})}\big\}_{j \in [1:l]}$ are independent and bounded, the expectation of their sum is $l\, D(P_1\|P_0)$, and $\gamma\sqrt{n} = (1-\epsilon)\, l\, D(P_1\|P_0)$. By applying Hoeffding's inequality (Lemma 1), we have
$$\mathbb{P}_{P_1^{\otimes l}}\Bigg( \sum_{j=1}^l \log \frac{P_1(Y_{(j-1)w+1})}{P_0(Y_{(j-1)w+1})} \le \gamma\sqrt{n} \Bigg) \le e^{-c_1\sqrt{n}}, \quad (9)$$
for some constant $c_1 > 0$. Let $\mu$ be a constant satisfying $0 < \mu < \min\{r' - r, \gamma - r'\}$, $\beta_n \triangleq e^{-c_1\sqrt{n}}$, and $\alpha_n \triangleq \max\{2\beta_n, e^{-\mu\sqrt{n}/2}\}$. Consider the $N$ independent and identically distributed (i.i.d.) random variables $\{W_{Y|X}^{\otimes n}(\mathcal{F}_{\mathbf{X}_{m,i}}^c | \mathbf{X}_{m,i})\}_{i \in [1:N]}$, which correspond to the right-hand side (RHS) of (7). Note that each random variable belongs to $[0,1]$, and the expectation is at most $\beta_n$ according to (9). By applying Hoeffding's inequality again and noting that $\alpha_n - \beta_n \ge e^{-\mu\sqrt{n}/2}/2$, we have
$$\mathbb{P}\Bigg( \frac{1}{N}\sum_{i=1}^N W_{Y|X}^{\otimes n}(\mathcal{F}_{\mathbf{X}_{m,i}}^c | \mathbf{X}_{m,i}) \ge \alpha_n \Bigg) \le \exp\big\{-2N(\alpha_n - \beta_n)^2\big\} \le \exp\Big\{ -\tfrac{1}{2} e^{(r'-\mu)\sqrt{n}} \Big\}.$$
Therefore, a union bound over all the messages $m \in \mathcal{M}$ yields
$$\mathbb{P}\Big( \max_{m \in \mathcal{M}} P_{\mathrm{err}}^{(1)}(m) \ge \alpha_n \Big) \le \sum_{m \in \mathcal{M}} \mathbb{P}\Bigg( \frac{1}{N}\sum_{i=1}^N W_{Y|X}^{\otimes n}(\mathcal{F}_{\mathbf{X}_{m,i}}^c | \mathbf{X}_{m,i}) \ge \alpha_n \Bigg) = \exp\Big\{ -\tfrac{1}{2} e^{(r'-\mu)\sqrt{n}} + e^{r\sqrt{n}} \Big\},$$
which vanishes since the choice of $\mu$ ensures $r' - \mu > r$.
2) Analysis of $P_{\mathrm{err}}^{(2)}$: Consider a fixed message pair $(m, m') \in \mathcal{M}^2$ with $m \neq m'$. Recall from Eqn. (5) that
$$P_{\mathrm{err}}^{(2)}(m, m') = \frac{1}{N}\sum_{i=1}^N W_{Y|X}^{\otimes n}(\mathcal{D}_m | \mathbf{x}_{m',i}). \quad (10)$$
Suppose the set of PPM-sequences $\{\mathbf{x}_{m,j}\}_{j \in [1:N]}$ (i.e., $P_X^{n,l}(\mathbf{x}_{m,j}) \neq 0$) for message $m$ is fixed; thus the decoding region $\mathcal{D}_m$ is also fixed. Since $\mathcal{D}_m = \cup_{j \in [1:N]} \mathcal{F}_{\mathbf{x}_{m,j}}$, we have
$$\mathbb{E}_{P_X^{n,l}}\big( W_{Y|X}^{\otimes n}(\mathcal{D}_m | \mathbf{X}) \big) \le \sum_{j=1}^N \mathbb{E}_{P_X^{n,l}}\big( W_{Y|X}^{\otimes n}(\mathcal{F}_{\mathbf{x}_{m,j}} | \mathbf{X}) \big). \quad (11)$$

Lemma 3.
Let $\xi \triangleq \sum_{y \in \mathcal{Y}} \frac{P_1(y)^2}{P_0(y)}$. For any PPM-sequence $\tilde{\mathbf{x}} \in \mathcal{X}^n$, we have
$$\mathbb{E}_{P_X^{n,l}}\big( W_{Y|X}^{\otimes n}(\mathcal{F}_{\tilde{\mathbf{x}}} | \mathbf{X}) \big) \le \exp\big\{ -\gamma\sqrt{n} + l(\xi - 1)/w \big\}. \quad (12)$$

Proof of Lemma 3.
For any PPM-sequence $\tilde{\mathbf{x}} \in \mathcal{X}^n$, we have
$$\mathbb{E}_{P_X^{n,l}}\big( W_{Y|X}^{\otimes n}(\mathcal{F}_{\tilde{\mathbf{x}}} | \mathbf{X}) \big) = \sum_{\mathbf{x}} P_X^{n,l}(\mathbf{x}) \sum_{\mathbf{y}} W_{Y|X}^{\otimes n}(\mathbf{y}|\mathbf{x})\, \mathbb{1}\Bigg\{ \log \frac{W_{Y|X}^{\otimes n}(\mathbf{y}|\tilde{\mathbf{x}})}{P_0^{\otimes n}(\mathbf{y})} > \gamma\sqrt{n} \Bigg\} = \sum_{\mathbf{y}} \frac{P_Y^{n,l}(\mathbf{y})}{P_0^{\otimes n}(\mathbf{y})}\, P_0^{\otimes n}(\mathbf{y})\, \mathbb{1}\Bigg\{ \log \frac{W_{Y|X}^{\otimes n}(\mathbf{y}|\tilde{\mathbf{x}})}{P_0^{\otimes n}(\mathbf{y})} > \gamma\sqrt{n} \Bigg\}$$
$$\le e^{-\gamma\sqrt{n}} \sum_{\mathbf{y}} \frac{P_Y^{n,l}(\mathbf{y})}{P_0^{\otimes n}(\mathbf{y})}\, W_{Y|X}^{\otimes n}(\mathbf{y}|\tilde{\mathbf{x}}) \quad (13)$$
$$= e^{-\gamma\sqrt{n}} \sum_{\mathbf{y}} \frac{\big(\prod_{i=1}^l P_Y^w(\mathbf{y}^{(i)})\big) \cdot P_0^{\otimes s}(y_{wl+1}^n)}{P_0^{\otimes n}(\mathbf{y})}\, W_{Y|X}^{\otimes n}(\mathbf{y}|\tilde{\mathbf{x}}) \quad (14)$$
$$= e^{-\gamma\sqrt{n}} \prod_{i=1}^l \Bigg( \sum_{\mathbf{y}^{(i)}} \frac{P_Y^w(\mathbf{y}^{(i)})}{P_0^{\otimes w}(\mathbf{y}^{(i)})}\, W_{Y|X}^{\otimes w}(\mathbf{y}^{(i)}|\tilde{\mathbf{x}}^{(i)}) \Bigg) \cdot \sum_{y_{wl+1}^n} \frac{P_0^{\otimes s}(y_{wl+1}^n)}{P_0^{\otimes s}(y_{wl+1}^n)}\, P_0^{\otimes s}(y_{wl+1}^n), \quad (15)$$
where (13) holds since we only consider $\mathbf{y}$ that satisfies $\log\big( W_{Y|X}^{\otimes n}(\mathbf{y}|\tilde{\mathbf{x}}) / P_0^{\otimes n}(\mathbf{y}) \big) > \gamma\sqrt{n}$, and (14) follows from (6). Without loss of generality, we consider the first interval $[1:w]$ such that $\mathbf{y}^{(1)} = [y_1, \ldots, y_w]$ and $\tilde{\mathbf{x}}^{(1)} = [\tilde{x}_1, \ldots, \tilde{x}_w]$, and by symmetry we further assume $\tilde{x}_1 = 1$ and $\tilde{x}_j = 0$ for $j \in [2:w]$. Thus,
$$\sum_{\mathbf{y}^{(1)}} \frac{P_Y^w(\mathbf{y}^{(1)})}{P_0^{\otimes w}(\mathbf{y}^{(1)})}\, W_{Y|X}^{\otimes w}(\mathbf{y}^{(1)}|\tilde{\mathbf{x}}^{(1)}) = \sum_{\mathbf{y}^{(1)}} \frac{P_Y^w(\mathbf{y}^{(1)})}{P_0^{\otimes w}(\mathbf{y}^{(1)})}\, P_1(y_1) \prod_{j=2}^w P_0(y_j) = \sum_{y_1} \frac{(P_Y^w)_1(y_1)}{P_0(y_1)}\, P_1(y_1) \quad (16)$$
$$= \sum_{y_1} \frac{\tfrac{1}{w}P_1(y_1) + \tfrac{w-1}{w}P_0(y_1)}{P_0(y_1)}\, P_1(y_1) = 1 + \frac{1}{w}(\xi - 1), \quad (17)$$
where $(P_Y^w)_1$ in (16) stands for the first marginal distribution of $P_Y^w$, which takes the form $\tfrac{1}{w}P_1 + \tfrac{w-1}{w}P_0$. Combining (15) and (17) and applying the inequality $\log(1+x) \le x$, we have
$$\mathbb{E}_{P_X^{n,l}}\big( W_{Y|X}^{\otimes n}(\mathcal{F}_{\tilde{\mathbf{x}}} | \mathbf{X}) \big) \le e^{-\gamma\sqrt{n}}\Big( 1 + \frac{\xi-1}{w} \Big)^l \le e^{-\gamma\sqrt{n}}\, e^{l(\xi-1)/w},$$
which completes the proof.

Combining (11) and Lemma 3, we obtain that the expectation of the random variable $W_{Y|X}^{\otimes n}(\mathcal{D}_m|\mathbf{X})$ is bounded from above as
$$\mathbb{E}_{P_X^{n,l}}\big( W_{Y|X}^{\otimes n}(\mathcal{D}_m | \mathbf{X}) \big) \le \exp\Big\{ -(\gamma - r')\sqrt{n} + \frac{l(\xi-1)}{w} \Big\} \triangleq \beta'_n, \quad (18)$$
which vanishes since $r' < \gamma$. Let $\alpha'_n \triangleq \max\{2\beta'_n, e^{-\mu\sqrt{n}/2}\}$, and note that $\alpha'_n - \beta'_n \ge e^{-\mu\sqrt{n}/2}/2$. Consider the $N$ i.i.d. random variables $\{W_{Y|X}^{\otimes n}(\mathcal{D}_m | \mathbf{X}_{m',i})\}_{i \in [1:N]}$, which are present in the RHS of (10). Note that each random variable belongs to $[0,1]$, and the expectation is at most $\beta'_n$. By applying Hoeffding's inequality, we have that for fixed $\mathcal{D}_m$,
$$\mathbb{P}_{\{\mathbf{X}_{m',i}\}}\Bigg( \frac{1}{N}\sum_{i=1}^N W_{Y|X}^{\otimes n}(\mathcal{D}_m | \mathbf{X}_{m',i}) \ge \alpha'_n \Bigg) \le \exp\big\{ -2N(\alpha'_n - \beta'_n)^2 \big\} \le \exp\Big\{ -\tfrac{1}{2} e^{(r'-\mu)\sqrt{n}} \Big\}. \quad (19)$$
Note that (19) is true for any fixed $\mathcal{D}_m$ (or equivalently, any fixed $\{\mathbf{x}_{m,j}\}_{j \in [1:N]}$) that corresponds to message $m$. Next, we also take the randomness of $\{\mathbf{X}_{m,j}\}_{j \in [1:N]}$ into consideration. Let $\mathbf{D}_m$ be the chance variable corresponding to $\mathcal{D}_m$, and we have
$$\mathbb{P}_{\{\mathbf{X}_{m,i}\}, \{\mathbf{X}_{m',i}\}}\Bigg( \frac{1}{N}\sum_{i=1}^N W_{Y|X}^{\otimes n}(\mathbf{D}_m | \mathbf{X}_{m',i}) \ge \alpha'_n \Bigg) = \sum_{\mathcal{D}_m} \mathbb{P}_{\{\mathbf{X}_{m,i}\}}(\mathbf{D}_m = \mathcal{D}_m)\, \mathbb{P}_{\{\mathbf{X}_{m',i}\}}\Bigg( \frac{1}{N}\sum_{i=1}^N W_{Y|X}^{\otimes n}(\mathcal{D}_m | \mathbf{X}_{m',i}) \ge \alpha'_n \Bigg) \quad (20)$$
$$\le \exp\Big\{ -\tfrac{1}{2} e^{(r'-\mu)\sqrt{n}} \Big\}. \quad (21)$$
Finally, a union bound over all the message pairs $(m, m') \in \mathcal{M}^2$ with $m \neq m'$ yields
$$\mathbb{P}\Big( \max_{(m,m') : m \neq m'} P_{\mathrm{err}}^{(2)}(m, m') \ge \alpha'_n \Big) \le \sum_{(m,m') : m \neq m'} \mathbb{P}\Bigg( \frac{1}{N}\sum_{i=1}^N W_{Y|X}^{\otimes n}(\mathbf{D}_m | \mathbf{X}_{m',i}) \ge \alpha'_n \Bigg) = \exp\Big\{ -\tfrac{1}{2} e^{(r'-\mu)\sqrt{n}} + 2 e^{r\sqrt{n}} \Big\},$$
which vanishes since the choice of $\mu$ ensures $r' - \mu > r$.

Remark 2.
The analysis of $P_{\mathrm{err}}^{(2)}$ relies on the fact that every PPM-sequence has the same weight. Specifically, the crux of our proof is to first bound the error term $\frac{1}{N}\sum_{i=1}^N W_{Y|X}^{\otimes n}(\mathcal{D}_m|\mathbf{X}_{m',i})$ with respect to a fixed realization $\mathcal{D}_m$ of the chance variable $\mathbf{D}_m$ (or equivalently, a fixed realization $\{\mathbf{x}_{m,j}\}_{j=1}^N$ for message $m$). Next, we take an expectation over $\mathbf{D}_m$, as reflected in (20). Using PPM ensures that for every realization $\{\mathbf{x}_{m,j}\}_{j=1}^N$, each element $\mathbf{x}_{m,j}$ is a weight-$l$ PPM-sequence that satisfies inequality (12) of Lemma 3. Following Lemma 3 and the analysis starting from (18), it is shown that for every realization $\mathcal{D}_m$, one can derive the same upper bound on the error probability as presented in (19); thus it becomes straightforward to take an expectation over $\mathbf{D}_m$ to obtain (21) from (20). In contrast, if a coding scheme in which the weight of each $\mathbf{x}_{m,j}$ is random were used (which would be the case if each component of $\mathbf{x}_{m,j}$ were generated in an i.i.d. manner), it would require more effort to analyze the chance variable $\mathbf{D}_m$, since the upper bound on the error probability in (19) would depend on each realization of $\mathcal{D}_m$. In fact, the proof technique to upper bound $P_{\mathrm{err}}^{(2)}$ is also applicable to any constant composition code, i.e., it is not restricted to PPM codes. The reason why we adopt PPM is that it makes the proof of covertness easier since, as shown in Lemma 4 to follow, the PPM-induced output distribution $P_Z^{n,l}$ possesses favorable covertness properties.

3) Analysis of $P_{\mathrm{err}}^{(3)}$: For a fixed message $m \in \mathcal{M}$, the error probability of the third kind is bounded from above as
$$P_{\mathrm{err}}^{(3)}(m) = P_0^{\otimes n}(\mathcal{D}_m) \le \sum_{i=1}^N P_0^{\otimes n}(\mathcal{F}_{\mathbf{x}_{m,i}}),$$
and the expected value of this error probability (averaged over the generation of $\{\mathbf{X}_{m,i}\}_{i \in [1:N]}$) is bounded from above as
$$\mathbb{E}_{\{\mathbf{X}_{m,i}\}}\big( P_{\mathrm{err}}^{(3)}(m) \big) \le \sum_{i=1}^N \mathbb{E}_{\mathbf{X}_{m,i}}\big( P_0^{\otimes n}(\mathcal{F}_{\mathbf{X}_{m,i}}) \big) = e^{r'\sqrt{n}} \sum_{\mathbf{x}} P_X^{n,l}(\mathbf{x}) \sum_{\mathbf{y}} P_0^{\otimes n}(\mathbf{y})\, \mathbb{1}\Bigg\{ \log \frac{W_{Y|X}^{\otimes n}(\mathbf{y}|\mathbf{x})}{P_0^{\otimes n}(\mathbf{y})} > \gamma\sqrt{n} \Bigg\} \le e^{r'\sqrt{n}} \cdot e^{-\gamma\sqrt{n}} \sum_{\mathbf{x}}\sum_{\mathbf{y}} P_X^{n,l}(\mathbf{x})\, W_{Y|X}^{\otimes n}(\mathbf{y}|\mathbf{x}) \le e^{-(\gamma - r')\sqrt{n}}.$$
Thus, the expected value of the average error probability of the third kind satisfies
$$\mathbb{E}\Bigg( \frac{1}{|\mathcal{M}|}\sum_{m \in \mathcal{M}} P_{\mathrm{err}}^{(3)}(m) \Bigg) \le e^{-(\gamma - r')\sqrt{n}}.$$
By applying Markov's inequality, we have
$$\mathbb{P}\Bigg( \frac{1}{|\mathcal{M}|}\sum_{m \in \mathcal{M}} P_{\mathrm{err}}^{(3)}(m) \ge e^{-(\gamma - r' - \mu)\sqrt{n}} \Bigg) \le e^{-\mu\sqrt{n}}.$$
That is, with probability at least $1 - e^{-\mu\sqrt{n}}$ over the random code selection, the average error probability of the third kind satisfies $\frac{1}{|\mathcal{M}|}\sum_{m \in \mathcal{M}} P_{\mathrm{err}}^{(3)}(m) \le e^{-(\gamma - r' - \mu)\sqrt{n}}$, which tends to zero as $n$ tends to infinity since the choice of $\mu$ ensures that $\gamma - r' > \mu$.
4) Analysis of covertness:
First note that the KL-divergence
$$D\big(\widehat{Q}_{\mathcal{C}}^n \,\big\|\, Q_0^{\otimes n}\big) = D\big(P_Z^{n,l} \,\big\|\, Q_0^{\otimes n}\big) + D\big(\widehat{Q}_{\mathcal{C}}^n \,\big\|\, P_Z^{n,l}\big) + \sum_{\mathbf{z}} \big( \widehat{Q}_{\mathcal{C}}^n(\mathbf{z}) - P_Z^{n,l}(\mathbf{z}) \big) \log \frac{P_Z^{n,l}(\mathbf{z})}{Q_0^{\otimes n}(\mathbf{z})}. \quad (22)$$
In the following, we upper bound the three terms on the RHS of (22) in Lemmas 4 and 5.

Lemma 4.
For sufficiently large $n$, the KL-divergence $D\big(P_Z^{n,l} \,\big\|\, Q_0^{\otimes n}\big) \le \delta - n^{-1/2}$.

Proof of Lemma 4.
The proof is essentially due to [27, Lemma 1] and [16, Lemma 8], which analyze the output statistics of the PPM distribution and state that
$$D\big(P_Z^{n,l} \,\big\|\, Q_0^{\otimes n}\big) \le \frac{l^2}{2n}\, \chi_2(Q_1 \| Q_0) + O\Big(\frac{1}{\sqrt{n}}\Big).$$
Substituting $l = \big\lfloor \sqrt{(2\delta - n^{-1/3})\, n / \chi_2(Q_1\|Q_0)} \big\rfloor$, we complete the proof.

Lemma 5.
There exist constants $c_2, c_3 > 0$ such that with probability at least $1 - \exp(-c_2\sqrt{n})$ over the random code design, the output distribution $\widehat{Q}_{\mathcal{C}}^n$ induced by $\mathcal{C}$ ensures
$$D\big(\widehat{Q}_{\mathcal{C}}^n \,\big\|\, P_Z^{n,l}\big) \le \exp\{-c_3\sqrt{n}\}, \quad \text{and} \quad \sum_{\mathbf{z}} \big( \widehat{Q}_{\mathcal{C}}^n(\mathbf{z}) - P_Z^{n,l}(\mathbf{z}) \big) \log \frac{P_Z^{n,l}(\mathbf{z})}{Q_0^{\otimes n}(\mathbf{z})} \le 2n\Big( \log\frac{1}{\tilde{\mu}} \Big)\exp\{-c_3\sqrt{n}/2\}.$$

Proof of Lemma 5.
Recall that $\widehat{Q}_{\mathcal{C}}^n$ is the output distribution induced by the set $\cup_{m\in\mathcal{M}}\{\mathbf{X}_{m,i}\}_{i\in[1:N]}$, with each sequence being generated i.i.d. according to $P_X^{n,l}$. We first borrow a result from [32, Eq. (10)] and [16, Eq. (81)], which states that
$$\mathbb{E}\big( D\big(\widehat{Q}_{\mathcal{C}}^n \,\big\|\, P_Z^{n,l}\big) \big) \le \mathbb{E}_{P_X^{n,l} W_{Z|X}^{\otimes n}}\Bigg( \log\bigg( 1 + \frac{W_{Z|X}^{\otimes n}(\mathbf{Z}|\mathbf{X})}{|\mathcal{M}|N\, P_Z^{n,l}(\mathbf{Z})} \bigg) \Bigg). \quad (23)$$
Let $\tau \triangleq 2t\, D(Q_1\|Q_0)$ and $\mathcal{B}_\tau \triangleq \big\{ (\mathbf{x},\mathbf{z}) : \log\big( W_{Z|X}^{\otimes n}(\mathbf{z}|\mathbf{x}) / Q_0^{\otimes n}(\mathbf{z}) \big) < \tau\sqrt{n} \big\}$. Then, by partitioning into $(\mathbf{x},\mathbf{z})\in\mathcal{B}_\tau$ and $(\mathbf{x},\mathbf{z})\notin\mathcal{B}_\tau$, the term in (23) can be expressed as
$$\sum_{(\mathbf{x},\mathbf{z})\in\mathcal{B}_\tau} P_X^{n,l}(\mathbf{x})\, W_{Z|X}^{\otimes n}(\mathbf{z}|\mathbf{x}) \log\bigg( 1 + \frac{W_{Z|X}^{\otimes n}(\mathbf{z}|\mathbf{x})}{|\mathcal{M}|N\, Q_0^{\otimes n}(\mathbf{z})}\cdot\frac{Q_0^{\otimes n}(\mathbf{z})}{P_Z^{n,l}(\mathbf{z})} \bigg) + \sum_{(\mathbf{x},\mathbf{z})\notin\mathcal{B}_\tau} P_X^{n,l}(\mathbf{x})\, W_{Z|X}^{\otimes n}(\mathbf{z}|\mathbf{x}) \log\bigg( 1 + \frac{W_{Z|X}^{\otimes n}(\mathbf{z}|\mathbf{x})}{|\mathcal{M}|N\, P_Z^{n,l}(\mathbf{z})} \bigg). \quad (24)$$
Using $\log(1+x) \le x$, the first term of (24) is bounded from above by
$$\frac{e^{\tau\sqrt{n}}}{|\mathcal{M}|N} \sum_{(\mathbf{x},\mathbf{z})\in\mathcal{B}_\tau} P_X^{n,l}(\mathbf{x})\, W_{Z|X}^{\otimes n}(\mathbf{z}|\mathbf{x})\, \frac{Q_0^{\otimes n}(\mathbf{z})}{P_Z^{n,l}(\mathbf{z})} \le \frac{e^{\tau\sqrt{n}}}{|\mathcal{M}|N}, \quad (25)$$
and the second term of (24) is bounded from above by
$$\log\bigg( 1 + \frac{1}{(|\mathcal{M}|N)\min_{\mathbf{z}: P_Z^{n,l}(\mathbf{z})>0} P_Z^{n,l}(\mathbf{z})} \bigg) \times \mathbb{P}_{P_X^{n,l} W_{Z|X}^{\otimes n}}\Bigg( \log\frac{W_{Z|X}^{\otimes n}(\mathbf{Z}|\mathbf{X})}{Q_0^{\otimes n}(\mathbf{Z})} \ge \tau\sqrt{n} \Bigg). \quad (26)$$
Before we state the next lemma, we recall that $\mu_0 = \min_{z: Q_0(z)>0} Q_0(z)$, $\mu_1 = \min_{z: Q_1(z)>0} Q_1(z)$, and $\tilde{\mu} = \min\{\mu_0, \mu_1\}$.

Lemma 6.
We have
$$\log\bigg( 1 + \frac{1}{(|\mathcal{M}|N)\min_{\mathbf{z}: P_Z^{n,l}(\mathbf{z})>0} P_Z^{n,l}(\mathbf{z})} \bigg) \le n\log\big( 1 + \tilde{\mu}^{-1} \big).$$
We defer the proof of Lemma 6 to Appendix A. It then remains to consider the other term in (26):
$$\mathbb{P}_{P_X^{n,l} W_{Z|X}^{\otimes n}}\Bigg( \log\frac{W_{Z|X}^{\otimes n}(\mathbf{Z}|\mathbf{X})}{Q_0^{\otimes n}(\mathbf{Z})} \ge \tau\sqrt{n} \Bigg) = \sum_{\mathbf{x}} P_X^{n,l}(\mathbf{x}) \sum_{\mathbf{z}} W_{Z|X}^{\otimes n}(\mathbf{z}|\mathbf{x})\, \mathbb{1}\Bigg\{ \log\frac{W_{Z|X}^{\otimes n}(\mathbf{z}|\mathbf{x})}{Q_0^{\otimes n}(\mathbf{z})} \ge \tau\sqrt{n} \Bigg\}$$
$$= \sum_{\mathbf{z}} W_{Z|X}^{\otimes n}(\mathbf{z}|\mathbf{x}^*)\, \mathbb{1}\Bigg\{ \sum_{j=1}^n \log\frac{W_{Z|X}(z_j|x^*_j)}{Q_0(z_j)} \ge \tau\sqrt{n} \Bigg\} \quad (27)$$
$$= \mathbb{P}_{Q_1^{\otimes l}}\Bigg( \sum_{j=1}^l \log\frac{Q_1(Z_{(j-1)w+1})}{Q_0(Z_{(j-1)w+1})} \ge \tau\sqrt{n} \Bigg), \quad (28)$$
where (27) is due to symmetry, and recall that $\mathbf{x}^*$ is the weight-$l$ vector such that $x^*_{(j-1)w+1} = 1$ for $j \in [1:l]$. By noting that $\mathbb{E}\big( \sum_{j=1}^l \log\frac{Q_1(Z_{(j-1)w+1})}{Q_0(Z_{(j-1)w+1})} \big) = l\, D(Q_1\|Q_0)$ and $\tau\sqrt{n} = 2l\, D(Q_1\|Q_0)$, applying Hoeffding's inequality yields
$$\mathbb{P}_{Q_1^{\otimes l}}\Bigg( \sum_{j=1}^l \log\frac{Q_1(Z_{(j-1)w+1})}{Q_0(Z_{(j-1)w+1})} \ge \tau\sqrt{n} \Bigg) \le e^{-c_4\sqrt{n}} \quad (29)$$
for some constant $c_4 > 0$. Combining (23), (25), (26), (29), the fact that $|\mathcal{M}| = \exp\{e^{r\sqrt{n}}\}$, and applying Markov's inequality, we obtain that there exist constants $c_2, c_3 > 0$ such that with probability at least $1 - \exp(-c_2\sqrt{n})$ over the code design,
$$D\big(\widehat{Q}_{\mathcal{C}}^n \,\big\|\, P_Z^{n,l}\big) \le \exp\{-c_3\sqrt{n}\}.$$
Finally, by Pinsker's inequality, we know that $V\big(\widehat{Q}_{\mathcal{C}}^n, P_Z^{n,l}\big) \le \sqrt{D\big(\widehat{Q}_{\mathcal{C}}^n \,\|\, P_Z^{n,l}\big)/2} \le \exp\{-c_3\sqrt{n}/2\}$, thus
$$\sum_{\mathbf{z}} \big( \widehat{Q}_{\mathcal{C}}^n(\mathbf{z}) - P_Z^{n,l}(\mathbf{z}) \big) \log\frac{P_Z^{n,l}(\mathbf{z})}{Q_0^{\otimes n}(\mathbf{z})} \le 2n\Big( \log\frac{1}{\tilde{\mu}} \Big)\cdot V\big(\widehat{Q}_{\mathcal{C}}^n, P_Z^{n,l}\big) \le 2n\Big( \log\frac{1}{\tilde{\mu}} \Big)\exp\{-c_3\sqrt{n}/2\}.$$
This completes the proof of Lemma 5.

Combining (22) and Lemmas 4 and 5, we conclude that with probability at least $1 - \exp(-c_2\sqrt{n})$ over the random code $\mathcal{C}$, we have $D\big(\widehat{Q}_{\mathcal{C}}^n \,\|\, Q_0^{\otimes n}\big) \le \delta$ for sufficiently large $n$.

D. Code refinements
In the following, we refine a given "weak" covert identification code such that the refined code satisfies the error criteria and covertness property in Definition 4 and simultaneously retains the rate of the original code.
Lemma 7.
Let $\delta > 0$ and $\varepsilon_n^{(1)}, \varepsilon_n^{(2)}, \varepsilon_n^{(3)} > 0$ be vanishing sequences. Suppose there exists a sequence of codes $\mathcal{C}$ (of size $|\mathcal{M}|$) satisfying
$$\max_{m\in\mathcal{M}} P_{\mathrm{err}}^{(1)}(m) \le \varepsilon_n^{(1)}, \quad \max_{(m,m')\in\mathcal{M}^2 : m\neq m'} P_{\mathrm{err}}^{(2)}(m,m') \le \varepsilon_n^{(2)}, \quad \frac{1}{|\mathcal{M}|}\sum_{m\in\mathcal{M}} P_{\mathrm{err}}^{(3)}(m) \le \varepsilon_n^{(3)}, \quad D\big(\widehat{Q}_{\mathcal{C}}^n \,\big\|\, Q_0^{\otimes n}\big) \le \delta.$$
Then, there exist vanishing sequences $\widetilde{\varepsilon}_n^{(1)}, \widetilde{\varepsilon}_n^{(2)}, \widetilde{\varepsilon}_n^{(3)} > 0$ (depending on $\varepsilon_n^{(1)}, \varepsilon_n^{(2)}, \varepsilon_n^{(3)}$) and another sequence of codes $\widetilde{\mathcal{C}}$ of size $|\widetilde{\mathcal{M}}| \ge (1 - \widetilde{\varepsilon}_n^{(3)})|\mathcal{M}|$ such that
$$\max_{m\in\widetilde{\mathcal{M}}} P_{\mathrm{err}}^{(1)}(m) \le \widetilde{\varepsilon}_n^{(1)}, \quad \max_{(m,m')\in\widetilde{\mathcal{M}}^2 : m\neq m'} P_{\mathrm{err}}^{(2)}(m,m') \le \widetilde{\varepsilon}_n^{(2)}, \quad \max_{m\in\widetilde{\mathcal{M}}} P_{\mathrm{err}}^{(3)}(m) \le \widetilde{\varepsilon}_n^{(3)}, \quad D\big(\widehat{Q}_{\widetilde{\mathcal{C}}}^n \,\big\|\, Q_0^{\otimes n}\big) \le \delta.$$

Proof of Lemma 7.
We first partition the messages in C into two disjoint sets. Definition 5.
Consider a code $\mathcal{C}$ that satisfies $\frac{1}{|\mathcal{M}|}\sum_{m\in\mathcal{M}} P_{\mathrm{err}}^{(3)}(m) \le \varepsilon_n^{(3)}$. We say a message $m\in\mathcal{M}$ is a good message if $P_{\mathrm{err}}^{(3)}(m) \le (\varepsilon_n^{(3)})^{1/2}$, and a bad message otherwise.

Let $\widetilde{\mathcal{M}} \subset \mathcal{M}$ be the set that contains all the good messages, and $\widetilde{\mathcal{M}}^c$ be the set that contains all the bad messages. Without loss of generality, we assume $\widetilde{\mathcal{M}} = [1 : |\widetilde{\mathcal{M}}|]$ and $\widetilde{\mathcal{M}}^c = [|\widetilde{\mathcal{M}}| + 1 : |\mathcal{M}|]$. Since the code $\mathcal{C}$ satisfies $\sum_{m\in\mathcal{M}} P_{\mathrm{err}}^{(3)}(m) \le \varepsilon_n^{(3)}|\mathcal{M}|$, the number of bad messages is at most $(\varepsilon_n^{(3)})^{1/2}|\mathcal{M}|$, i.e., $|\widetilde{\mathcal{M}}^c| \le (\varepsilon_n^{(3)})^{1/2}|\mathcal{M}|$ and $|\widetilde{\mathcal{M}}| \ge (1 - (\varepsilon_n^{(3)})^{1/2})|\mathcal{M}|$. Recall that for each message $m\in\mathcal{M}$, the corresponding set of sequences is $\{\mathbf{x}_{m,i}\}_{i\in[1:N]}$. We then denote the set of sequences that correspond to all the bad messages by $\mathcal{B} \triangleq \cup_{m\in\widetilde{\mathcal{M}}^c}\{\mathbf{x}_{m,i}\}_{i\in[1:N]}$, and note that $|\mathcal{B}| \le N\cdot(\varepsilon_n^{(3)})^{1/2}|\mathcal{M}|$. In the following, we construct a new code $\widetilde{\mathcal{C}}$ that contains $|\widetilde{\mathcal{M}}|$ messages.
1) We partition the set $\mathcal{B}$ into $|\widetilde{\mathcal{M}}|$ equal-sized disjoint subsets $\mathcal{B}^{(1)}, \mathcal{B}^{(2)}, \ldots, \mathcal{B}^{(|\widetilde{\mathcal{M}}|)}$ such that the cardinality of each subset (for $m\in\widetilde{\mathcal{M}}$) satisfies
$$|\mathcal{B}^{(m)}| = \frac{|\mathcal{B}|}{|\widetilde{\mathcal{M}}|} \le \frac{N\cdot(\varepsilon_n^{(3)})^{1/2}|\mathcal{M}|}{(1 - (\varepsilon_n^{(3)})^{1/2})|\mathcal{M}|} \triangleq \nu_n N, \quad (30)$$
where $\nu_n$ also tends to 0 as $n$ tends to infinity.
2) For each $m\in\widetilde{\mathcal{M}}$, the corresponding set of sequences in the original code $\mathcal{C}$ is $\{\mathbf{x}_{m,i}\}_{i\in[1:N]}$. In the new code $\widetilde{\mathcal{C}}$, we enlarge this set by appending $\mathcal{B}^{(m)}$ to $\{\mathbf{x}_{m,i}\}_{i\in[1:N]}$. Thus, the codeword $U_m$ is the uniform distribution over the larger set of sequences $\{\mathbf{x}_{m,i}\}_{i\in[1:N]}\cup\mathcal{B}^{(m)}$.
3) For each $m\in\widetilde{\mathcal{M}}$, the decoding region of the new code $\widetilde{\mathcal{C}}$ remains $\mathcal{D}_m = \cup_{i\in[1:N]}\mathcal{F}_{\mathbf{x}_{m,i}}$. That is, the decoding regions of the new code $\widetilde{\mathcal{C}}$ and the original code $\mathcal{C}$ are exactly the same.
We now analyze the error probabilities of the new code $\widetilde{\mathcal{C}}$. For each $m\in\widetilde{\mathcal{M}}$, the error probability of the first kind is bounded from above as
$$P_{\mathrm{err}}^{(1)}(m) = \frac{\sum_{i=1}^N W_{Y|X}^{\otimes n}(\mathcal{D}_m^c|\mathbf{x}_{m,i}) + \sum_{\mathbf{x}\in\mathcal{B}^{(m)}} W_{Y|X}^{\otimes n}(\mathcal{D}_m^c|\mathbf{x})}{N + |\mathcal{B}^{(m)}|} \le \frac{N}{N+|\mathcal{B}^{(m)}|}\Bigg(\frac{1}{N}\sum_{i=1}^N W_{Y|X}^{\otimes n}(\mathcal{D}_m^c|\mathbf{x}_{m,i})\Bigg) + \frac{|\mathcal{B}^{(m)}|}{N+|\mathcal{B}^{(m)}|} \quad (31)$$
$$\le \varepsilon_n^{(1)} + \nu_n, \quad (32)$$
where (31) holds since $W_{Y|X}^{\otimes n}(\mathcal{D}_m^c|\mathbf{x}) \le 1$, and (32) is due to (30) and the assumption that the original code satisfies $\frac{1}{N}\sum_{i=1}^N W_{Y|X}^{\otimes n}(\mathcal{D}_m^c|\mathbf{x}_{m,i}) \le \varepsilon_n^{(1)}$. Similarly, for each message pair $(m,m')\in\widetilde{\mathcal{M}}^2$ with $m\neq m'$, the error probability of the second kind $P_{\mathrm{err}}^{(2)}(m,m')$ is bounded from above as
$$P_{\mathrm{err}}^{(2)}(m,m') = \frac{\sum_{i=1}^N W_{Y|X}^{\otimes n}(\mathcal{D}_m|\mathbf{x}_{m',i}) + \sum_{\mathbf{x}\in\mathcal{B}^{(m')}} W_{Y|X}^{\otimes n}(\mathcal{D}_m|\mathbf{x})}{N+|\mathcal{B}^{(m')}|} \le \frac{N}{N+|\mathcal{B}^{(m')}|}\Bigg(\frac{1}{N}\sum_{i=1}^N W_{Y|X}^{\otimes n}(\mathcal{D}_m|\mathbf{x}_{m',i})\Bigg) + \frac{|\mathcal{B}^{(m')}|}{N+|\mathcal{B}^{(m')}|} \le \varepsilon_n^{(2)} + \nu_n.$$
Since all the messages in $\widetilde{\mathcal{M}}$ are good messages, by Definition 5 we have that for each message $m\in\widetilde{\mathcal{M}}$, $P_{\mathrm{err}}^{(3)}(m) \le (\varepsilon_n^{(3)})^{1/2}$. Finally, note that when constructing $\widetilde{\mathcal{C}}$, we merely rearrange the sequences of $\mathcal{C}$ (rather than expurgate or add any sequences); thus, the output distribution induced by $\widetilde{\mathcal{C}}$ is exactly the same as that induced by $\mathcal{C}$, i.e.,
$$D\big(\widehat{Q}_{\widetilde{\mathcal{C}}}^n \,\big\|\, Q_0^{\otimes n}\big) = D\big(\widehat{Q}_{\mathcal{C}}^n \,\big\|\, Q_0^{\otimes n}\big) \le \delta.$$
Thus, the covertness constraint is satisfied. Finally, we note that
$$\liminf_{n\to\infty} \frac{\log\log|\widetilde{\mathcal{M}}|}{\sqrt{n}} = \liminf_{n\to\infty} \frac{\log\log|\mathcal{M}|}{\sqrt{n}} = (1-\eta)\, C_\delta,$$
and the proof is completed by taking $\eta \to 0^+$.
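The refinement step above admits a compact schematic rendering. The following sketch (ours; the function and variable names are illustrative, and the equal-sized partition is assumed exact for simplicity) redistributes the sequences of the bad messages evenly among the good messages, so the overall multiset of codeword sequences, and hence the induced output distribution, is unchanged.

```python
import numpy as np

def refine_code(codebook, p_err3, eps3):
    # codebook: dict message -> list of length-n sequences (support of U_m)
    # p_err3:   dict message -> error probability of the third kind
    good = [m for m in codebook if p_err3[m] <= np.sqrt(eps3)]
    bad = [m for m in codebook if p_err3[m] > np.sqrt(eps3)]
    pool = [x for m in bad for x in codebook[m]]       # the set B
    chunk = len(pool) // max(len(good), 1)             # |B| / |M_tilde|
    refined = {}
    for i, m in enumerate(good):
        extra = pool[i * chunk : (i + 1) * chunk]      # subset B^(m)
        refined[m] = codebook[m] + extra               # enlarged support of U_m
    return refined  # decoding regions D_m are kept exactly as before
```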
V. CONVERSE
In this section, we show that any sequence of identification codes with size $|\mathcal{M}|$ that simultaneously guarantees that $D(\widehat{Q}_{\mathcal{C}}^n \| Q_0^{\otimes n}) \le \delta$ and $P_{\mathrm{err}}^{(1)} = \lambda_n^{(1)}$, $P_{\mathrm{err}}^{(2)} = \lambda_n^{(2)}$, $P_{\mathrm{err}}^{(3)} = \lambda_n^{(3)}$ (where $\lim_{n\to\infty}\lambda_n^{(1)} = \lim_{n\to\infty}\lambda_n^{(2)} = \lim_{n\to\infty}\lambda_n^{(3)} = 0$) must satisfy $\limsup_{n\to\infty} \frac{\log\log|\mathcal{M}|}{\sqrt{n}} \le C_\delta$.

Lemma 8.
Consider any identification code $\mathcal{C}$ with message set $\mathcal{M}$, codewords $\{U_m\}_{m\in\mathcal{M}}$, and decoding regions $\{\mathcal{D}_m\}_{m\in\mathcal{M}}$ such that $D(\widehat{Q}_{\mathcal{C}}^n \| Q_0^{\otimes n}) \le \delta$. Let $f_H(m) \triangleq \frac{1}{n}\sum_{\mathbf{x}} U_m(\mathbf{x})\,\mathrm{wt}_H(\mathbf{x})$ be the fractional Hamming weight of each message $m \in \mathcal{M}$. Then, there exists a constant $c > 0$ such that the average fractional Hamming weight of $\mathcal{C}$ satisfies
$$\frac{1}{|\mathcal{M}|}\sum_{m\in\mathcal{M}} f_H(m) \le \sqrt{\frac{2\delta}{\chi_2(Q_1\|Q_0)}}\Big( \frac{1}{\sqrt{n}} + \frac{c}{n} \Big). \quad (33)$$

Proof of Lemma 8.
We denote the $i$-th marginal distribution of each codeword $U_m$ as $(U_m)_i$ for $i\in[1:n]$, and the $i$-th marginal distribution of $\widehat{Q}_{\mathcal{C}}^n$ as $(\widehat{Q}_{\mathcal{C}}^n)_i$, which takes the form
$$(\widehat{Q}_{\mathcal{C}}^n)_i(z) = \frac{1}{|\mathcal{M}|}\sum_{m\in\mathcal{M}}\sum_{x\in\mathcal{X}} (U_m)_i(x)\, W_{Z|X}(z|x), \quad \forall z\in\mathcal{Z}.$$
Let $\bar{Q}_{\mathcal{C}}(z) \triangleq \frac{1}{n}\sum_{i=1}^n (\widehat{Q}_{\mathcal{C}}^n)_i(z)$. By taking the covertness constraint into account and following the analysis in [15, Eqn. (13)], we have
$$\delta \ge D\big(\widehat{Q}_{\mathcal{C}}^n \,\big\|\, Q_0^{\otimes n}\big) \ge n\, D\big(\bar{Q}_{\mathcal{C}} \,\big\|\, Q_0\big), \quad (34)$$
and thus $\lim_{n\to\infty} D(\bar{Q}_{\mathcal{C}} \| Q_0) = 0$. By applying Pinsker's inequality $V(\bar{Q}_{\mathcal{C}}, Q_0) \le \sqrt{D(\bar{Q}_{\mathcal{C}} \| Q_0)/2}$, we also have $\lim_{n\to\infty} V(\bar{Q}_{\mathcal{C}}, Q_0) = 0$. Let $\psi = \psi_n \triangleq \frac{1}{n}\sum_{i=1}^n \frac{1}{|\mathcal{M}|}\sum_{m\in\mathcal{M}} (U_m)_i(1)$ be the fraction of 1's in the codebook; one can express $\bar{Q}_{\mathcal{C}}(z)$ as
$$\bar{Q}_{\mathcal{C}}(z) = \frac{1}{n}\sum_{i=1}^n \frac{1}{|\mathcal{M}|}\sum_{m\in\mathcal{M}}\sum_{x\in\mathcal{X}} (U_m)_i(x)\, W_{Z|X}(z|x) = \psi\, Q_1(z) + (1-\psi)\, Q_0(z).$$
Note that the requirement on variational distance $\lim_{n\to\infty} V(\bar{Q}_{\mathcal{C}}, Q_0) = 0$ implies that $\lim_{n\to\infty}\psi = 0$. Furthermore, we know from [14, Eqn. (11)] that
$$D\big(\bar{Q}_{\mathcal{C}} \,\big\|\, Q_0\big) \ge \frac{\psi^2}{2}\,\chi_2(Q_1\|Q_0) - O(\psi^3). \quad (35)$$
Combining (34) and (35), one can bound $\psi$ from above as
$$\psi \le \sqrt{\frac{2\delta}{\chi_2(Q_1\|Q_0)}}\Big(\frac{1}{\sqrt{n}} + \frac{c}{n}\Big), \quad (36)$$
for some constant $c > 0$. At the same time, one can also interpret $\psi$ as the average fractional Hamming weight of the code, since
$$\psi = \frac{1}{n}\sum_{i=1}^n \frac{1}{|\mathcal{M}|}\sum_{m\in\mathcal{M}} (U_m)_i(1) = \frac{1}{|\mathcal{M}|}\sum_{m\in\mathcal{M}} \frac{1}{n}\sum_{i=1}^n \sum_{\mathbf{x}} U_m(\mathbf{x})\,\mathbb{1}\{x_i = 1\} = \frac{1}{|\mathcal{M}|}\sum_{m\in\mathcal{M}} \frac{1}{n}\sum_{\mathbf{x}} U_m(\mathbf{x})\,\mathrm{wt}_H(\mathbf{x}) = \frac{1}{|\mathcal{M}|}\sum_{m\in\mathcal{M}} f_H(m). \quad (37)$$
This completes the proof of Lemma 8.

For notational convenience, let
$$k \triangleq \sqrt{\frac{2\delta}{\chi_2(Q_1\|Q_0)}}\Big(\frac{1}{\sqrt{n}} + \frac{c}{n}\Big).$$

Lemma 9 (Expurgation Lemma). Suppose there exists a sequence of identification codes $\mathcal{C}$ with message set $\mathcal{M}$, codewords $\{U_m\}_{m\in\mathcal{M}}$, and decoding regions $\{\mathcal{D}_m\}_{m\in\mathcal{M}}$ such that $D(\widehat{Q}_{\mathcal{C}}^n \| Q_0^{\otimes n}) \le \delta$, $P_{\mathrm{err}}^{(1)} = \lambda_n^{(1)}$, $P_{\mathrm{err}}^{(2)} = \lambda_n^{(2)}$, and $P_{\mathrm{err}}^{(3)} = \lambda_n^{(3)}$, where $\lim_{n\to\infty}\lambda_n^{(1)} = \lim_{n\to\infty}\lambda_n^{(2)} = \lim_{n\to\infty}\lambda_n^{(3)} = 0$. Then, there exist a sequence $\kappa_n > 0$ (which depends on $\lambda_n^{(1)}, \lambda_n^{(2)}$) which satisfies $\lim_{n\to\infty}\kappa_n = 0$ and a sequence of identification codes $\mathcal{C}'$ with message set $\mathcal{M}'$, codewords $\{U'_m\}_{m\in\mathcal{M}'}$, and decoding regions $\{\mathcal{D}'_m\}_{m\in\mathcal{M}'}$ such that
1) $|\mathcal{M}'| \ge |\mathcal{M}|/(n+1)$;
2) For every $m\in\mathcal{M}'$, $U'_m(\mathbf{x}) = 0$ for all $\mathbf{x}$ such that $\mathrm{wt}_H(\mathbf{x}) > (1+\kappa_n)kn$;
3) $P_{\mathrm{err}}^{(1)} \le (\lambda_n^{(1)})^{1/2}$, $P_{\mathrm{err}}^{(2)} \le (\lambda_n^{(2)})^{1/2}$, and $P_{\mathrm{err}}^{(3)} \le \lambda_n^{(3)}$.

Proof of Lemma 9. Since the identification code $\mathcal{C}$ satisfies $D(\widehat{Q}_{\mathcal{C}}^n \| Q_0^{\otimes n}) \le \delta$, Lemma 8 above ensures that its average fractional Hamming weight satisfies $\frac{1}{|\mathcal{M}|}\sum_{m\in\mathcal{M}} f_H(m) \le k$. We define $\mathcal{G}$ as the subset of messages with small fractional Hamming weight, i.e.,
$$\mathcal{G} \triangleq \Big\{ m\in\mathcal{M} : f_H(m) \le \Big(1 + \frac{1}{n}\Big)k \Big\}. \quad (38)$$
From (36) and (37), we have
$$k \ge \psi = \frac{1}{|\mathcal{M}|}\sum_{m\in\mathcal{G}} f_H(m) + \frac{1}{|\mathcal{M}|}\sum_{m\in\mathcal{M}\setminus\mathcal{G}} f_H(m) \ge \frac{|\mathcal{M}\setminus\mathcal{G}|}{|\mathcal{M}|}\Big(1 + \frac{1}{n}\Big)k,$$
which further implies that $|\mathcal{G}| \ge |\mathcal{M}|/(n+1)$, i.e., the number of messages with small fractional Hamming weight is not small. Let $\lambda_n \triangleq \max\{\lambda_n^{(1)}, \lambda_n^{(2)}\}$ and $\epsilon_n \triangleq \frac{\sqrt{\lambda_n}}{1 - \sqrt{\lambda_n}}$.
We partition $\mathcal{X}^n$ into two disjoint sets: the low-weight set $\mathcal{X}^n_l \triangleq \{\mathbf{x}\in\mathcal{X}^n : \mathrm{wt}_H(\mathbf{x}) \le (1+\epsilon_n)(1+\frac{1}{n})kn\}$ and the high-weight set $\mathcal{X}^n_h \triangleq \mathcal{X}^n\setminus\mathcal{X}^n_l$. In the following, we describe the procedure of constructing the new code $\mathcal{C}'$.
1) First, the message set of the new code is $\mathcal{M}' = \mathcal{G}$. Thus, $f_H(m) \le (1+\frac{1}{n})k$ for all $m\in\mathcal{M}'$.
2) For each $m\in\mathcal{M}'$, we define $g_m \triangleq \sum_{\mathbf{x}\in\mathcal{X}^n_l} U_m(\mathbf{x})$, and we set the codeword $U'_m$ of the new code $\mathcal{C}'$ to be
$$U'_m(\mathbf{x}) = \begin{cases} U_m(\mathbf{x})/g_m, & \text{if } \mathbf{x}\in\mathcal{X}^n_l, \\ 0, & \text{otherwise}. \end{cases}$$
One can check that $\sum_{\mathbf{x}} U'_m(\mathbf{x}) = 1$.
3) The decoding regions of the new code $\mathcal{C}'$ are the same as those of $\mathcal{C}$, i.e., $\mathcal{D}'_m = \mathcal{D}_m$ for all $m\in\mathcal{M}'$.
From (38) we have that for each $m\in\mathcal{M}'$,
$$\Big(1+\frac{1}{n}\Big)k \ge f_H(m) = \frac{1}{n}\sum_{\mathbf{x}} U_m(\mathbf{x})\,\mathrm{wt}_H(\mathbf{x}) \ge \frac{1}{n}\sum_{\mathbf{x}\in\mathcal{X}^n_h} U_m(\mathbf{x})\,\mathrm{wt}_H(\mathbf{x}) \ge \frac{1}{n}\sum_{\mathbf{x}\in\mathcal{X}^n_h} U_m(\mathbf{x})\cdot(1+\epsilon_n)\Big(1+\frac{1}{n}\Big)kn,$$
which yields a lower bound on $g_m$, i.e.,
$$g_m = \sum_{\mathbf{x}\in\mathcal{X}^n_l} U_m(\mathbf{x}) = 1 - \sum_{\mathbf{x}\in\mathcal{X}^n_h} U_m(\mathbf{x}) \ge \frac{\epsilon_n}{1+\epsilon_n}. \quad (39)$$
We now analyze the error probabilities of the new code $\mathcal{C}'$, which only consists of low-weight sequences. For each $m\in\mathcal{M}'$, the error probability of the first kind $P_{\mathrm{err}}^{(1)}(m)$ can be bounded from above as
$$P_{\mathrm{err}}^{(1)}(m) = \sum_{\mathbf{x}\in\mathcal{X}^n_l} U'_m(\mathbf{x})\, W_{Y|X}^{\otimes n}(\mathcal{D}^c_m|\mathbf{x}) = \sum_{\mathbf{x}\in\mathcal{X}^n_l} \frac{U_m(\mathbf{x})}{g_m}\, W_{Y|X}^{\otimes n}(\mathcal{D}^c_m|\mathbf{x}) \le \frac{1}{g_m}\sum_{\mathbf{x}} U_m(\mathbf{x})\, W_{Y|X}^{\otimes n}(\mathcal{D}^c_m|\mathbf{x}) \le \Big(\frac{1+\epsilon_n}{\epsilon_n}\Big)\lambda_n^{(1)} \quad (40)$$
$$\le \big(\lambda_n^{(1)}\big)^{1/2}, \quad (41)$$
where (40) follows from (39) and the fact that the original code $\mathcal{C}$ satisfies $\sum_{\mathbf{x}} U_m(\mathbf{x})\, W_{Y|X}^{\otimes n}(\mathcal{D}^c_m|\mathbf{x}) \le \lambda_n^{(1)}$, and (41) is due to the choice of $\epsilon_n$. Furthermore, for each message pair $(m,m')\in\mathcal{M}'\times\mathcal{M}'$ such that $m\neq m'$, the error probability of the second kind $P_{\mathrm{err}}^{(2)}(m,m')$ can be similarly bounded from above as
$$P_{\mathrm{err}}^{(2)}(m,m') = \sum_{\mathbf{x}\in\mathcal{X}^n_l} U'_{m'}(\mathbf{x})\, W_{Y|X}^{\otimes n}(\mathcal{D}_m|\mathbf{x}) = \frac{1}{g_{m'}}\sum_{\mathbf{x}\in\mathcal{X}^n_l} U_{m'}(\mathbf{x})\, W_{Y|X}^{\otimes n}(\mathcal{D}_m|\mathbf{x}) \le \Big(\frac{1+\epsilon_n}{\epsilon_n}\Big)\lambda_n^{(2)} \le \big(\lambda_n^{(2)}\big)^{1/2}.$$
Finally, we note that the error probability of the third kind $P_{\mathrm{err}}^{(3)}(m) = P_0^{\otimes n}(\mathcal{D}'_m)$ is still bounded from above by $\lambda_n^{(3)}$, since the decoding regions are unchanged, i.e., $\mathcal{D}'_m = \mathcal{D}_m$ for $m\in\mathcal{M}'$. We complete the proof of Lemma 9 by setting $\kappa_n = (1+\epsilon_n)(1+\frac{1}{n}) - 1$, which vanishes as $n$ tends to infinity.

Proving the converse of identification problems usually relies on the achievability results for the channel resolvability problem. In the following, we first introduce the definition of $K$-type distributions, and then state a modified version of the channel resolvability result in Lemma 10. Lemma 10 is modified from the so-called soft-covering lemma presented by Cuff [33, Corollary VII.2].

Definition 6.
For any positive integer $K$, a probability distribution $P \in \mathcal{P}(\mathcal{X})$ is said to be a $K$-type distribution if
$$P(x) \in \Big\{ 0, \frac{1}{K}, \frac{2}{K}, \ldots, 1 \Big\}, \quad \forall x\in\mathcal{X}.$$

Lemma 10.
Let $P_{\mathbf{X}} \in \mathcal{P}(\mathcal{X}^n)$ and $P_{\mathbf{Y}}(\mathbf{y}) = \sum_{\mathbf{x}} P_{\mathbf{X}}(\mathbf{x})\, W_{Y|X}^{\otimes n}(\mathbf{y}|\mathbf{x})$. We randomly sample $K$ i.i.d. sequences $\mathbf{x}_1, \ldots, \mathbf{x}_K$ according to $P_{\mathbf{X}}$. Let
$$\widetilde{P}_{\mathbf{X}}(\mathbf{x}) = \frac{1}{K}\sum_{i=1}^K \mathbb{1}\{\mathbf{x} = \mathbf{x}_i\}, \quad \forall\mathbf{x}\in\mathcal{X}^n,$$
be a $K$-type distribution and $\widetilde{P}_{\mathbf{Y}}(\mathbf{y}) = \sum_{\mathbf{x}} \widetilde{P}_{\mathbf{X}}(\mathbf{x})\, W_{Y|X}^{\otimes n}(\mathbf{y}|\mathbf{x})$ be the corresponding output distribution. Then, for any $\zeta > 0$ and any $P'_{\mathbf{Y}} \in \mathcal{P}(\mathcal{Y}^n)$,
$$\mathbb{E}\big( V\big( P_{\mathbf{Y}}, \widetilde{P}_{\mathbf{Y}} \big) \big) \le \mathbb{P}_{P_{\mathbf{X}} W_{Y|X}^{\otimes n}}\Bigg( \log\frac{W_{Y|X}^{\otimes n}(\mathbf{Y}|\mathbf{X})}{P'_{\mathbf{Y}}(\mathbf{Y})} > \zeta \Bigg) + \frac{1}{2}\sqrt{\frac{e^{\zeta}}{K}},$$
where the expectation on the left-hand side of the above inequality is over the random generation of $\mathbf{x}_1, \ldots, \mathbf{x}_K$.

Proof. The proof is presented in Appendix B, and is adapted from [33, Section VII-C] with proper modifications.

Note that Lemma 10 above holds for any $P'_{\mathbf{Y}} \in \mathcal{P}(\mathcal{Y}^n)$, which differs from an analogous (but more restrictive) result in [33, Corollary VII.2] wherein $P'_{\mathbf{Y}}$ is set to be $P_{\mathbf{Y}}$. This flexibility of choosing $P'_{\mathbf{Y}}$ arbitrarily is important for proving the converse, because we need to set it to $P_0^{\otimes n}$ later for the analysis of the covert identification problem. We now consider the identification code $\mathcal{C}'$ constructed in Lemma 9.
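The soft-covering phenomenon behind Lemma 10 can be illustrated numerically. The following sketch (ours; a single-letter toy rather than the $n$-letter setting, with invented numerical values) samples $K$ symbols i.i.d. from $P_X$, forms the $K$-type empirical distribution, and shows that the induced output distribution approaches $P_Y$ as $K$ grows.

```python
import numpy as np

rng = np.random.default_rng(1)

def ktype_output_distribution(P_X, W, K):
    # P_X: input distribution over a finite alphabet; W: channel matrix
    # with W[x, y] = W_{Y|X}(y|x). Returns the output distribution induced
    # by the empirical (K-type) distribution of K i.i.d. samples from P_X.
    samples = rng.choice(len(P_X), size=K, p=P_X)
    P_tilde = np.bincount(samples, minlength=len(P_X)) / K
    return P_tilde @ W

P_X = np.array([0.7, 0.3])
W = np.array([[0.9, 0.1], [0.2, 0.8]])
P_Y = P_X @ W
for K in [10, 100, 10000]:
    P_Y_tilde = ktype_output_distribution(P_X, W, K)
    print(K, 0.5 * np.abs(P_Y - P_Y_tilde).sum())  # variational distance
```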
Lemma 11.

Let $K \triangleq \big\lceil \exp\{(1+2n^{-1/8})(1+\kappa_n)\, kn\, D(P_1\|P_0)\} \big\rceil$. For every message $m\in\mathcal{M}'$ with codeword $U'_m$, there exists a $K$-type distribution $\widetilde{U}_m$ such that
$$V\big( U'_m W_{Y|X}^{\otimes n},\, \widetilde{U}_m W_{Y|X}^{\otimes n} \big) \le \exp\big(-c_6 n^{1/4}\big)$$
for some constant $c_6 > 0$, where $U'_m W_{Y|X}^{\otimes n}$ and $\widetilde{U}_m W_{Y|X}^{\otimes n}$ respectively denote the distributions on $\mathcal{Y}^n$ induced by $U'_m$ and $\widetilde{U}_m$ through the channel $W_{Y|X}^{\otimes n}$.

Proof of Lemma 11. Consider a specific $m\in\mathcal{M}'$ with codeword $U'_m$. Substituting $P_{\mathbf{X}}$ with $U'_m$ and $P'_{\mathbf{Y}}$ with $P_0^{\otimes n}$, and setting $\zeta \triangleq (1+n^{-1/8})(1+\kappa_n)\, kn\, D(P_1\|P_0)$ in Lemma 10, we have
$$\mathbb{P}_{U'_m W_{Y|X}^{\otimes n}}\Bigg( \log\frac{W_{Y|X}^{\otimes n}(\mathbf{Y}|\mathbf{X})}{P_0^{\otimes n}(\mathbf{Y})} > \zeta \Bigg) = \sum_{\mathbf{x}}\sum_{\mathbf{y}} U'_m(\mathbf{x})\, W_{Y|X}^{\otimes n}(\mathbf{y}|\mathbf{x})\, \mathbb{1}\Bigg\{ \log\frac{W_{Y|X}^{\otimes n}(\mathbf{y}|\mathbf{x})}{P_0^{\otimes n}(\mathbf{y})} > \zeta \Bigg\}$$
$$= \sum_{q=0}^{(1+\kappa_n)kn} \sum_{\mathbf{x}:\mathrm{wt}_H(\mathbf{x})=q} U'_m(\mathbf{x}) \sum_{\mathbf{y}} W_{Y|X}^{\otimes n}(\mathbf{y}|\mathbf{x})\, \mathbb{1}\Bigg\{ \log\frac{W_{Y|X}^{\otimes n}(\mathbf{y}|\mathbf{x})}{P_0^{\otimes n}(\mathbf{y})} > \zeta \Bigg\} \quad (42)$$
$$= \sum_{q=0}^{(1+\kappa_n)kn} \sum_{\mathbf{x}:\mathrm{wt}_H(\mathbf{x})=q} U'_m(\mathbf{x})\, \mathbb{P}_{P_1^{\otimes q}}\Bigg( \sum_{i=1}^q \log\frac{P_1(Y_i)}{P_0(Y_i)} > \zeta \Bigg), \quad (43)$$
where in (42) we partition $\mathbf{x}$ into different type classes characterized by their Hamming weights, and (43) is obtained by assuming $x_i = 1$ for $i\in[1:q]$ and $x_i = 0$ for $i\in[q+1:n]$ without loss of generality. Also note that
$$\zeta - q\, D(P_1\|P_0) \ge \zeta - (1+\kappa_n)kn\, D(P_1\|P_0) = n^{-1/8}(1+\kappa_n)kn\, D(P_1\|P_0) \triangleq \Upsilon. \quad (44)$$
Thus, we have
$$\mathbb{P}_{P_1^{\otimes q}}\Bigg( \sum_{i=1}^q \log\frac{P_1(Y_i)}{P_0(Y_i)} > \zeta \Bigg) \le \mathbb{P}_{P_1^{\otimes q}}\Bigg( \sum_{i=1}^q \log\frac{P_1(Y_i)}{P_0(Y_i)} - q\, D(P_1\|P_0) > \Upsilon \Bigg) \quad (45)$$
$$\le \exp\big(-c_5 n^{1/4}\big), \quad (46)$$
where (45) is obtained by subtracting $q\, D(P_1\|P_0)$ from both sides and by the inequality in (44), and (46) holds for some constant $c_5 > 0$ and is obtained by applying Hoeffding's inequality (note that $\Upsilon = \Theta(n^{3/8})$ while $q = O(\sqrt{n})$). Hence, the term in (43) is bounded from above by $\exp(-c_5 n^{1/4})$. Furthermore, one can also check that $\sqrt{e^{\zeta}/K} \le \exp\{-n^{-1/8}(1+\kappa_n)kn\, D(P_1\|P_0)/2\}$, which is also at most $\exp(-c_5 n^{1/4})$ for all $n$ large enough. Therefore, by Lemma 10, for every message $m\in\mathcal{M}'$, there exists a $K$-type distribution $\widetilde{U}_m$ such that
$$V\big( U'_m W_{Y|X}^{\otimes n},\, \widetilde{U}_m W_{Y|X}^{\otimes n} \big) \le 2\exp\big(-c_5 n^{1/4}\big) \le \exp\big(-c_6 n^{1/4}\big),$$
for some constant $c_6 > 0$ and all $n$ large enough.

In the following, we apply standard channel identification converse techniques to the code $\mathcal{C}'$. For any $m, m'\in\mathcal{M}'$ such that $m\neq m'$, we have
$$V\big( U'_m W_{Y|X}^{\otimes n},\, U'_{m'} W_{Y|X}^{\otimes n} \big) \ge U'_m W_{Y|X}^{\otimes n}(\mathcal{D}_m) - U'_{m'} W_{Y|X}^{\otimes n}(\mathcal{D}_m) \ge 1 - (\lambda_n^{(1)})^{1/2} - (\lambda_n^{(2)})^{1/2}, \quad (47)$$
where the last inequality is due to Lemma 9, which states that the error probabilities of $\mathcal{C}'$ satisfy $P_{\mathrm{err}}^{(1)} \le (\lambda_n^{(1)})^{1/2}$ and $P_{\mathrm{err}}^{(2)} \le (\lambda_n^{(2)})^{1/2}$. Meanwhile, from Lemma 11 we know that there exists a set of $K$-type distributions $\{\widetilde{U}_m\}_{m\in\mathcal{M}'}$ such that
$$V\big( U'_m W_{Y|X}^{\otimes n},\, \widetilde{U}_m W_{Y|X}^{\otimes n} \big) \le \exp\big(-c_6 n^{1/4}\big), \quad \forall m\in\mathcal{M}'. \quad (48)$$
Combining (47) and (48), we have the following claim.

Lemma 12.
For sufficiently large $n$, the distributions in $\{\widetilde{U}_m\}_{m\in\mathcal{M}'}$ are distinct, i.e., there does not exist $(m, m')$ with $m\neq m'$ such that $\widetilde{U}_m = \widetilde{U}_{m'}$.

Proof of Lemma 12. Suppose $\widetilde{U}_m = \widetilde{U}_{m'}$ for some $m\neq m'$. By the triangle inequality, we have
$$V\big( U'_m W_{Y|X}^{\otimes n},\, U'_{m'} W_{Y|X}^{\otimes n} \big) \le V\big( U'_m W_{Y|X}^{\otimes n},\, \widetilde{U}_m W_{Y|X}^{\otimes n} \big) + V\big( \widetilde{U}_m W_{Y|X}^{\otimes n},\, U'_{m'} W_{Y|X}^{\otimes n} \big) = V\big( U'_m W_{Y|X}^{\otimes n},\, \widetilde{U}_m W_{Y|X}^{\otimes n} \big) + V\big( \widetilde{U}_{m'} W_{Y|X}^{\otimes n},\, U'_{m'} W_{Y|X}^{\otimes n} \big) \le 2\exp\big(-c_6 n^{1/4}\big),$$
which contradicts (47) for sufficiently large $n$.

It is worth noting that the number of distinct $K$-type distributions on $\mathcal{X}^n$ is at most $|\mathcal{X}|^{nK}$. Thus, combining Lemma 11 and Lemma 12, we have $|\mathcal{M}'| \le |\mathcal{X}|^{nK}$, and by taking iterated logarithms on both sides, we have
$$\log\log|\mathcal{M}'| \le \log K + \log n + \log\log|\mathcal{X}|.$$
Therefore, by recalling that $|\mathcal{M}'| \ge |\mathcal{M}|/(n+1)$, $K = \lceil \exp\{(1+2n^{-1/8})(1+\kappa_n)kn\, D(P_1\|P_0)\} \rceil$, and $k = \sqrt{\frac{2\delta}{\chi_2(Q_1\|Q_0)}}(\frac{1}{\sqrt{n}} + \frac{c}{n})$, we eventually obtain that
$$\limsup_{n\to\infty} \frac{\log\log|\mathcal{M}|}{\sqrt{n}} \le \limsup_{n\to\infty} \frac{\log\log|\mathcal{M}'|}{\sqrt{n}} \le \limsup_{n\to\infty}\Big( \frac{\log K}{\sqrt{n}} + \frac{\log n}{\sqrt{n}} + \frac{\log\log|\mathcal{X}|}{\sqrt{n}} \Big) = \sqrt{\frac{2\delta}{\chi_2(Q_1\|Q_0)}}\, D(P_1\|P_0) = C_\delta.$$
This completes the proof of the converse part.
VI. CONCLUDING REMARKS
This work investigates the covert identification problem, showing that an ID message of size $\exp(\exp(\Theta(\sqrt{n})))$ can be reliably and covertly transmitted over $n$ channel uses. We also characterize the covert identification capacity and show that it equals the covert capacity in the standard covert communication problem. The covert identification capacity can be achieved without any shared key.
Finally, we put forth several directions that we believe are fertile avenues for future research.
• Strictly speaking, the converse result established in Section V is commonly known as a weak converse, because all three error probabilities are allowed to vanish as $n$ grows. One would then expect that a strong converse for the covert identification problem can be shown. This can perhaps be achieved following the lead of [3] and [34, Chapter 6] for the standard channel identification problem. The key limitation of our converse technique that prevents us from deriving the strong converse is the use of Lemma 9 (Expurgation Lemma), wherein we expurgate many high-weight sequences such that the error probabilities of the expurgated code increase significantly. Thus, a promising way to circumvent this issue might be to develop a more general result for channel resolvability with stringent input constraints (i.e., extending the applicability of Lemma 11), instead of applying the Expurgation Lemma.
• Having established the (first-order) fundamental limits, it is then natural to derive the error exponent of the covert identification problem. One may follow the lead of the error exponent analysis for the standard identification problem by Ahlswede and Dueck [2]. However, due to the stringent input constraints mandated by the covertness constraints, this strategy requires special care and new analytical techniques to obtain closed-form expressions.
• In addition to the KL-divergence metric studied in this work, it is also worth considering alternative covertness metrics such as the variational distance and the probability of missed detection [16].

APPENDIX A
PROOF OF LEMMA 6

Recall that $Q_1 \ll Q_0$, and without loss of generality, we assume there does not exist a symbol $z$ such that $Q_0(z) = Q_1(z) = 0$. Let $\mathcal{Z}' \triangleq \{z\in\mathcal{Z} : Q_1(z) = 0, Q_0(z) > 0\}$ be the subset of symbols that are impossible to be induced by the input symbol $X = 1$. Let $\mathcal{I}(\mathbf{z}) \triangleq \{j\in[1:n] : z_j\in\mathcal{Z}'\}$ be the set of locations such that the corresponding elements belong to $\mathcal{Z}'$. Note that if $\mathbf{z}$ satisfies $P_Z^{n,l}(\mathbf{z}) > 0$, the cardinality of $\mathcal{I}(\mathbf{z})$ must satisfy $|\mathcal{I}(\mathbf{z})| \le n - l$. For any $\mathbf{z}$ such that $P_Z^{n,l}(\mathbf{z}) > 0$, one can always find an $\tilde{\mathbf{x}}$ such that $P_X^{n,l}(\tilde{\mathbf{x}}) > 0$ and $\mathcal{I}(\mathbf{z}) \cap \mathrm{supp}(\tilde{\mathbf{x}}) = \emptyset$; thus
$$W_{Z|X}^{\otimes n}(\mathbf{z}|\tilde{\mathbf{x}}) = \prod_{j : \tilde{x}_j = 1} Q_1(z_j) \prod_{j : \tilde{x}_j = 0} Q_0(z_j) \ge (\mu_1)^l (\mu_0)^{n-l} \ge \tilde{\mu}^n, \quad (49)$$
where $\mu_0 = \min_{z : Q_0(z)>0} Q_0(z)$, $\mu_1 = \min_{z : Q_1(z)>0} Q_1(z)$, and $\tilde{\mu} = \min\{\mu_0, \mu_1\}$. Then, we have
$$(|\mathcal{M}|N)\min_{\mathbf{z} : P_Z^{n,l}(\mathbf{z})>0} P_Z^{n,l}(\mathbf{z}) = (|\mathcal{M}|N)\min_{\mathbf{z} : P_Z^{n,l}(\mathbf{z})>0} \sum_{\mathbf{x}} P_X^{n,l}(\mathbf{x})\, W_{Z|X}^{\otimes n}(\mathbf{z}|\mathbf{x}) \ge \frac{|\mathcal{M}|N}{w^l}\min_{\mathbf{z} : P_Z^{n,l}(\mathbf{z})>0} \sum_{\mathbf{x} : P_X^{n,l}(\mathbf{x})>0} W_{Z|X}^{\otimes n}(\mathbf{z}|\mathbf{x}) \quad (50)$$
$$\ge \min_{\mathbf{z} : P_Z^{n,l}(\mathbf{z})>0} \sum_{\mathbf{x} : P_X^{n,l}(\mathbf{x})>0} W_{Z|X}^{\otimes n}(\mathbf{z}|\mathbf{x}) \quad (51)$$
$$\ge \tilde{\mu}^n, \quad (52)$$
where (51) holds since $|\mathcal{M}| = \exp\{e^{r\sqrt{n}}\}$ and $w^l = \exp\{\Theta(\sqrt{n}\log n)\}$, and (52) is true since we know from (49) that for every $\mathbf{z}$ such that $P_Z^{n,l}(\mathbf{z}) > 0$, one can find an $\tilde{\mathbf{x}}$ with $P_X^{n,l}(\tilde{\mathbf{x}}) > 0$ to ensure $\sum_{\mathbf{x} : P_X^{n,l}(\mathbf{x})>0} W_{Z|X}^{\otimes n}(\mathbf{z}|\mathbf{x}) \ge W_{Z|X}^{\otimes n}(\mathbf{z}|\tilde{\mathbf{x}}) \ge \tilde{\mu}^n$.
Thus, we have
$$\log \frac{1}{\big(|\mathcal{M}| N\big) \min_{\mathbf{z}: P_Z^{n,l}(\mathbf{z}) > 0} P_Z^{n,l}(\mathbf{z})} \le \log\big(1 + \widetilde{\mu}^{-n}\big) \le \log\big((1 + \widetilde{\mu}^{-1})^n\big) = n \log\big(1 + \widetilde{\mu}^{-1}\big). \quad (53)$$

APPENDIX B
PROOF OF LEMMA

Fix any $\zeta > 0$, and decompose $\widetilde{P}_Y$ into two sub-distributions $\widetilde{P}_Y^{(1)}$ and $\widetilde{P}_Y^{(2)}$ such that
$$\widetilde{P}_Y^{(1)}(\mathbf{y}) \triangleq \frac{1}{K} \sum_{i=1}^{K} W_{Y|X}^{\otimes n}(\mathbf{y}|\mathbf{x}_i)\, \mathbb{1}\left\{\log \frac{W_{Y|X}^{\otimes n}(\mathbf{y}|\mathbf{x}_i)}{P'_Y(\mathbf{y})} > \zeta\right\}, \qquad \widetilde{P}_Y^{(2)}(\mathbf{y}) \triangleq \frac{1}{K} \sum_{i=1}^{K} W_{Y|X}^{\otimes n}(\mathbf{y}|\mathbf{x}_i)\, \mathbb{1}\left\{\log \frac{W_{Y|X}^{\otimes n}(\mathbf{y}|\mathbf{x}_i)}{P'_Y(\mathbf{y})} \le \zeta\right\}.$$
By noting that $P_Y(\mathbf{y}) = \mathbb{E}(\widetilde{P}_Y(\mathbf{y}))$, where the expectation is over the random generation of $\{\mathbf{x}_1, \ldots, \mathbf{x}_K\}$, we have
$$\mathbb{E}\big(\mathbb{V}(P_Y, \widetilde{P}_Y)\big) = \frac{1}{2}\, \mathbb{E}\left(\sum_{\mathbf{y}} \Big|\mathbb{E}\big(\widetilde{P}_Y(\mathbf{y})\big) - \widetilde{P}_Y(\mathbf{y})\Big|\right) \le \frac{1}{2}\, \mathbb{E}\left(\sum_{\mathbf{y}} \Big|\mathbb{E}\big(\widetilde{P}_Y^{(1)}(\mathbf{y})\big) - \widetilde{P}_Y^{(1)}(\mathbf{y})\Big|\right) + \frac{1}{2}\, \mathbb{E}\left(\sum_{\mathbf{y}} \Big|\mathbb{E}\big(\widetilde{P}_Y^{(2)}(\mathbf{y})\big) - \widetilde{P}_Y^{(2)}(\mathbf{y})\Big|\right). \quad (54)$$
The first term of (54) is bounded from above by
$$\sum_{\mathbf{y}} \mathbb{E}\big(\widetilde{P}_Y^{(1)}(\mathbf{y})\big) = \frac{1}{K} \sum_{i=1}^{K} \sum_{\mathbf{y}} \sum_{\mathbf{x}_i} P_X(\mathbf{x}_i)\, W_{Y|X}^{\otimes n}(\mathbf{y}|\mathbf{x}_i)\, \mathbb{1}\left\{\log \frac{W_{Y|X}^{\otimes n}(\mathbf{y}|\mathbf{x}_i)}{P'_Y(\mathbf{y})} > \zeta\right\} = \mathbb{P}_{P_X W_{Y|X}^{\otimes n}}\left(\log \frac{W_{Y|X}^{\otimes n}(\mathbf{Y}|\mathbf{X})}{P'_Y(\mathbf{Y})} > \zeta\right).$$
By applying Jensen's inequality, the second term of (54) is bounded from above by
$$\frac{1}{2} \sum_{\mathbf{y}} \mathbb{E}\left(\sqrt{\Big(\mathbb{E}\big(\widetilde{P}_Y^{(2)}(\mathbf{y})\big) - \widetilde{P}_Y^{(2)}(\mathbf{y})\Big)^2}\right) \le \frac{1}{2} \sum_{\mathbf{y}} \sqrt{\mathbb{E}\left[\Big(\mathbb{E}\big(\widetilde{P}_Y^{(2)}(\mathbf{y})\big) - \widetilde{P}_Y^{(2)}(\mathbf{y})\Big)^2\right]} = \frac{1}{2} \sum_{\mathbf{y}} \sqrt{\mathrm{Var}\big(\widetilde{P}_Y^{(2)}(\mathbf{y})\big)}, \quad (55)$$
and one can further show that
$$\mathrm{Var}\big(\widetilde{P}_Y^{(2)}(\mathbf{y})\big) = \frac{1}{K^2} \sum_{i=1}^{K} \mathrm{Var}\left(W_{Y|X}^{\otimes n}(\mathbf{y}|\mathbf{X}_i)\, \mathbb{1}\left\{\log \frac{W_{Y|X}^{\otimes n}(\mathbf{y}|\mathbf{X}_i)}{P'_Y(\mathbf{y})} \le \zeta\right\}\right) \le \frac{1}{K^2} \sum_{i=1}^{K} \mathbb{E}\left(W_{Y|X}^{\otimes n}(\mathbf{y}|\mathbf{X}_i)^2\, \mathbb{1}\left\{\log \frac{W_{Y|X}^{\otimes n}(\mathbf{y}|\mathbf{X}_i)}{P'_Y(\mathbf{y})} \le \zeta\right\}\right) \le \frac{1}{K^2} \sum_{i=1}^{K} \mathbb{E}\left(W_{Y|X}^{\otimes n}(\mathbf{y}|\mathbf{X}_i)\, e^{\zeta} P'_Y(\mathbf{y})\right) = \frac{e^{\zeta}}{K}\, P'_Y(\mathbf{y})\, P_Y(\mathbf{y}).$$
By using the arithmetic-geometric mean inequality, we see that (55) is further bounded from above as
$$\frac{1}{2} \sum_{\mathbf{y}} \sqrt{\mathrm{Var}\big(\widetilde{P}_Y^{(2)}(\mathbf{y})\big)} \le \frac{1}{2} \sum_{\mathbf{y}} \sqrt{\frac{e^{\zeta}}{K}\, P'_Y(\mathbf{y})\, P_Y(\mathbf{y})} \le \frac{1}{2} \sqrt{\frac{e^{\zeta}}{K}} \sum_{\mathbf{y}} \frac{P'_Y(\mathbf{y}) + P_Y(\mathbf{y})}{2} = \frac{1}{2} \sqrt{\frac{e^{\zeta}}{K}}.$$
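The $K^{-1/2}$ decay driving this bound can be observed directly. The following Monte Carlo sketch (illustrative parameters; a short-blocklength BSC so that $P_Y$ can be computed exactly by enumeration) estimates $\mathbb{E}[\mathbb{V}(P_Y, \widetilde{P}_Y)]$ for i.i.d. codebooks of increasing size $K$; the estimate shrinks roughly like $1/\sqrt{K}$, consistent with the $\frac{1}{2}\sqrt{e^{\zeta}/K}$ term above.

```python
import itertools, math, random

# Monte Carlo sketch (illustrative parameters, not from the paper) of the
# resolvability phenomenon behind Appendix B: the expected variational
# distance between the true output law P_Y and the K-codeword simulated law
# P~_Y decays roughly like 1/sqrt(K).

random.seed(1)
n, eps, p1 = 6, 0.1, 0.25          # blocklength, BSC(eps), P(X_j = 1)

def w_cond(y, x):
    """W^{(n)}(y|x) for a memoryless BSC(eps)."""
    d = sum(a != b for a, b in zip(x, y))
    return (eps ** d) * ((1 - eps) ** (n - d))

ys = list(itertools.product([0, 1], repeat=n))
xs = ys  # input and output sequences share the same alphabet {0,1}^n
px = {x: math.prod(p1 if b else 1 - p1 for b in x) for x in xs}
p_y = {y: sum(px[x] * w_cond(y, x) for x in xs) for y in ys}  # true output law

for K in (8, 64, 512):
    avg = 0.0
    for _ in range(20):  # trials: codewords X_1,...,X_K drawn i.i.d. from P_X
        cw = random.choices(xs, weights=[px[x] for x in xs], k=K)
        v = 0.5 * sum(abs(p_y[y] - sum(w_cond(y, x) for x in cw) / K) for y in ys)
        avg += v / 20
    print(f"K = {K:4d}:  E[V(P_Y, P~_Y)] is approximately {avg:.4f}")
```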
REFERENCES

[1] C. E. Shannon, “A mathematical theory of communication,” Bell System Technical Journal, vol. 27, no. 3, pp. 379–423, 1948.
[2] R. Ahlswede and G. Dueck, “Identification via channels,” IEEE Trans. Inf. Theory, vol. 35, no. 1, pp. 15–29, 1989.
[3] T. S. Han and S. Verdú, “New results in the theory of identification via channels,” IEEE Trans. Inf. Theory, vol. 38, no. 1, pp. 14–25, 1992.
[4] A. D. Wyner, “The wire-tap channel,” Bell System Technical Journal, vol. 54, no. 8, pp. 1355–1387, 1975.
[5] Y. Liang, H. V. Poor, and S. Shamai, Information Theoretic Security. Now Publishers Inc., 2009.
[6] M. Bloch and J. Barros, Physical-Layer Security: From Information Theory to Security Engineering. Cambridge University Press, 2011.
[7] R. Ahlswede and N. Cai, “Transmission, identification and common randomness capacities for wire-tape channels with secure feedback from the decoder,” in General Theory of Information Transfer and Combinatorics. Springer, 2006, pp. 258–275.
[8] H. Boche and C. Deppe, “Secure identification for wiretap channels; robustness, super-additivity and continuity,” IEEE Trans. Inf. Forensics Secur., vol. 13, no. 7, pp. 1641–1655, 2018.
[9] ——, “Secure identification under passive eavesdroppers and active jamming attacks,” IEEE Trans. Inf. Forensics Secur., vol. 14, no. 2, pp. 472–485, 2019.
[10] B. A. Bash, D. Goeckel, and D. Towsley, “Limits of reliable communication with low probability of detection on AWGN channels,” IEEE J. Sel. Areas Commun., vol. 31, no. 9, pp. 1921–1930, 2013.
[11] P. H. Che, M. Bakshi, and S. Jaggi, “Reliable deniable communication: Hiding messages in noise,” in Proc. IEEE Int. Symp. Inf. Theory, 2013, pp. 2945–2949.
[12] P. H. Che, M. Bakshi, C. Chan, and S. Jaggi, “Reliable deniable communication with channel uncertainty,” in Proc. IEEE Inform. Th. Workshop, 2014, pp. 30–34.
[13] ——, “Reliable, deniable and hidable communication,” in Proc. Inform. Th. Applic. Workshop, 2014, pp. 1–10.
[14] M. R. Bloch, “Covert communication over noisy channels: A resolvability perspective,” IEEE Trans. Inf. Theory, vol. 62, no. 5, pp. 2334–2354, 2016.
[15] L. Wang, G. W. Wornell, and L. Zheng, “Fundamental limits of communication with low probability of detection,” IEEE Trans. Inf. Theory, vol. 62, no. 6, pp. 3493–3503, 2016.
[16] M. Tahmasbi and M. R. Bloch, “First- and second-order asymptotics in covert communication,” IEEE Trans. Inf. Theory, vol. 65, no. 4, pp. 2190–2212, 2019.
[17] M. Tahmasbi, M. R. Bloch, and V. Y. F. Tan, “Error exponent for covert communications over discrete memoryless channels,” in Proc. IEEE Inform. Th. Workshop, 2017, pp. 304–308.
[18] K. S. K. Arumugam and M. R. Bloch, “Covert communication over a K-user multiple-access channel,” IEEE Trans. Inf. Theory, vol. 65, no. 11, pp. 7020–7044, 2019.
[19] ——, “Embedding covert information in broadcast communications,” IEEE Trans. Inf. Forensics Secur., vol. 14, no. 10, pp. 2787–2801, 2019.
[20] V. Y. F. Tan and S.-H. Lee, “Time-division is optimal for covert communication over some broadcast channels,” IEEE Trans. Inf. Forensics Secur., vol. 14, no. 5, pp. 1377–1389, 2019.
[21] D. Kibloff, S. M. Perlaza, and L. Wang, “Embedding covert information on a given broadcast code,” in Proc. IEEE Int. Symp. Inf. Theory, 2019, pp. 2169–2173.
[22] M. Ahmadipour, S. Salehkalaibar, M. H. Yassaee, and V. Y. F. Tan, “Covert communication over a compound discrete memoryless channel,” in Proc. IEEE Int. Symp. Inf. Theory, 2019, pp. 982–986.
[23] S.-H. Lee, L. Wang, A. Khisti, and G. W. Wornell, “Covert communication with channel-state information at the transmitter,” IEEE Trans. Inf. Forensics Secur., vol. 13, no. 9, pp. 2310–2319, 2018.
[24] H. ZivariFard, M. Bloch, and A. Nosratinia, “Keyless covert communication in the presence of non-causal channel state information,” in Proc. IEEE Inform. Th. Workshop, 2019, pp. 1–5.
[25] Q. Zhang, M. Bakshi, and S. Jaggi, “Covert communication over adversarially jammed channels,” in Proc. IEEE Inform. Th. Workshop, 2018, pp. 1–5.
[26] K. S. K. Arumugam, M. R. Bloch, and L. Wang, “Covert communication over a physically degraded relay channel with non-colluding wardens,” in Proc. IEEE Int. Symp. Inf. Theory, 2018, pp. 766–770.
[27] M. R. Bloch and S. Guha, “Optimal covert communications using pulse-position modulation,” in Proc. IEEE Int. Symp. Inf. Theory, 2017, pp. 2825–2829.
[28] Q. Zhang, M. R. Bloch, M. Bakshi, and S. Jaggi, “Undetectable radios: Covert communication under spectral mask constraints,” in Proc. IEEE Int. Symp. Inf. Theory, 2019, pp. 992–996.
[29] W. Hoeffding, “Probability inequalities for sums of bounded random variables,” in The Collected Works of Wassily Hoeffding. Springer, 1994, pp. 409–426.
[30] J. Hou and G. Kramer, “Effective secrecy: Reliability, confusion and stealth,” in Proc. IEEE Int. Symp. Inf. Theory, 2014, pp. 601–605.
[31] T. S. Han and S. Verdú, “Approximation theory of output statistics,” IEEE Trans. Inf. Theory, vol. 39, no. 3, pp. 752–772, 1993.
[32] J. Hou and G. Kramer, “Informational divergence approximations to product distributions,” in Proc. Canadian Workshop Inform. Theory, 2013, pp. 76–81.
[33] P. Cuff, “Distributed channel synthesis,” IEEE Trans. Inf. Theory, vol. 59, no. 11, pp. 7071–7096, 2013.
[34] T. S. Han, Information-Spectrum Methods in Information Theory. Springer, 2003.