Covert MIMO Communications under Variational Distance Constraint
Shi-Yuan Wang and Matthieu R. Bloch, Senior Member, IEEE
Abstract
The problem of covert communication over Multiple-Input Multiple-Output (MIMO) Additive White Gaussian Noise (AWGN) channels is investigated, in which a transmitter attempts to reliably communicate with a legitimate receiver while avoiding detection by a passive adversary. The covert capacity of the MIMO AWGN channel is characterized under a variational distance covertness constraint when the MIMO channel matrices are static and known. The characterization of the covert capacity is also extended to a class of channels in which the legitimate channel matrix is known but the adversary's channel matrix is only known up to a rank constraint and a spectral norm constraint.
I. INTRODUCTION
Covert communications, also known as communications with low probability of detection, have long been used to transmit sensitive information without raising suspicion. While technologies such as spread-spectrum communications have been widely deployed, the information-theoretic limits of covert communication had not been investigated until recently. Much of the interest has been spurred by the discovery of a square root law [1], which limits the scaling with the coding block length $n$ of the number of reliable and covert communication bits over memoryless channels to $O(\sqrt{n})$. In other words, the standard capacity of covert communications is zero, but the number of bits still grows with the block length. The optimal constant behind the $O(\sqrt{n})$ scaling then plays the role of the covert capacity and has been characterized for many channels, including Discrete Memoryless Channels (DMCs) and AWGN channels, using both relative entropy [2], [3] and variational distance [4], [5] as a covertness metric. Covert communications often require secret keys as an enabling resource, the rate of which can be characterized [3]; in particular, no secret key is required when the legitimate receiver obtains better observations than the adversary [6]. Recent advances include the characterization of the covert capacity in network information theory problems [7]–[9], quantum channels [10], [11], and low-complexity code constructions [12]–[16]. Particularly relevant to the present work, there have been attempts at studying MIMO-AWGN compound channels when measuring covertness with a relative entropy metric [17]–[19].

Covertness must be measured with a metric that captures how different the statistics of the observations are in the presence and in the absence of communication. Relative entropy has been a popular choice [2], [3] because of its convenient analytical properties; however, variational distance is the metric that is operationally relevant to the performance of the adversary's detector [4].
In this work, we therefore use variational distance to measure covertness, which requires specific techniques, especially in the converse proof.

The contributions of the present work are twofold. 1) We revisit the MIMO-AWGN channel model of [17]–[19] and, under the assumption that the null spaces of the main and adversary's channel matrices are trivial, we obtain a closed-form characterization of the covert capacity with variational distance as the covertness metric. Our approach extends the techniques developed in [4], [5], and the crux of the contribution is the converse proof. 2) We investigate the problem of covert communication over compound MIMO-AWGN channels, in particular the situation in which the adversary's channel matrix is only known up to a rank constraint and a spectral norm constraint [17]–[20]. Our approach differs from the analysis in [17]–[20] and borrows ideas from [21] to avoid implicit constraints on the adversary's operation when dealing with uncountable compound channels. A preliminary version of these results was presented in [22] but without complete proofs. The present work offers self-contained and detailed proofs.

II. CHANNEL MODEL
A. Notation
Both $\log$ and $\exp$ should be understood in base $e$; hence, all information-theoretic quantities are in nats. Calligraphic letters are used for sets and $|\cdot|$ denotes their cardinality. $(\cdot)^\dagger$ denotes the Moore-Penrose inverse of a matrix. $\mathbf{M} \succeq 0$ denotes a positive semi-definite matrix $\mathbf{M}$. $H(\cdot)$, $h(\cdot)$, $I(\cdot;\cdot)$, and $h_b(\cdot)$ denote the usual entropy, differential entropy, mutual information, and binary entropy function, respectively.

For a continuous alphabet $\Omega$ and any two distributions $P, Q$ with densities $f_P, f_Q$, respectively, the variational distance between $P$ and $Q$ is defined as $\mathbb{V}(P,Q) \triangleq \frac{1}{2}\int_\Omega |f_P(x) - f_Q(x)|\,dx$, or equivalently $\mathbb{V}(P,Q) = \sup_{\mathcal{S} \subseteq \Omega} |P(\mathcal{S}) - Q(\mathcal{S})|$.

This work was supported by the National Science Foundation awards 1527387 and 1910859. The authors are with the School of Electrical and Computer Engineering, Georgia Institute of Technology, Atlanta, GA, 30332 USA (e-mail: [email protected]; [email protected]).
The relative entropy between $P$ and $Q$ is defined as $\mathbb{D}(P\|Q) \triangleq \int_\Omega f_P(x) \log \frac{f_P(x)}{f_Q(x)}\,dx$. Pinsker's inequality ensures that $\mathbb{V}(P,Q) \leq \sqrt{\frac{1}{2}\min\big(\mathbb{D}(P\|Q), \mathbb{D}(Q\|P)\big)}$. Let $X \in \mathcal{X}$ and $Y \in \mathcal{Y}$ be jointly distributed random variables according to $P \cdot W$, where $P$ has density $f_P$, and $W : (x,y) \mapsto W(y|x)$ is a transition probability from $\mathcal{X}$ to $\mathcal{Y}$ with density $f_W$. We define the marginal distribution of $Y$ as $P \circ W$, with density $\int_{\mathcal{X}} f_W(y|x) f_P(x)\,dx$. Moreover, for two real numbers $a$ and $b$ such that $\lfloor a \rfloor \leq \lceil b \rceil$, we define $\llbracket a, b \rrbracket \triangleq \{\lfloor a \rfloor, \lfloor a \rfloor + 1, \cdots, \lceil b \rceil - 1, \lceil b \rceil\}$; otherwise, $\llbracket a, b \rrbracket \triangleq \emptyset$. For any $x \in \mathbb{R}$, we also define the $Q$-function $Q(x) \triangleq \int_x^\infty \frac{1}{\sqrt{2\pi}} e^{-u^2/2}\,du$ and its inverse function $Q^{-1}(\cdot)$.

B. System Model
We consider a MIMO-AWGN channel in which a transmitter (Alice) with $N_a$ antennas attempts to reliably communicate with a legitimate receiver (Bob) with $N_b$ antennas in the presence of a passive adversary (the warden Willie) equipped with $N_w$ antennas. We assume that Bob and Willie possess at least as many antennas as Alice, i.e., $N_a \leq N_b$ and $N_a \leq N_w$. Bob's and Willie's received signals at every channel use are then
\[ \mathbf{y} = \mathbf{H}_b \mathbf{x} + \mathbf{n}_b \quad \text{and} \quad \mathbf{z} = \mathbf{H}_w \mathbf{x} + \mathbf{n}_w, \quad (1) \]
respectively, where $\mathbf{x} \in \mathbb{R}^{N_a}$ is Alice's transmitted signal and $\mathbf{H}_b$ and $\mathbf{H}_w$ are Bob's and Willie's channel matrices, assumed known to everyone. We further assume that both matrices have full rank, i.e., $m = \operatorname{rank}(\mathbf{H}_b) = \operatorname{rank}(\mathbf{H}_w) = N_a$. Hence, both channel matrices can be decomposed with a Generalized Singular Value Decomposition (GSVD) [23], [24] as
\[ \mathbf{H}_b = \mathbf{U}_b' \boldsymbol{\Sigma}_b \boldsymbol{\Omega}^{-1} \boldsymbol{\Psi}^\intercal = \mathbf{U}_b \boldsymbol{\Lambda}_b \mathbf{V}^\intercal, \qquad \mathbf{H}_w = \mathbf{U}_w' \boldsymbol{\Sigma}_w \boldsymbol{\Omega}^{-1} \boldsymbol{\Psi}^\intercal = \mathbf{U}_w \boldsymbol{\Lambda}_w \mathbf{V}^\intercal, \quad (2) \]
where $\boldsymbol{\Psi} \in \mathbb{R}^{N_a \times N_a}$, $\mathbf{U}_b' \in \mathbb{R}^{N_b \times N_b}$, and $\mathbf{U}_w' \in \mathbb{R}^{N_w \times N_w}$ are orthogonal, $\boldsymbol{\Omega} \in \mathbb{R}^{m \times m}$ is lower triangular and nonsingular, and $\mathbf{V}^\intercal \triangleq \boldsymbol{\Omega}^{-1} \boldsymbol{\Psi}^\intercal$. Both $\boldsymbol{\Sigma}_b \in \mathbb{R}^{N_b \times m}$ and $\boldsymbol{\Sigma}_w \in \mathbb{R}^{N_w \times m}$ are diagonal with positive elements $\{\lambda_{b,j}\}_{j=1}^m$ and $\{\lambda_{w,j}\}_{j=1}^m$, respectively. We truncate $\mathbf{U}_b'$ and $\mathbf{U}_w'$ into $\mathbf{U}_b \in \mathbb{R}^{N_b \times m}$ and $\mathbf{U}_w \in \mathbb{R}^{N_w \times m}$, and define $\boldsymbol{\Lambda}_b = \operatorname{diag}\big(\{\lambda_{b,j}\}_{j=1}^m\big)$ and $\boldsymbol{\Lambda}_w = \operatorname{diag}\big(\{\lambda_{w,j}\}_{j=1}^m\big)$. The noise vectors $\mathbf{n}_b$ and $\mathbf{n}_w$ are realizations of AWGN distributed according to $\mathcal{N}(\mathbf{0}, \sigma_b^2 \mathbf{I}_{N_b})$ and $\mathcal{N}(\mathbf{0}, \sigma_w^2 \mathbf{I}_{N_w})$, respectively.

Furthermore, for $n \in \mathbb{N}^*$, we define the innocent symbol corresponding to the absence of communication as $\mathbf{x} = \mathbf{0}$; the output distributions induced by the innocent symbol at Bob and Willie are denoted $P_0 \triangleq \mathcal{N}(\mathbf{0}, \sigma_b^2 \mathbf{I}_{N_b})$ and $Q_0 \triangleq \mathcal{N}(\mathbf{0}, \sigma_w^2 \mathbf{I}_{N_w})$, respectively. The associated product distributions are denoted by $P_0^{\otimes n} = \prod_{i=1}^n P_0$ and $Q_0^{\otimes n} = \prod_{i=1}^n Q_0$.

Remark 1.
We assume that both $\mathbf{H}_b$ and $\mathbf{H}_w$ have a trivial null space equal to $\{\mathbf{0}\}$. If this were not the case, the presence of a null space would fall in one of two scenarios. If $\mathbf{H}_w$ has a non-trivial null space, Alice can overcome the square-root law by steering her beam in the corresponding directions [17]–[19]. If $\mathbf{H}_b$ has a non-trivial null space, Alice has no incentive to use the corresponding directions and would simply ignore them.

C. Problem Formulation

Alice transmits a uniformly-distributed message $W \in \llbracket 1, M_n \rrbracket$ by encoding it into a codeword $\mathbf{X}^n = [\mathbf{X}_1 \ldots \mathbf{X}_n] \in \mathbb{R}^{N_a \times n}$ of blocklength $n$ with the aid of a uniformly-distributed secret key $S \in \llbracket 1, K_n \rrbracket$ shared with Bob. The resulting code is called an $(n, M_n, K_n)$-code. Whether Alice communicates or not is controlled by $\varphi \in \{0, 1\}$, with $\varphi = 1$ indicating the transmission. Upon observing $\mathbf{Y}^n = [\mathbf{Y}_1 \ldots \mathbf{Y}_n] \in \mathbb{R}^{N_b \times n}$, Bob uses his knowledge of the secret key to form a reliable estimate $\widehat{W}$ of $W$. Reliability is measured by the maximal average probability of error
\[ P_e^{(n)} \triangleq \max_s \bar{P}_e^{(n)}(s) + \mathbb{P}\big(\widehat{\varphi} = 1 \,\big|\, \varphi = 0\big), \quad (3) \]
where $\bar{P}_e^{(n)}(s) \triangleq \mathbb{P}\big(W \neq \widehat{W} \,\big|\, S = s, \varphi = 1\big)$, and we define $\bar{P}_e^{(n)} \triangleq \mathbb{E}_S\big\{\mathbb{P}\big(W \neq \widehat{W} \,\big|\, S, \varphi = 1\big)\big\} + \mathbb{P}\big(\widehat{\varphi} = 1 \,\big|\, \varphi = 0\big)$. In contrast, Willie's objective is to detect whether Alice is transmitting based on the observations $\mathbf{Z}^n = [\mathbf{Z}_1 \ldots \mathbf{Z}_n] \in \mathbb{R}^{N_w \times n}$ via a hypothesis test $T(\mathbf{Z}^n)$. In particular, Willie expects $Q_0^{\otimes n}$ when there is no transmission between Alice and Bob (i.e., the null hypothesis) and $\widehat{Q}_n$ when the transmission occurs (i.e., the alternative hypothesis), where $\widehat{Q}_n$ is the output distribution induced by the code $\mathcal{C}$ used by Alice and Bob: $\forall \mathbf{z}^n \in \mathbb{R}^{N_w \times n}$,
\[ \widehat{Q}_n(\mathbf{z}^n) = \frac{1}{M_n K_n} \sum_{\ell=1}^{M_n} \sum_{k=1}^{K_n} W_{Z|X}^{\otimes n}\big(\mathbf{z}^n \,\big|\, \mathbf{x}^n(\ell, k)\big). \quad (4) \]
In the sequel, we use $\mathbb{V}(\widehat{Q}_n, Q_0^{\otimes n})$ as our covertness metric.
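To make the covertness metric concrete, the following minimal sketch (not from the paper; the pair $\mathcal{N}(0,1)$ versus $\mathcal{N}(\mu,1)$ and the value $\mu = 0.8$ are illustrative choices) evaluates the variational distance between two unit-variance Gaussians in closed form, cross-checks it against the integral definition above, and verifies Pinsker's inequality:

```python
import math

Q = lambda x: 0.5 * math.erfc(x / math.sqrt(2))   # Gaussian tail probability Q(x)

# For P = N(0,1) and Q' = N(mu,1), the densities cross at mu/2, so
# V(P, Q') = 1 - 2*Q(mu/2), while D(P||Q') = D(Q'||P) = mu^2/2 nats.
mu = 0.8
V = 1 - 2 * Q(mu / 2)
D = mu ** 2 / 2

# Pinsker's inequality: V <= sqrt(min(D(P||Q'), D(Q'||P)) / 2)
assert V <= math.sqrt(D / 2)

# Cross-check V against the definition V = (1/2) * integral of |f_P - f_Q'|
pdf = lambda x, m: math.exp(-(x - m) ** 2 / 2) / math.sqrt(2 * math.pi)
dx = 1e-3
V_num = 0.5 * dx * sum(abs(pdf(-10 + i * dx, 0.0) - pdf(-10 + i * dx, mu))
                       for i in range(20000))
assert abs(V - V_num) < 2e-3
```

The closed form follows because the likelihood ratio of two equal-variance Gaussians is monotone, so the set where $f_P > f_{Q'}$ is a half-line ending at $\mu/2$.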
When testing the null hypothesis $Q_0^{\otimes n}$ against the alternative hypothesis $\widehat{Q}_n$, any test $T(\mathbf{Z}^n)$ conducted by Willie on the observations $\mathbf{Z}^n$ satisfies $1 \geq \alpha + \beta \geq 1 - \mathbb{V}(\widehat{Q}_n, Q_0^{\otimes n})$, where $\alpha$ and $\beta$ are the probabilities of false alarm and missed detection, respectively [25, Theorem 13.1.1]. In addition, the trade-off $\alpha + \beta = 1$ is achieved with blind tests that do not use the observations. Consequently, making $\mathbb{V}(\widehat{Q}_n, Q_0^{\otimes n})$ vanish amounts to rendering the adversary's hypothesis test effectively blind and hence achieves covertness.

Remark 2.
Our use of $\mathbb{V}(\widehat{Q}_n, Q_0^{\otimes n})$ is motivated by the following considerations. There is no strong converse with respect to (w.r.t.) the value of $\mathbb{V}(\widehat{Q}_n, Q_0^{\otimes n})$, as can be seen from Definition 1 and Theorem 2, where the notion of throughput depends on the covertness metric; hence, the choice of covertness metric matters. Many earlier works [2], [3], [17]–[19] measure covertness using the relative entropy $\mathbb{D}(\widehat{Q}_n \| Q_0^{\otimes n})$. Unfortunately, relative entropy is only a loose proxy for variational distance, since Pinsker's inequality is not tight [4], and it is therefore less directly related to the operational test of the adversary. Furthermore, both $\mathbb{D}(\widehat{Q}_n \| Q_0^{\otimes n})$ and $\mathbb{D}(Q_0^{\otimes n} \| \widehat{Q}_n)$ could in principle be used but, depending on which metric is chosen, different conclusions regarding the optimal signaling over AWGN channels are reached [26].

Definition 1.
A reliable and covert throughput $r \in \mathbb{R}^+$ is achievable with corresponding key throughput $k \in \mathbb{R}^+$ if there exists a sequence of $(n, M_n, K_n)$-codes with increasing blocklength $n$ such that
\[ \liminf_{n\to\infty} \frac{\log M_n}{\sqrt{n}\, d_n} \geq r, \qquad \limsup_{n\to\infty} \frac{\log M_n K_n}{\sqrt{n}\, d_n} \leq r + k, \quad (5) \]
and
\[ \lim_{n\to\infty} P_e^{(n)} = 0, \qquad \lim_{n\to\infty} \mathbb{V}(\widehat{Q}_n, Q_0^{\otimes n}) = 0, \quad (6) \]
where $d_n = Q^{-1}\Big(\frac{1 - \mathbb{V}(\widehat{Q}_n, Q_0^{\otimes n})}{2}\Big)$. The covert capacity $C_{\mathrm{covert}}$ is the supremum of achievable throughputs $r$.

Note that, in our definition, we normalize the sizes of the message and key sets by $\sqrt{n}\, d_n$ instead of the usual choice, $n$; this is essential to unveil the square root law behind the covertness constraint and is justified a posteriori by the results in Section III.

Remark 3.
Our model does not include a power constraint on the channel input. This is justified since we only consider channel matrices with trivial null space and since any power constraint on the input is weaker than the covertness constraint [2, Section V]. The reason previous works [17]–[19] impose a power constraint is precisely because they allow non-trivial null spaces.
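The role of the normalization $d_n$ can be illustrated numerically. The sketch below (illustrative, not from the paper; `Qinv` is a simple bisection) checks two facts used implicitly above: the optimal likelihood-ratio test between two Gaussians meets $\alpha + \beta = 1 - \mathbb{V}$ with equality, and $d_n = Q^{-1}\big(\frac{1-\delta_n}{2}\big)$ vanishes like $\delta_n \sqrt{\pi/2}$ as $\delta_n \to 0$, from the first-order expansion $Q(x) = \frac{1}{2} - \frac{x}{\sqrt{2\pi}} + O(x^3)$:

```python
import math

Qf = lambda x: 0.5 * math.erfc(x / math.sqrt(2))   # Gaussian Q-function

def Qinv(p):
    """Inverse Q-function for 0 < p < 1/2, by bisection on [0, 10]."""
    lo, hi = 0.0, 10.0
    for _ in range(100):
        mid = (lo + hi) / 2
        if Qf(mid) > p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# The likelihood-ratio test for N(0,1) vs N(mu,1) thresholds at mu/2 and meets
# alpha + beta = 1 - V with equality, where V = 1 - 2*Q(mu/2).
mu = 0.8
V = 1 - 2 * Qf(mu / 2)
alpha = beta = Qf(mu / 2)
assert abs((alpha + beta) - (1 - V)) < 1e-12

# d_n = Q^{-1}((1 - delta_n)/2) vanishes like delta_n * sqrt(pi/2) as delta_n -> 0
delta = 1e-2
dn = Qinv((1 - delta) / 2)
assert abs(dn - delta * math.sqrt(math.pi / 2)) < 1e-4
```

This makes the square root law visible: a message size scaling as $\sqrt{n}\, d_n \approx \sqrt{n}\, \delta_n \sqrt{\pi/2}$ is $o(\sqrt{n})$ whenever $\delta_n \to 0$.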
III. MAIN RESULTS
Theorem 2.
The covert capacity of a MIMO-AWGN channel with full knowledge of the channel matrices is
\[ C_{\mathrm{covert}} = \frac{\sigma_w^2}{\sigma_b^2} \sqrt{2\,\operatorname{tr}\Big(\boldsymbol{\Lambda}_b^4 \big(\boldsymbol{\Lambda}_w^{-1}\big)^4\Big)}. \quad (7) \]
The covert capacity is achievable with sum of key throughput and capacity satisfying
\[ R_{\mathrm{key}} + C_{\mathrm{covert}} = \sqrt{\frac{2\,\operatorname{tr}\Big(\boldsymbol{\Lambda}_b^2 \big(\boldsymbol{\Lambda}_w^{-1}\big)^2\Big)^2}{\operatorname{tr}\Big(\boldsymbol{\Lambda}_b^4 \big(\boldsymbol{\Lambda}_w^{-1}\big)^4\Big)}}. \quad (8) \]

A. Converse Proof for Variational Distance
Proposition 3.
Consider a sequence of covert MIMO-AWGN communication schemes for the model in (1) with increasing blocklength $n \in \mathbb{N}^*$, characterized by $\epsilon_n \triangleq P_e^{(n)}$ and $\delta_n \triangleq \mathbb{V}(\widehat{Q}_n, Q_0^{\otimes n})$. If $\lim_{n\to\infty} \epsilon_n = \lim_{n\to\infty} \delta_n = 0$ and $\lim_{n\to\infty} M_n = \infty$, then we have
\[ \liminf_{n\to\infty} \frac{\log M_n}{\sqrt{n}\, Q^{-1}\big(\frac{1-\delta_n}{2}\big)} \leq \frac{\sigma_w^2}{\sigma_b^2} \sqrt{2\,\operatorname{tr}\Big(\boldsymbol{\Lambda}_b^4 \big(\boldsymbol{\Lambda}_w^{-1}\big)^4\Big)}. \quad (9) \]

Proof:
The proof extends the techniques developed in [1], [4], [5] by constructing a test for Willie that is simple enough to be analyzed yet powerful enough to obtain a tight bound. We first recall the Berry-Esseen Theorem [4, Theorem 6] in the following theorem.
Theorem 4 (Berry-Esseen Theorem). Let $X_1, \ldots, X_n$ be independent random variables such that for $k \in \llbracket 1, n \rrbracket$, we have $\mathbb{E}\{X_k\} = \mu_k$, $\sigma_k^2 = \operatorname{Var}(X_k)$, and $t_k = \mathbb{E}\{|X_k - \mu_k|^3\}$. If we define $\sigma^2 = \sum_{k=1}^n \sigma_k^2$ and $T = \sum_{k=1}^n t_k$, then we have
\[ \left| \mathbb{P}\left( \sum_{k=1}^n (X_k - \mu_k) > \lambda\sigma \right) - Q(\lambda) \right| \leq \frac{6T}{\sigma^3}. \quad (10) \]

a) Lower bound on covertness metric: We start by establishing a lower bound relating the covertness metric to the minimum received power of codewords at Bob within a given code $\mathcal{C}$. Consider a simple hypothesis testing problem with two hypotheses $H_0$ and $H_1$ corresponding to distributions $Q_0^{\otimes n}$ and $\widehat{Q}_n$, respectively. We define a sub-optimal power detector
\[ T(\mathbf{z}^n) \triangleq \mathbb{1}\left\{ \sum_{i=1}^n S_i > \tau \right\}, \quad (11) \]
where $S_i \triangleq S(\mathbf{z}_i) \triangleq \big\| \mathbf{H}_b (\mathbf{H}_w)^\dagger \mathbf{z}_i \big\|^2$, and the threshold $\tau$ will be specified later. The intuition behind the test is to realign Willie's observations with those of Bob. Note that $\mathbf{H}_w^\intercal \mathbf{H}_w$ is invertible because of the full-rank assumption. Hence, we rewrite the test statistic $S_i$ using the GSVD as
\[ S_i = \Big(\mathbf{H}_b (\mathbf{H}_w)^\dagger \mathbf{z}_i\Big)^\intercal \Big(\mathbf{H}_b (\mathbf{H}_w)^\dagger \mathbf{z}_i\Big) = \hat{\mathbf{z}}_i^\intercal \hat{\mathbf{z}}_i, \quad (12) \]
where $\hat{\mathbf{z}}_i = \boldsymbol{\Lambda}_b \boldsymbol{\Lambda}_w^{-1} \mathbf{U}_w^\intercal \mathbf{z}_i$. The following lemma characterizes upper bounds for both the false-alarm and the missed-detection probabilities.

Lemma 5.
Given a specific code $\mathcal{C}$ with codewords indexed by $k$, $\mathbf{x}^{(k)n} = \big[\mathbf{x}_1^{(k)} \ldots \mathbf{x}_n^{(k)}\big] \in \mathcal{C}$, define $P^* \triangleq \min_k \big\|\mathbf{H}_b \mathbf{x}^{(k)n}\big\|_F^2 = \min_k \operatorname{tr}\big(\boldsymbol{\Lambda}_b^2 \mathbf{P}^{(k)}\big)$, the minimum power of Bob's received codewords, where $\mathbf{P}^{(k)}$ is defined in the proof below. Setting the detection threshold to $\tau = \frac{P^*}{2} + n\sigma_w^2 \operatorname{tr}\big(\boldsymbol{\Lambda}_b^2 (\boldsymbol{\Lambda}_w^{-1})^2\big)$, we have
\[ \alpha \leq Q\left( \frac{P^*}{2\sqrt{2n\,\operatorname{tr}\big(\boldsymbol{\Lambda}_b^4 (\boldsymbol{\Lambda}_w^{-1})^4\big)}\,\sigma_w^2} \right) + \frac{B_0}{\sqrt{n}}, \quad (13) \]
\[ \beta \leq Q\left( \frac{P^*}{2\sqrt{2n\,\operatorname{tr}\big(\boldsymbol{\Lambda}_b^4 (\boldsymbol{\Lambda}_w^{-1})^4\big)}\,\sigma_w^2} \right) + \frac{(P^*)^2\,\operatorname{tr}\big(\boldsymbol{\Lambda}_b^2 (\boldsymbol{\Lambda}_w^{-1})^2\big)}{4\sqrt{\pi}\, n^{3/2}\, \sigma_w^4\, \operatorname{tr}\big(\boldsymbol{\Lambda}_b^4 (\boldsymbol{\Lambda}_w^{-1})^4\big)^{3/2}} + \frac{B_1}{\sqrt{n}}, \quad (14) \]
where $B_0$ and $B_1$ are constants independent of $n$.

Proof: Under the null hypothesis $H_0$, for all $i \in \llbracket 1, n \rrbracket$, $\mathbf{Z}_i \sim \mathcal{N}(\mathbf{0}, \sigma_w^2 \mathbf{I}_{N_w})$, so that $\hat{\mathbf{Z}}_i \sim \mathcal{N}\big(\mathbf{0}, \sigma_w^2 \boldsymbol{\Lambda}_b^2 (\boldsymbol{\Lambda}_w^{-1})^2\big) \in \mathbb{R}^m$. Hence, we have the following statistics:
\[ \mu_0 = \sum_{i=1}^n \mathbb{E}_{Q_0}\{S_i\} = n\sigma_w^2\, \operatorname{tr}\big(\boldsymbol{\Lambda}_b^2 (\boldsymbol{\Lambda}_w^{-1})^2\big), \quad (15) \]
\[ \sigma_0^2 = \sum_{i=1}^n \operatorname{Var}(S_i) = 2n\sigma_w^4\, \operatorname{tr}\big(\boldsymbol{\Lambda}_b^4 (\boldsymbol{\Lambda}_w^{-1})^4\big), \quad (16) \]
\[ t_{0,i} = \mathbb{E}_{Q_0}\big\{|S_i - \mu_{0,i}|^3\big\} = O(1), \qquad t_0 = \sum_{i=1}^n t_{0,i} = O(n), \quad (17) \]
where $\mu_{0,i} \triangleq \mathbb{E}_{Q_0}\{S_i\}$. We use the Berry-Esseen Theorem to obtain an upper bound for the probability of false alarm as follows:
\[ \alpha = \mathbb{P}_{H_0}\left( \sum_{i=1}^n S_i > \tau \right) \quad (18) \]
\[ \leq Q\left( \frac{\tau - n\sigma_w^2\, \operatorname{tr}\big(\boldsymbol{\Lambda}_b^2 (\boldsymbol{\Lambda}_w^{-1})^2\big)}{\sqrt{2n\,\operatorname{tr}\big(\boldsymbol{\Lambda}_b^4 (\boldsymbol{\Lambda}_w^{-1})^4\big)}\,\sigma_w^2} \right) + \frac{6 t_0}{\sigma_0^3} \quad (19) \]
\[ \leq Q\left( \frac{\tau - n\sigma_w^2\, \operatorname{tr}\big(\boldsymbol{\Lambda}_b^2 (\boldsymbol{\Lambda}_w^{-1})^2\big)}{\sqrt{2n\,\operatorname{tr}\big(\boldsymbol{\Lambda}_b^4 (\boldsymbol{\Lambda}_w^{-1})^4\big)}\,\sigma_w^2} \right) + \frac{B_0}{\sqrt{n}}. \quad (20) \]
The bound on the probability of false alarm (13) follows by applying the threshold $\tau = \frac{P^*}{2} + n\sigma_w^2\, \operatorname{tr}\big(\boldsymbol{\Lambda}_b^2 (\boldsymbol{\Lambda}_w^{-1})^2\big)$.
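The null-hypothesis mean in (15) can be sanity-checked by Monte Carlo: under $H_0$ the realigned observation has independent components $(\lambda_{b,j}/\lambda_{w,j})\,\mathcal{N}(0, \sigma_w^2)$, so the power statistic averages to $\sigma_w^2 \operatorname{tr}\big(\boldsymbol{\Lambda}_b^2 (\boldsymbol{\Lambda}_w^{-1})^2\big)$. The gains and noise level below are arbitrary illustrative choices, not values from the paper:

```python
import random
random.seed(1)

# Illustrative diagonal GSVD gains and Willie's noise level
lam_b = [1.5, 0.7]
lam_w = [0.9, 2.0]
sw = 1.0
ratio = [b / w for b, w in zip(lam_b, lam_w)]

# Under H0, z_hat_j = (lam_b_j / lam_w_j) * N(0, sw^2); S = ||z_hat||^2,
# so E{S} = sw^2 * tr(Lambda_b^2 Lambda_w^{-2})  -- cf. (15).
def sample_S():
    return sum((r * random.gauss(0.0, sw)) ** 2 for r in ratio)

N = 200_000
mean_S = sum(sample_S() for _ in range(N)) / N
G = sum(r ** 2 for r in ratio)            # tr(Lambda_b^2 Lambda_w^{-2})
assert abs(mean_S - sw ** 2 * G) < 0.05 * sw ** 2 * G
```

The variance in (16) follows the same way, since each squared Gaussian component contributes variance $2\sigma_w^4 (\lambda_{b,j}/\lambda_{w,j})^4$.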
Similarly, under the hypothesis $H_1$, we know that, given a codeword $\mathbf{x}^{(k)n}$ transmitted over the channel $W_{Z|X}^{\otimes n}$, for every $i \in \llbracket 1, n \rrbracket$, $\mathbf{Z}_i \,|\, \mathbf{X}_i = \mathbf{x}_i^{(k)} \sim \mathcal{N}\big(\mathbf{H}_w \mathbf{x}_i^{(k)}, \sigma_w^2 \mathbf{I}_{N_w}\big)$, so that $\hat{\mathbf{Z}}_i \,|\, \tilde{\mathbf{X}}_i = \tilde{\mathbf{x}}_i^{(k)} \sim \mathcal{N}\big(\boldsymbol{\Lambda}_b \tilde{\mathbf{x}}_i^{(k)}, \sigma_w^2 \boldsymbol{\Lambda}_b^2 (\boldsymbol{\Lambda}_w^{-1})^2\big)$, where $\tilde{\mathbf{x}}_i^{(k)} = \mathbf{V}^\intercal \mathbf{x}_i^{(k)}$. Let $\tilde{\mathbf{x}}^{(k)n} \triangleq \mathbf{V}^\intercal \mathbf{x}^{(k)n}$ and $\mathbf{P}^{(k)} \triangleq \sum_{i=1}^n \tilde{\mathbf{x}}_i^{(k)} \tilde{\mathbf{x}}_i^{(k)\intercal}$. Hence, we have the following statistics:
\[ \mu_1^{(k)} = \sum_{i=1}^n \mathbb{E}_{\widehat{Q}_n}\big\{S_i \,\big|\, \mathbf{X}_i = \mathbf{x}_i^{(k)}\big\} = \operatorname{tr}\big(\boldsymbol{\Lambda}_b^2 \mathbf{P}^{(k)}\big) + n\sigma_w^2\, \operatorname{tr}\big(\boldsymbol{\Lambda}_b^2 (\boldsymbol{\Lambda}_w^{-1})^2\big), \quad (21) \]
\[ \sigma_1^{(k)2} = \sum_{i=1}^n \operatorname{Var}\big(S_i \,\big|\, \mathbf{X}_i = \mathbf{x}_i^{(k)}\big) = 4\sigma_w^2\, \operatorname{tr}\big(\boldsymbol{\Lambda}_b^4 (\boldsymbol{\Lambda}_w^{-1})^2 \mathbf{P}^{(k)}\big) + 2n\sigma_w^4\, \operatorname{tr}\big(\boldsymbol{\Lambda}_b^4 (\boldsymbol{\Lambda}_w^{-1})^4\big), \quad (22) \]
\[ t_{1,i}^{(k)} = \mathbb{E}_{\widehat{Q}_n}\big\{|S_i - \mu_{1,i}^{(k)}|^3 \,\big|\, \mathbf{X}_i = \mathbf{x}_i^{(k)}\big\} = O(1), \qquad t_1^{(k)} = \sum_{i=1}^n t_{1,i}^{(k)} = O(n), \quad (23) \]
where $\mu_{1,i}^{(k)} \triangleq \mathbb{E}_{\widehat{Q}_n}\big\{S_i \,\big|\, \mathbf{X}_i = \mathbf{x}_i^{(k)}\big\}$. We use again the Berry-Esseen Theorem to obtain an upper bound for the probability of missed detection. For the $k$-th codeword, defining $\beta^{(k)} = \mathbb{P}_{H_1}\big(\sum_{i=1}^n S_i < \tau \,\big|\, \mathbf{X}^n = \mathbf{x}^{(k)n}\big)$, we have
\[ \beta^{(k)} \leq Q\left( \frac{\operatorname{tr}\big(\boldsymbol{\Lambda}_b^2 \mathbf{P}^{(k)}\big) + n\sigma_w^2\, \operatorname{tr}\big(\boldsymbol{\Lambda}_b^2 (\boldsymbol{\Lambda}_w^{-1})^2\big) - \tau}{\sqrt{4\sigma_w^2\, \operatorname{tr}\big(\boldsymbol{\Lambda}_b^4 (\boldsymbol{\Lambda}_w^{-1})^2 \mathbf{P}^{(k)}\big) + 2n\sigma_w^4\, \operatorname{tr}\big(\boldsymbol{\Lambda}_b^4 (\boldsymbol{\Lambda}_w^{-1})^4\big)}} \right) + \frac{6 t_1^{(k)}}{\sigma_1^{(k)3}} \quad (24) \]
\[ \overset{(a)}{\leq} Q\left( \frac{\operatorname{tr}\big(\boldsymbol{\Lambda}_b^2 \mathbf{P}^{(k)}\big) + n\sigma_w^2\, \operatorname{tr}\big(\boldsymbol{\Lambda}_b^2 (\boldsymbol{\Lambda}_w^{-1})^2\big) - \tau}{\sqrt{4\sigma_w^2\, \operatorname{tr}\big(\boldsymbol{\Lambda}_b^2 \mathbf{P}^{(k)}\big)\, \operatorname{tr}\big(\boldsymbol{\Lambda}_b^2 (\boldsymbol{\Lambda}_w^{-1})^2\big) + 2n\sigma_w^4\, \operatorname{tr}\big(\boldsymbol{\Lambda}_b^4 (\boldsymbol{\Lambda}_w^{-1})^4\big)}} \right) + \frac{B_1}{\sqrt{n}}, \quad (25) \]
where (a) follows since $\boldsymbol{\Lambda}_b, \boldsymbol{\Lambda}_w, \mathbf{P}^{(k)} \succeq 0$ and since $\operatorname{tr}\big(\boldsymbol{\Lambda}_b^4 (\boldsymbol{\Lambda}_w^{-1})^2 \mathbf{P}^{(k)}\big) \leq \operatorname{tr}\big(\boldsymbol{\Lambda}_b^2 (\boldsymbol{\Lambda}_w^{-1})^2\big)\, \operatorname{tr}\big(\boldsymbol{\Lambda}_b^2 \mathbf{P}^{(k)}\big)$.
By setting the detection threshold to $\tau = \frac{P^*}{2} + n\sigma_w^2\, \operatorname{tr}\big(\boldsymbol{\Lambda}_b^2 (\boldsymbol{\Lambda}_w^{-1})^2\big)$, we further obtain
\[ \beta^{(k)} \overset{(a)}{\leq} Q\left( \frac{P^*}{2\sqrt{4\sigma_w^2 P^*\, \operatorname{tr}\big(\boldsymbol{\Lambda}_b^2 (\boldsymbol{\Lambda}_w^{-1})^2\big) + 2n\sigma_w^4\, \operatorname{tr}\big(\boldsymbol{\Lambda}_b^4 (\boldsymbol{\Lambda}_w^{-1})^4\big)}} \right) + \frac{B_1}{\sqrt{n}} \quad (26) \]
\[ = Q\left( \frac{P^*}{2\sqrt{2n\, \operatorname{tr}\big(\boldsymbol{\Lambda}_b^4 (\boldsymbol{\Lambda}_w^{-1})^4\big)}\,\sigma_w^2} \cdot \frac{1}{\sqrt{1 + \frac{2 P^*\, \operatorname{tr}\left(\boldsymbol{\Lambda}_b^2 (\boldsymbol{\Lambda}_w^{-1})^2\right)}{n\sigma_w^2\, \operatorname{tr}\left(\boldsymbol{\Lambda}_b^4 (\boldsymbol{\Lambda}_w^{-1})^4\right)}}} \right) + \frac{B_1}{\sqrt{n}} \quad (27) \]
\[ \overset{(b)}{\leq} Q\left( \frac{P^*}{2\sqrt{2n\, \operatorname{tr}\big(\boldsymbol{\Lambda}_b^4 (\boldsymbol{\Lambda}_w^{-1})^4\big)}\,\sigma_w^2} - \frac{(P^*)^2\, \operatorname{tr}\big(\boldsymbol{\Lambda}_b^2 (\boldsymbol{\Lambda}_w^{-1})^2\big)}{2\sqrt{2}\, n^{3/2}\, \sigma_w^4\, \operatorname{tr}\big(\boldsymbol{\Lambda}_b^4 (\boldsymbol{\Lambda}_w^{-1})^4\big)^{3/2}} \right) + \frac{B_1}{\sqrt{n}} \quad (28) \]
\[ \overset{(c)}{\leq} Q\left( \frac{P^*}{2\sqrt{2n\, \operatorname{tr}\big(\boldsymbol{\Lambda}_b^4 (\boldsymbol{\Lambda}_w^{-1})^4\big)}\,\sigma_w^2} \right) + \frac{(P^*)^2\, \operatorname{tr}\big(\boldsymbol{\Lambda}_b^2 (\boldsymbol{\Lambda}_w^{-1})^2\big)}{4\sqrt{\pi}\, n^{3/2}\, \sigma_w^4\, \operatorname{tr}\big(\boldsymbol{\Lambda}_b^4 (\boldsymbol{\Lambda}_w^{-1})^4\big)^{3/2}} + \frac{B_1}{\sqrt{n}}, \quad (29) \]
where (a) follows since $\operatorname{tr}\big(\boldsymbol{\Lambda}_b^2 \mathbf{P}^{(k)}\big) \geq P^*$ and $x \mapsto \frac{x - P^*/2}{\sqrt{ax + b}}$ is increasing for all $a, b > 0$, (b) follows from $\frac{1}{\sqrt{1+x}} \geq 1 - \frac{x}{2}$ for all $x > 0$, and (c) follows from $Q(x - y) = Q(x) + \int_{x-y}^{x} \frac{1}{\sqrt{2\pi}} \exp\big({-\frac{u^2}{2}}\big)\, du \leq Q(x) + \frac{y}{\sqrt{2\pi}}$ for $0 < y < x$. Note that the upper bound (29) is independent of the codeword $\mathbf{x}^{(k)n}$, and hence the bound applies to all the $\beta^{(k)}$'s. Therefore, we obtain
\[ \beta = \mathbb{P}_{H_1}\left( \sum_{i=1}^n S_i < \tau \right) = \frac{1}{|\mathcal{C}|} \sum_{\mathbf{x}^{(k)n} \in \mathcal{C}} \beta^{(k)} \leq Q\left( \frac{P^*}{2\sqrt{2n\, \operatorname{tr}\big(\boldsymbol{\Lambda}_b^4 (\boldsymbol{\Lambda}_w^{-1})^4\big)}\,\sigma_w^2} \right) + \frac{(P^*)^2\, \operatorname{tr}\big(\boldsymbol{\Lambda}_b^2 (\boldsymbol{\Lambda}_w^{-1})^2\big)}{4\sqrt{\pi}\, n^{3/2}\, \sigma_w^4\, \operatorname{tr}\big(\boldsymbol{\Lambda}_b^4 (\boldsymbol{\Lambda}_w^{-1})^4\big)^{3/2}} + \frac{B_1}{\sqrt{n}}. \quad (30) \]
Eventually, the covertness metric can be lower-bounded as
\[ \delta_n \geq 1 - \alpha - \beta \geq 1 - 2Q\left( \frac{P^*}{2\sqrt{2n\, \operatorname{tr}\big(\boldsymbol{\Lambda}_b^4 (\boldsymbol{\Lambda}_w^{-1})^4\big)}\,\sigma_w^2} \right) - \frac{(P^*)^2\, \operatorname{tr}\big(\boldsymbol{\Lambda}_b^2 (\boldsymbol{\Lambda}_w^{-1})^2\big)}{4\sqrt{\pi}\, n^{3/2}\, \sigma_w^4\, \operatorname{tr}\big(\boldsymbol{\Lambda}_b^4 (\boldsymbol{\Lambda}_w^{-1})^4\big)^{3/2}} - \frac{B_0 + B_1}{\sqrt{n}}, \quad (31) \]
which only depends on the code through the minimum power of Bob's received codewords.

b) Existence of a good sub-code: We develop a bound on the maximum power of a non-empty low-power sub-code in the following lemma.
Lemma 6.
For any covert channel code $\mathcal{C}$ and any decreasing sequence $\{\gamma_n\}_{n=1}^\infty$ with $\gamma_n \in (0, 1)$ and $\lim_{n\to\infty} \gamma_n = 0$, there exists a subset of codewords $\mathcal{C}^{(\ell)}$ such that $|\mathcal{C}^{(\ell)}| \geq \gamma_n |\mathcal{C}|$ and $\|\mathbf{H}_b \mathbf{x}^n\|_F^2 \leq A^2 \sqrt{n}$ for all $\mathbf{x}^n \in \mathcal{C}^{(\ell)}$, where
\[ A^2 \triangleq 2\sqrt{2\, \operatorname{tr}\big(\boldsymbol{\Lambda}_b^4 (\boldsymbol{\Lambda}_w^{-1})^4\big)}\, \sigma_w^2\, Q^{-1}\left( \frac{1 - \delta_n}{2} - \frac{\nu\, \operatorname{tr}\big(\boldsymbol{\Lambda}_b^2 (\boldsymbol{\Lambda}_w^{-1})^2\big)}{\sqrt{\pi n}\, \sigma_w^4\, \operatorname{tr}\big(\boldsymbol{\Lambda}_b^4 (\boldsymbol{\Lambda}_w^{-1})^4\big)^{3/2}} - \gamma_n \right), \quad (32) \]
and $\nu$ depends on the channel.

Proof: We partition the code $\mathcal{C}$ into two different sub-codes, a low-power sub-code $\mathcal{C}^{(\ell)}$ and a high-power sub-code $\mathcal{C}^{(h)}$, where $\mathcal{C}^{(\ell)} \triangleq \{\mathbf{x}^n \in \mathcal{C} : \|\mathbf{H}_b \mathbf{x}^n\|_F^2 \leq A^2 \sqrt{n}\}$ and $\mathcal{C}^{(h)} \triangleq \mathcal{C} \setminus \mathcal{C}^{(\ell)}$. The output distributions induced by these two sub-codes are
\[ \widehat{Q}^{(\ell)}(\mathbf{z}^n) = \frac{1}{|\mathcal{C}^{(\ell)}|} \sum_{\mathbf{x}^n \in \mathcal{C}^{(\ell)}} W_{Z|X}^{\otimes n}(\mathbf{z}^n | \mathbf{x}^n) \quad \text{and} \quad \widehat{Q}^{(h)}(\mathbf{z}^n) = \frac{1}{|\mathcal{C}^{(h)}|} \sum_{\mathbf{x}^n \in \mathcal{C}^{(h)}} W_{Z|X}^{\otimes n}(\mathbf{z}^n | \mathbf{x}^n), \quad (33) \]
respectively. Note that $\widehat{Q}_n = \frac{|\mathcal{C}^{(\ell)}|}{|\mathcal{C}|} \widehat{Q}^{(\ell)} + \frac{|\mathcal{C}^{(h)}|}{|\mathcal{C}|} \widehat{Q}^{(h)}$.
For a code $\mathcal{C}$ such that $\mathbb{V}(\widehat{Q}_n, Q_0^{\otimes n}) = \delta_n$, we have
\[ \delta_n = \mathbb{V}(\widehat{Q}_n, Q_0^{\otimes n}) \quad (34) \]
\[ \overset{(a)}{\geq} \frac{|\mathcal{C}^{(h)}|}{|\mathcal{C}|} \mathbb{V}(\widehat{Q}^{(h)}, Q_0^{\otimes n}) - \frac{|\mathcal{C}^{(\ell)}|}{|\mathcal{C}|} \mathbb{V}(\widehat{Q}^{(\ell)}, Q_0^{\otimes n}) \quad (35) \]
\[ = \mathbb{V}(\widehat{Q}^{(h)}, Q_0^{\otimes n}) - \frac{|\mathcal{C}^{(\ell)}|}{|\mathcal{C}|} \Big( \mathbb{V}(\widehat{Q}^{(h)}, Q_0^{\otimes n}) + \mathbb{V}(\widehat{Q}^{(\ell)}, Q_0^{\otimes n}) \Big) \quad (36) \]
\[ \overset{(b)}{\geq} \mathbb{V}(\widehat{Q}^{(h)}, Q_0^{\otimes n}) - 2\,\frac{|\mathcal{C}^{(\ell)}|}{|\mathcal{C}|} \quad (37) \]
\[ \overset{(c)}{\geq} 1 - 2Q\left( \frac{A^2}{2\sqrt{2\, \operatorname{tr}\big(\boldsymbol{\Lambda}_b^4 (\boldsymbol{\Lambda}_w^{-1})^4\big)}\,\sigma_w^2} \right) - \frac{A^4\, \operatorname{tr}\big(\boldsymbol{\Lambda}_b^2 (\boldsymbol{\Lambda}_w^{-1})^2\big)}{4\sqrt{\pi n}\, \sigma_w^4\, \operatorname{tr}\big(\boldsymbol{\Lambda}_b^4 (\boldsymbol{\Lambda}_w^{-1})^4\big)^{3/2}} - \frac{B_0 + B_1}{\sqrt{n}} - 2\,\frac{|\mathcal{C}^{(\ell)}|}{|\mathcal{C}|} \quad (38) \]
\[ = \delta_n + 2\gamma_n + \frac{2\nu\, \operatorname{tr}\big(\boldsymbol{\Lambda}_b^2 (\boldsymbol{\Lambda}_w^{-1})^2\big)}{\sqrt{\pi n}\, \sigma_w^4\, \operatorname{tr}\big(\boldsymbol{\Lambda}_b^4 (\boldsymbol{\Lambda}_w^{-1})^4\big)^{3/2}} - \frac{B_0 + B_1}{\sqrt{n}} - \frac{A^4\, \operatorname{tr}\big(\boldsymbol{\Lambda}_b^2 (\boldsymbol{\Lambda}_w^{-1})^2\big)}{4\sqrt{\pi n}\, \sigma_w^4\, \operatorname{tr}\big(\boldsymbol{\Lambda}_b^4 (\boldsymbol{\Lambda}_w^{-1})^4\big)^{3/2}} - 2\,\frac{|\mathcal{C}^{(\ell)}|}{|\mathcal{C}|} \quad (39) \]
\[ \overset{(d)}{\geq} \delta_n + 2\gamma_n - 2\,\frac{|\mathcal{C}^{(\ell)}|}{|\mathcal{C}|}, \quad (40) \]
where (a) follows from $\frac{|\mathcal{C}^{(h)}|}{|\mathcal{C}|} \mathbb{V}(\widehat{Q}^{(h)}, Q_0^{\otimes n}) \leq \mathbb{V}(\widehat{Q}_n, Q_0^{\otimes n}) + \frac{|\mathcal{C}^{(\ell)}|}{|\mathcal{C}|} \mathbb{V}(\widehat{Q}^{(\ell)}, Q_0^{\otimes n})$, (b) follows since the variational distance between any two distributions is upper bounded by $1$, (c) follows from (31) applied to the high-power sub-code, and (d) follows by choosing $\nu$ to satisfy
\[ \frac{2\nu\, \operatorname{tr}\big(\boldsymbol{\Lambda}_b^2 (\boldsymbol{\Lambda}_w^{-1})^2\big)}{\sqrt{\pi n}\, \sigma_w^4\, \operatorname{tr}\big(\boldsymbol{\Lambda}_b^4 (\boldsymbol{\Lambda}_w^{-1})^4\big)^{3/2}} - \frac{B_0 + B_1}{\sqrt{n}} - \frac{A^4\, \operatorname{tr}\big(\boldsymbol{\Lambda}_b^2 (\boldsymbol{\Lambda}_w^{-1})^2\big)}{4\sqrt{\pi n}\, \sigma_w^4\, \operatorname{tr}\big(\boldsymbol{\Lambda}_b^4 (\boldsymbol{\Lambda}_w^{-1})^4\big)^{3/2}} \geq 0. \quad (41) \]
Hence, we can bound the cardinality of the low-power sub-code $\mathcal{C}^{(\ell)}$ from below as $|\mathcal{C}^{(\ell)}| \geq \gamma_n |\mathcal{C}|$, which shows the existence of such a low-power sub-code.
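Step (a) above is a triangle-inequality consequence of $\widehat{Q}_n$ being a convex mixture of the two sub-code-induced distributions. The toy check below (hypothetical discrete distributions, for illustration only) exercises that inequality with exact rational arithmetic:

```python
from fractions import Fraction as F

def V(p, q):
    """Variational distance V = (1/2) * sum |p - q| between pmfs."""
    return sum(abs(a - b) for a, b in zip(p, q)) / 2

# Toy "output distributions" on three symbols (hypothetical, for illustration)
Q0 = [F(1, 3)] * 3                       # innocent output
Ql = [F(1, 2), F(1, 4), F(1, 4)]         # induced by the low-power sub-code
Qh = [F(3, 4), F(1, 8), F(1, 8)]         # induced by the high-power sub-code
wl, wh = F(2, 5), F(3, 5)                # |C^(l)|/|C| and |C^(h)|/|C|
Qn = [wl * a + wh * b for a, b in zip(Ql, Qh)]   # the mixture Q_n

# Step (a): wh * V(Qh, Q0) <= V(Qn, Q0) + wl * V(Ql, Q0)
assert wh * V(Qh, Q0) <= V(Qn, Q0) + wl * V(Ql, Q0)
assert sum(Qn) == 1
```

The same inequality holds for any mixture weights and any number of symbols, which is all the proof needs.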
c) Upper bound on covert message size within a good sub-code: The code $\mathcal{C}$ can be partitioned into $K_n$ sub-codes $\mathcal{C}_s$ indexed by the key value $s \in \llbracket 1, K_n \rrbracket$ such that $\mathcal{C} = \cup_{s \in \llbracket 1, K_n \rrbracket} \mathcal{C}_s$, and the size of each sub-code is $M_n$. Let $\mathcal{C}_s^{(\ell)} \triangleq \mathcal{C}_s \cap \mathcal{C}^{(\ell)}$. By the pigeonhole principle, there exists a sub-code $\mathcal{C}_s$ satisfying $|\mathcal{C}_s^{(\ell)}| \geq \gamma_n M_n$. Furthermore, since the average probability of error of $\mathcal{C}_s$ is at most $\epsilon_n$, we have $\bar{P}_e^{(n)}\big(\mathcal{C}_s^{(\ell)}\big) \leq \frac{\epsilon_n}{\gamma_n}$, which vanishes in the limit of large $n$. Let $\widetilde{W}$ denote the uniformly distributed random variable over the messages in $\mathcal{C}_s^{(\ell)}$. By standard techniques, we therefore have
\[ \log \big|\mathcal{C}_s^{(\ell)}\big| = H\big(\widetilde{W}\big) = I\big(\widetilde{W}; \mathbf{Y}^n S\big) + H\big(\widetilde{W} \,\big|\, \mathbf{Y}^n S\big) \quad (42) \]
\[ \leq I\big(\widetilde{W} S; \mathbf{Y}^n\big) + \left[ \frac{\epsilon_n}{\gamma_n} \log \big|\mathcal{C}_s^{(\ell)}\big| + h_b\Big(\frac{\epsilon_n}{\gamma_n}\Big) \right] \quad (43) \]
\[ \leq I(\mathbf{X}^n; \mathbf{Y}^n) + \left[ \frac{\epsilon_n}{\gamma_n} \log \big|\mathcal{C}_s^{(\ell)}\big| + h_b\Big(\frac{\epsilon_n}{\gamma_n}\Big) \right] \quad (44) \]
\[ \leq n I\big(\bar{\mathbf{X}}; \bar{\mathbf{Y}}\big) + \frac{\epsilon_n}{\gamma_n} \log \big|\mathcal{C}_s^{(\ell)}\big| + 1, \quad (45) \]
where $\Pi_{\bar{X}}(\mathbf{x}) = \frac{1}{n} \sum_{i=1}^n \Pi_{X_i}(\mathbf{x}) = \frac{1}{n} \sum_{i=1}^n \frac{1}{|\mathcal{C}_s^{(\ell)}|} \sum_{\mathbf{x}^n \in \mathcal{C}_s^{(\ell)}} \mathbb{1}\{\mathbf{x} = \mathbf{x}_i\}$ and $P_{\bar{X}\bar{Y}} = \Pi_{\bar{X}} W_{Y|X}$. Let $\mathbb{E}\{\bar{\mathbf{X}}\bar{\mathbf{X}}^\intercal\} = \mathbf{Q}_n$. Note that $\mathbb{E}\{\bar{\mathbf{Y}}\bar{\mathbf{Y}}^\intercal\} = \mathbf{H}_b \mathbf{Q}_n \mathbf{H}_b^\intercal + \sigma_b^2 \mathbf{I}_{N_b}$. Then,
\[ I\big(\bar{\mathbf{X}}; \bar{\mathbf{Y}}\big) = h\big(\bar{\mathbf{Y}}\big) - h\big(\bar{\mathbf{Y}} \,\big|\, \bar{\mathbf{X}}\big) \quad (46) \]
\[ \leq \frac{1}{2} \log \left| \mathbf{I}_{N_b} + \frac{1}{\sigma_b^2} \mathbf{H}_b \mathbf{Q}_n \mathbf{H}_b^\intercal \right| \quad (47) \]
\[ = \frac{1}{2} \operatorname{tr}\left( \log\left( \mathbf{I}_{N_b} + \frac{1}{\sigma_b^2} \mathbf{H}_b \mathbf{Q}_n \mathbf{H}_b^\intercal \right) \right) \quad (48) \]
\[ \overset{(a)}{\leq} \frac{1}{2\sigma_b^2} \operatorname{tr}\big(\mathbf{H}_b \mathbf{Q}_n \mathbf{H}_b^\intercal\big) \overset{(b)}{\leq} \frac{A^2}{2\sigma_b^2 \sqrt{n}}, \quad (49) \]
where (a) follows since, for any $\mathbf{A} \succeq 0$, $\operatorname{tr}(\log(\mathbf{I} + \mathbf{A})) = \sum_i \log(1 + \lambda_i(\mathbf{A})) \leq \sum_i \lambda_i(\mathbf{A}) = \operatorname{tr}(\mathbf{A})$, where $\{\lambda_i(\mathbf{A})\}_i$ is the set of eigenvalues of $\mathbf{A}$ and we have used $\log(1+x) \leq x$ for all $x > -1$, and (b) follows from the definition of $\mathcal{C}^{(\ell)}$ and $\operatorname{tr}\big(\mathbf{H}_b \mathbf{Q}_n \mathbf{H}_b^\intercal\big) = \frac{1}{n |\mathcal{C}_s^{(\ell)}|} \sum_{\mathbf{x}^n \in \mathcal{C}_s^{(\ell)}} \|\mathbf{H}_b \mathbf{x}^n\|_F^2 \leq \frac{A^2}{\sqrt{n}}$. Combining (32), (45), and (49), we have
\[ \log \big|\mathcal{C}_s^{(\ell)}\big| \leq \frac{\sqrt{2n\, \operatorname{tr}\big(\boldsymbol{\Lambda}_b^4 (\boldsymbol{\Lambda}_w^{-1})^4\big)}\, \frac{\sigma_w^2}{\sigma_b^2}\, Q^{-1}\big(\frac{1 - \delta_n}{2}\big) + O(1)}{1 - \frac{\epsilon_n}{\gamma_n}}. \quad (50) \]
We further choose the sequence $\{\gamma_n\}$ such that $\lim_{n\to\infty} \frac{\epsilon_n}{\gamma_n} = 0$ and $\lim_{n\to\infty} \frac{-\log \gamma_n}{\sqrt{n}\, Q^{-1}\left(\frac{1-\delta_n}{2}\right)} = 0$. Then, the constraint that $\lim_{n\to\infty} M_n = \infty$, combined with (50), $\lim_{n\to\infty} \frac{\epsilon_n}{\gamma_n} = 0$, and $|\mathcal{C}_s^{(\ell)}| \geq \gamma_n M_n$, requires that $\lim_{n\to\infty} \sqrt{n}\, Q^{-1}\big(\frac{1-\delta_n}{2}\big) = \infty$. Finally, we obtain
\[ \liminf_{n\to\infty} \frac{\log M_n}{\sqrt{n}\, Q^{-1}\big(\frac{1-\delta_n}{2}\big)} \leq \liminf_{n\to\infty} \frac{\log \big|\mathcal{C}_s^{(\ell)}\big| - \log \gamma_n}{\sqrt{n}\, Q^{-1}\big(\frac{1-\delta_n}{2}\big)} \quad (51) \]
\[ \leq \frac{\sigma_w^2}{\sigma_b^2} \sqrt{2\, \operatorname{tr}\big(\boldsymbol{\Lambda}_b^4 (\boldsymbol{\Lambda}_w^{-1})^4\big)}, \quad (52) \]
where the first inequality follows from $\lim_{n\to\infty} \sqrt{n}\, Q^{-1}\big(\frac{1-\delta_n}{2}\big) = \infty$ and $\lim_{n\to\infty} \frac{-\log \gamma_n}{\sqrt{n}\, Q^{-1}\left(\frac{1-\delta_n}{2}\right)} = 0$. Unfortunately, we have not found a matching converse argument for the key throughput.

B. Achievability Proof for Variational Distance
Proposition 7.
Consider the MIMO-AWGN covert communication channel in (1). For an arbitrary $\xi \in (0, 1)$, there exists a covert communication scheme such that
\[ \lim_{n\to\infty} \frac{\log M_n}{\sqrt{n}\, Q^{-1}\Big(\frac{1 - \mathbb{V}(\widehat{Q}_n, Q_0^{\otimes n})}{2}\Big)} = (1 - \xi)\, \frac{\sigma_w^2}{\sigma_b^2} \sqrt{2\, \operatorname{tr}\Big(\boldsymbol{\Lambda}_b^4 \big(\boldsymbol{\Lambda}_w^{-1}\big)^4\Big)}, \quad (53) \]
\[ \lim_{n\to\infty} \frac{\log M_n K_n}{\sqrt{n}\, Q^{-1}\Big(\frac{1 - \mathbb{V}(\widehat{Q}_n, Q_0^{\otimes n})}{2}\Big)} = (1 + \xi) \sqrt{\frac{2\, \operatorname{tr}\Big(\boldsymbol{\Lambda}_b^2 \big(\boldsymbol{\Lambda}_w^{-1}\big)^2\Big)^2}{\operatorname{tr}\Big(\boldsymbol{\Lambda}_b^4 \big(\boldsymbol{\Lambda}_w^{-1}\big)^4\Big)}}, \quad (54) \]
\[ \lim_{n\to\infty} P_e^{(n)} = 0, \qquad \lim_{n\to\infty} \mathbb{V}(\widehat{Q}_n, Q_0^{\otimes n}) = 0. \quad (55) \]

Proof:
Our proof follows [2], [3], [5] to construct a Binary Phase-Shift Keying (BPSK) code achieving the desired throughput pair. Note that we could also use a Gaussian codebook, but this would require extra care to deal with the power of codewords.

a) Covert stochastic process [3]:
We introduce another input process $\Pi_{\mathbf{Q}_n}$ with covariance matrix $\mathbf{Q}_n$ and its associated distribution at the output of channel $W_{Z|X}$, $Q_n \triangleq \Pi_{\mathbf{Q}_n} \circ W_{Z|X}$. Additionally, the associated product distributions are $\Pi_{\mathbf{Q}_n}^{\otimes n} = \prod_{i=1}^n \Pi_{\mathbf{Q}_n}$ and $Q_n^{\otimes n} = \prod_{i=1}^n Q_n$. The achievability proof decomposes the covertness metric $\mathbb{V}(\widehat{Q}_n, Q_0^{\otimes n})$ into two pieces, $\mathbb{V}(\widehat{Q}_n, Q_n^{\otimes n})$ and $\mathbb{V}(Q_n^{\otimes n}, Q_0^{\otimes n})$, by the triangle inequality. The former term is related to the channel output approximation problem, and we rely on channel resolvability to analyze its behavior. The latter term we upper-bound by a covertness constraint $\delta_n$ and require $\lim_{n\to\infty} \delta_n = 0$. Essentially, this constraint makes $Q_n^{\otimes n}$ asymptotically indistinguishable from the output distribution of the innocent symbol, $Q_0^{\otimes n}$; accordingly, $Q_n^{\otimes n}$ is called a covert stochastic process. The rationale for introducing such a process is to obtain a proxy that controls the discrepancy captured by the covertness metric through a carefully designed covariance matrix $\mathbf{Q}_n$, which is the counterpart of the low-weight codewords designed for covert communication schemes over DMCs [3], [4].

b) Random code generation: We decompose the channel model into $m$ parallel sub-channels defined by the GSVD precoding, with input alphabet $\widetilde{\mathcal{X}} \triangleq \{-a_{n,1}, 0, a_{n,1}\} \times \ldots \times \{-a_{n,m}, 0, a_{n,m}\}$, where $m = \operatorname{rank}(\mathbf{H}_b) = \operatorname{rank}(\mathbf{H}_w) = N_a$. Throughout the section, tildes refer to operations over the parallel sub-channels. Let $M_n, K_n \in \mathbb{N}^*$.
For each sub-channel $j$, Alice independently generates $M_n K_n$ codewords $\tilde{\mathbf{x}}_j^n(\ell, k) \in \{-a_{n,j}, a_{n,j}\}^n$, with $\ell \in \llbracket 1, M_n \rrbracket$ and $k \in \llbracket 1, K_n \rrbracket$, according to the distribution $\Pi_{\rho_{n,j}}$ such that $\Pi_{\rho_{n,j}}(a_{n,j}) = \Pi_{\rho_{n,j}}(-a_{n,j}) = \frac{1}{2}$ and $\Pi_{\rho_{n,j}}(0) = 0$, where $\{\rho_{n,j}\}$ is a set of non-negative real numbers defined as
\[ \rho_{n,j}^2 \triangleq \frac{\tau_j\, Q^{-1}\big(\frac{1-\delta_n}{2}\big)}{\sqrt{n}} = a_{n,j}^2, \quad \forall j \in \llbracket 1, m \rrbracket, \quad (56) \]
$\{\tau_j\}_{j=1}^m$ is determined later via an optimization program, and $\{\delta_n\}_{n=1}^\infty$ is a sequence of positive real numbers such that $\lim_{n\to\infty} \sqrt{n}\, Q^{-1}\big(\frac{1-\delta_n}{2}\big) = \infty$ and $\lim_{n\to\infty} \delta_n = 0$. We define two diagonal matrices $\mathbf{P}_n$ and $\mathbf{T}$ with $\{\rho_{n,j}^2\}_{j=1}^m$ and $\{\tau_j\}_{j=1}^m$ as the diagonal entries, respectively. For simplicity, we stack these codewords into $\tilde{\mathbf{x}}^n(\ell, k) \in \mathbb{R}^{m \times n}$. Alice then employs the precoding matrix $(\mathbf{V}^\intercal)^{-1}$ to form $\mathbf{x}^n = (\mathbf{V}^\intercal)^{-1} \tilde{\mathbf{x}}^n$; therefore, the input covariance matrix after precoding is $\mathbf{Q}_n = (\mathbf{V}^\intercal)^{-1} \mathbf{P}_n \mathbf{V}^{-1}$, where $\mathbf{P}_n$ is designed carefully as in (56). Bob and Willie postprocess their observations from the channel outputs $\mathbf{y}^n \in \mathbb{R}^{N_b \times n}$ and $\mathbf{z}^n \in \mathbb{R}^{N_w \times n}$ by transforming them via $\mathbf{U}_b^\intercal$ and $\mathbf{U}_w^\intercal$ to get $\tilde{\mathbf{y}}^n \in \mathbb{R}^{m \times n}$ and $\tilde{\mathbf{z}}^n \in \mathbb{R}^{m \times n}$, respectively. There is no loss of generality in making this assumption for Willie, as the post-processing $\mathbf{U}_w^\intercal$ performs an orthogonal transform and then discards the components of the observations corresponding to $\operatorname{Null}(\mathbf{H}_w^\intercal)$, which only contain noise. These operations result in, $\forall i \in \llbracket 1, n \rrbracket$, $\tilde{\mathbf{y}}_i = \boldsymbol{\Lambda}_b \tilde{\mathbf{x}}_i + \tilde{\mathbf{n}}_{b,i}$ and $\tilde{\mathbf{z}}_i = \boldsymbol{\Lambda}_w \tilde{\mathbf{x}}_i + \tilde{\mathbf{n}}_{w,i}$, where $\tilde{\mathbf{n}}_b \sim \mathcal{N}(\mathbf{0}, \sigma_b^2 \mathbf{I}_m)$ and $\tilde{\mathbf{n}}_w \sim \mathcal{N}(\mathbf{0}, \sigma_w^2 \mathbf{I}_m)$.
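The random code generation step can be sketched as follows (an illustrative sketch, not the paper's implementation: the parameter values are hypothetical, and $Q^{-1}\big(\frac{1-\delta_n}{2}\big)$ is replaced by its small-$\delta_n$ approximation $\delta_n\sqrt{\pi/2}$):

```python
import math, random
random.seed(7)

def bpsk_codebook(n, Mn, Kn, tau, delta_n):
    """Random BPSK codebook over m parallel sub-channels.

    Amplitudes follow (56): a_{n,j}^2 = tau_j * Q^{-1}((1-delta_n)/2) / sqrt(n);
    here Q^{-1}((1-delta)/2) is approximated by delta * sqrt(pi/2)."""
    qinv = delta_n * math.sqrt(math.pi / 2)
    a = [math.sqrt(t * qinv / math.sqrt(n)) for t in tau]
    book = {(l, k): [[random.choice((-a[j], a[j])) for _ in range(n)]
                     for j in range(len(tau))]
            for l in range(1, Mn + 1) for k in range(1, Kn + 1)}
    return a, book

a, C = bpsk_codebook(n=400, Mn=4, Kn=2, tau=[1.0, 0.5], delta_n=0.1)
# Per-codeword energy on sub-channel j is n * a_j^2 = tau_j * Q^{-1}(.) * sqrt(n):
# it grows like sqrt(n), while the per-symbol amplitude a_j shrinks like n^{-1/4}.
```

The $n^{-1/4}$ amplitude decay is exactly what makes the total codeword energy scale as $\sqrt{n}$, the square-root-law regime.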
From the perspective of the $m$ parallel sub-channels $\big(\widetilde{\mathcal{X}}, W_{\tilde{Y}|\tilde{X}}, \widetilde{\mathcal{Y}}, W_{\tilde{Z}|\tilde{X}}, \widetilde{\mathcal{Z}}\big)$ with $\widetilde{\mathcal{Y}} = \mathbb{R}^m$ and $\widetilde{\mathcal{Z}} = \mathbb{R}^m$, we therefore define the following distributions: $\Pi_{\mathbf{P}_n}(\tilde{\mathbf{x}}) = \prod_{j=1}^m \Pi_{\rho_{n,j}}(\tilde{x}_j)$, $\Pi_{\mathbf{P}_n}^{\otimes n} = \prod_{i=1}^n \Pi_{\mathbf{P}_n}$, $\widetilde{P}_n = \Pi_{\mathbf{P}_n} \circ W_{\tilde{Y}|\tilde{X}}$, $\widetilde{P}_n^{\otimes n} = \prod_{i=1}^n \widetilde{P}_n$, $\widetilde{Q}_n = \Pi_{\mathbf{P}_n} \circ W_{\tilde{Z}|\tilde{X}}$, and $\widetilde{Q}_n^{\otimes n} = \prod_{i=1}^n \widetilde{Q}_n$. Note that, because the sub-channels are parallel, we can derive the simple forms
\[ \widetilde{P}_n = \prod_{j=1}^m \left( \frac{1}{2} \mathcal{N}\big({-\lambda_{b,j} a_{n,j}}, \sigma_b^2\big) + \frac{1}{2} \mathcal{N}\big(\lambda_{b,j} a_{n,j}, \sigma_b^2\big) \right), \quad (57) \]
\[ \widetilde{Q}_n = \prod_{j=1}^m \left( \frac{1}{2} \mathcal{N}\big({-\lambda_{w,j} a_{n,j}}, \sigma_w^2\big) + \frac{1}{2} \mathcal{N}\big(\lambda_{w,j} a_{n,j}, \sigma_w^2\big) \right). \quad (58) \]
Moreover, we can extend the output distribution induced by the innocent symbol to the parallel sub-channel model in the same way.

c) Channel reliability analysis: In the following analysis, we first concentrate our derivation on a single sub-channel, and then take the sum of the desired quantities over the sub-channels.
Lemma 8.
For each sub-channel $j$ defined as above, by choosing
\[ \log M_{n,j} = (1 - \xi)\, \frac{\lambda_{b,j}^2 \tau_j \sqrt{n}\, Q^{-1}\big(\frac{1-\delta_n}{2}\big)}{2\sigma_b^2}, \quad (59) \]
the average probability of error on sub-channel $j$ satisfies
\[ \mathbb{E}\big\{\bar{P}_{e_j}^{(n)}\big\} \leq e^{-\theta_{0,j} \sqrt{n}\, Q^{-1}\left(\frac{1-\delta_n}{2}\right)}, \quad (60) \]
where $\xi \in (0, 1)$ and $\theta_{0,j} > 0$ for all $j$.

Proof:
We specialize [3, Lemma 3] and [4, Lemma 1] into the following lemma.
Lemma 9.
For each sub-channel $j$ defined as above and any $\gamma_j > 0$,
\[ \mathbb{E}\big\{\bar{P}_{e_j}^{(n)}\big\} \leq M_{n,j}\, e^{-\gamma_j}\, \mathbb{E}_{\widetilde{P}_{n,j}^{\otimes n}}\left\{ \frac{\widetilde{P}_{n,j}^{\otimes n}\big(\tilde{Y}_j^n\big)}{\widetilde{P}_{0,j}^{\otimes n}\big(\tilde{Y}_j^n\big)} \right\} + \mathbb{P}_{W_{\tilde{Y}(j)|\tilde{X}(j)}^{\otimes n} \Pi_{\rho_{n,j}}^{\otimes n}}\left( \log \frac{W_{\tilde{Y}(j)|\tilde{X}(j)}^{\otimes n}\big(\tilde{Y}_j^n \,\big|\, \tilde{X}_j^n\big)}{\widetilde{P}_{0,j}^{\otimes n}\big(\tilde{Y}_j^n\big)} \leq \gamma_j \right), \quad (61) \]
where $\bar{P}_{e_j}^{(n)}$ is the average probability of error on sub-channel $j$.

We first analyze the expectation on the right-hand side of (61) as follows:
\[ \mathbb{E}_{\widetilde{P}_{n,j}}\left\{ \frac{\widetilde{P}_{n,j}\big(\tilde{Y}_j\big)}{\widetilde{P}_{0,j}\big(\tilde{Y}_j\big)} \right\} = \cosh\left( \frac{\lambda_{b,j}^2 \rho_{n,j}^2}{\sigma_b^2} \right) \overset{(a)}{\leq} \exp\left( \frac{\lambda_{b,j}^4 \rho_{n,j}^4}{2\sigma_b^4} \right), \quad (62) \]
where (a) follows from $\cosh(x) \leq e^{x^2/2}$. Therefore, we know that
\[ \mathbb{E}_{\widetilde{P}_{n,j}^{\otimes n}}\left\{ \frac{\widetilde{P}_{n,j}^{\otimes n}\big(\tilde{Y}_j^n\big)}{\widetilde{P}_{0,j}^{\otimes n}\big(\tilde{Y}_j^n\big)} \right\} \leq \exp\left( \frac{n \lambda_{b,j}^4 \rho_{n,j}^4}{2\sigma_b^4} \right) = O(1). \quad (63) \]
Next, we turn to analyze the first term of (61). First note that
\[ \log \frac{W_{\tilde{Y}(j)|\tilde{X}(j)}^{\otimes n}\big(\tilde{Y}_j^n \,\big|\, \tilde{X}_j^n\big)}{\widetilde{P}_{0,j}^{\otimes n}\big(\tilde{Y}_j^n\big)} = \sum_{i=1}^n \left( \frac{\lambda_{b,j} \tilde{X}_{ij} \tilde{Y}_{ij}}{\sigma_b^2} - \frac{\lambda_{b,j}^2 \tilde{X}_{ij}^2}{2\sigma_b^2} \right). \quad (64) \]
Since $\tilde{Y}_{ij} \,|\, \tilde{X}_{ij} = \tilde{x}_{ij} \sim \mathcal{N}\big(\lambda_{b,j}\tilde{x}_{ij}, \sigma_b^2\big)$ and $\tilde{x}_{ij} \in \{-a_{n,j}, a_{n,j}\}$, we have
\[ \log \frac{W_{\tilde{Y}(j)|\tilde{X}(j)}\big(\tilde{Y}_{ij} \,\big|\, \tilde{X}_{ij} = \tilde{x}_{ij}\big)}{\widetilde{P}_{0,j}\big(\tilde{Y}_{ij}\big)} \sim \mathcal{N}\left( \frac{\lambda_{b,j}^2 \rho_{n,j}^2}{2\sigma_b^2},\ \frac{\lambda_{b,j}^2 \rho_{n,j}^2}{\sigma_b^2} \right). \quad (65) \]
Therefore, by setting $\gamma_j = (1 - \epsilon)\frac{n\lambda_{b,j}^2 \rho_{n,j}^2}{2\sigma_b^2}$, where $\epsilon \in (0, 1)$, and using Hoeffding's inequality, we have
\[ \mathbb{P}_{W^{\otimes n}_{\tilde{Y}(j)|\tilde{X}(j)=\tilde{x}_j^n}}\left( \sum_{i=1}^n \log \frac{W_{\tilde{Y}(j)|\tilde{X}(j)}\big(\tilde{Y}_{ij} \,\big|\, \tilde{x}_{ij}\big)}{\widetilde{P}_{0,j}\big(\tilde{Y}_{ij}\big)} \leq (1-\epsilon)\frac{n\lambda_{b,j}^2\rho_{n,j}^2}{2\sigma_b^2} \right) \leq \exp\left( -\frac{n\epsilon^2 \lambda_{b,j}^2 \rho_{n,j}^2}{8\sigma_b^2} \right). \quad (66) \]
Then, we have
\[ \mathbb{P}_{W^{\otimes n}_{\tilde{Y}(j)|\tilde{X}(j)} \Pi^{\otimes n}_{\rho_{n,j}}}\left( \log \frac{W^{\otimes n}_{\tilde{Y}(j)|\tilde{X}(j)}\big(\tilde{Y}_j^n \,\big|\, \tilde{X}_j^n\big)}{\widetilde{P}_{0,j}^{\otimes n}\big(\tilde{Y}_j^n\big)} \leq \gamma_j \right) = \sum_{\tilde{x}_j^n \in \{-a_{n,j}, a_{n,j}\}^n} \Pi^{\otimes n}_{\rho_{n,j}}\big(\tilde{x}_j^n\big)\, \mathbb{P}_{W^{\otimes n}_{\tilde{Y}(j)|\tilde{X}(j)=\tilde{x}_j^n}}\left( \sum_{i=1}^n \log \frac{W_{\tilde{Y}(j)|\tilde{X}(j)}\big(\tilde{Y}_{ij} \,\big|\, \tilde{x}_{ij}\big)}{\widetilde{P}_{0,j}\big(\tilde{Y}_{ij}\big)} \leq (1-\epsilon)\frac{n\lambda_{b,j}^2\rho_{n,j}^2}{2\sigma_b^2} \right) \quad (67) \]
\[ \leq \exp\left( -\frac{n\epsilon^2\lambda_{b,j}^2\rho_{n,j}^2}{8\sigma_b^2} \right). \quad (68) \]
Eventually, by combining (61), (63), and (68), we have
$$\mathbb{E}\big\{\bar{P}_{e,j}^{(n)}\big\} \le \exp\left(-\frac{n\epsilon^2\lambda_{b,j}^2\rho_{n,j}}{8\sigma_b^2}\right) + M_{n,j}\, e^{-\gamma_j}\,(1+O(1)). \quad (69)$$
Hence, by using (56), if we choose
$$\log M_{n,j} = (1-\xi_1)\frac{n\lambda_{b,j}^2\rho_{n,j}}{2\sigma_b^2} = (1-\xi_1)\frac{\lambda_{b,j}^2\tau_j\sqrt{n}\,Q^{-1}\!\left(\frac{1-\delta_n}{2}\right)}{2\sigma_b^2}, \quad (70)$$
where $\omega \in (0,1)$ and $\xi_1 \triangleq \frac{1}{2}\left[(1+\omega)(1+\epsilon) - (1-\omega)(1-\epsilon)\right] = \omega + \epsilon > 0$, so that $\log M_{n,j} - \gamma_j = -\omega\frac{n\lambda_{b,j}^2\rho_{n,j}}{2\sigma_b^2}$, the result follows.

d) Covertness analysis:

Lemma 10.
By choosing $T$ such that
$$\frac{\sqrt{\mathrm{tr}\big(\Lambda_w T\Lambda_w^\intercal \Lambda_w T\Lambda_w^\intercal\big)}}{2\sqrt{2}\,\sigma_w^2} \le 1 - O\left(\frac{1}{\sqrt{n}\,Q^{-1}\!\left(\frac{1-\delta_n}{2}\right)}\right), \quad (71)$$
and
$$\log M_n K_n = (1+\xi_2)\frac{\sqrt{n}\,Q^{-1}\!\left(\frac{1-\delta_n}{2}\right)}{2\sigma_w^2}\,\mathrm{tr}\big(\Lambda_w T\Lambda_w^\intercal\big), \quad (72)$$
the expected covertness metric is bounded as follows:
$$\mathbb{E}\big\{\big|\mathbb{V}(\hat{Q}^n, Q^{\otimes n}) - \delta_n\big|\big\} \le e^{-\theta\sqrt{n}\,Q^{-1}\left(\frac{1-\delta_n}{2}\right)} + \frac{\xi_3}{\sqrt{n}}, \quad (73)$$
where $\xi_2 \in (0,1)$, and $\theta, \xi_3 > 0$ are some constants.

Proof: In the following, we apply the triangle inequality to upper-bound the covertness metric, put a direct constraint on $\mathbb{V}(Q_n^{\otimes n}, Q^{\otimes n})$, and show that the remaining term vanishes exponentially fast. First note that, by the triangle inequality, we have
$$\big|\mathbb{V}(\hat{Q}^n, Q^{\otimes n}) - \mathbb{V}(Q_n^{\otimes n}, Q^{\otimes n})\big| \le \mathbb{V}(\hat{Q}^n, Q_n^{\otimes n}). \quad (74)$$
We first analyze $\mathbb{V}(Q_n^{\otimes n}, Q^{\otimes n})$. By the basic property of the variational distance, we have
$$\mathbb{V}(Q_n^{\otimes n}, Q^{\otimes n}) = \mathbb{P}_{Q_n^{\otimes n}}\big(Q_n^{\otimes n}(Z^n) > Q^{\otimes n}(Z^n)\big) - \mathbb{P}_{Q^{\otimes n}}\big(Q_n^{\otimes n}(Z^n) > Q^{\otimes n}(Z^n)\big) \quad (75)$$
$$= \mathbb{P}_{Q_n^{\otimes n}}\left(\sum_{i=1}^n \log\frac{Q_n(Z_i)}{Q(Z_i)} > 0\right) - \mathbb{P}_{Q^{\otimes n}}\left(\sum_{i=1}^n \log\frac{Q_n(Z_i)}{Q(Z_i)} > 0\right) \quad (76)$$
$$\stackrel{(a)}{=} \mathbb{P}_{\tilde{Q}_n^{\otimes n}}\left(\sum_{i=1}^n\sum_{j=1}^m \log\frac{\tilde{Q}_{n,j}(\tilde{Z}_{ij})}{\tilde{Q}_{0,j}(\tilde{Z}_{ij})} > 0\right) - \mathbb{P}_{\tilde{Q}_0^{\otimes n}}\left(\sum_{i=1}^n\sum_{j=1}^m \log\frac{\tilde{Q}_{n,j}(\tilde{Z}_{ij})}{\tilde{Q}_{0,j}(\tilde{Z}_{ij})} > 0\right), \quad (77)$$
where $(a)$ follows since applying the orthogonal transform $U_w'^\intercal$ to the per-channel-use observation $Z_i$ is a one-to-one and onto mapping and therefore does not reduce the variational distance (i.e., the data-processing inequality holds with equality). Note that after the orthogonal transform, we may truncate the last $N_w - m$ components, since they contain pure noise (they correspond to the null space of $H_w^\intercal$).
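Before computing the moments of the per-letter log-likelihood ratio on a sub-channel, it is worth checking numerically that the BPSK-induced output mixture is only weakly distinguishable from the noise-only output: for a single sub-channel, the relative entropy between the two laws behaves as $\lambda_w^4\rho^2/(4\sigma_w^4)$ for small $\rho$, which is the second-order (square-root-law) scaling used below. The sketch uses numerical integration with hypothetical parameters, not values from the paper.

```python
import numpy as np

# Hypothetical scalar sub-channel parameters (illustrative only)
lam, sigma, a = 1.0, 1.0, 0.1
rho = a ** 2
z, dz = np.linspace(-12.0, 12.0, 400_001, retstep=True)

def gauss(t):
    return np.exp(-t**2 / (2 * sigma**2)) / np.sqrt(2 * np.pi * sigma**2)

q0 = gauss(z)                                          # noise-only output density
qn = 0.5 * (gauss(z - lam * a) + gauss(z + lam * a))   # BPSK-induced output mixture
llr = np.log(qn / q0)
mu = np.sum(qn * llr) * dz                             # E_{Q_n}[log(Q_n/Q_0)] = D(Q_n || Q_0)
var = np.sum(qn * (llr - mu)**2) * dz
mu_pred = lam**4 * rho**2 / (4 * sigma**4)             # predicted second-order mean
var_pred = lam**4 * rho**2 / (2 * sigma**4)            # predicted second-order variance
assert abs(mu / mu_pred - 1) < 0.05
assert abs(var / var_pred - 1) < 0.05
```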
Then, we can show that, for every $i \in \llbracket 1, n\rrbracket$,
$$\mu_j \triangleq \mathbb{E}_{\tilde{Q}_{n,j}}\left[\log\frac{\tilde{Q}_{n,j}(\tilde{Z}_i)}{\tilde{Q}_{0,j}(\tilde{Z}_i)}\right] \quad (78)$$
$$= \mathbb{E}_{\tilde{Q}_{n,j}}\left[-\frac{\lambda_{w,j}^2\rho_{n,j}}{2\sigma_w^2} + \log\cosh\left(\frac{\lambda_{w,j}\, a_{n,j}\,\tilde{Z}_i}{\sigma_w^2}\right)\right] \quad (79)$$
$$= \frac{\lambda_{w,j}^4\rho_{n,j}^2}{4\sigma_w^4} + O\big(\rho_{n,j}^3\big), \quad (80)$$
$$\sigma_j^2 \triangleq \mathrm{Var}_{\tilde{Q}_{n,j}}\left[\log\frac{\tilde{Q}_{n,j}(\tilde{Z}_i)}{\tilde{Q}_{0,j}(\tilde{Z}_i)}\right] \quad (81)$$
$$= \mathbb{E}_{\tilde{Q}_{n,j}}\left[\left(\log\frac{\tilde{Q}_{n,j}(\tilde{Z}_i)}{\tilde{Q}_{0,j}(\tilde{Z}_i)}\right)^2\right] - \mu_j^2 \quad (82)$$
$$= \frac{\lambda_{w,j}^4\rho_{n,j}^2}{2\sigma_w^4} + O\big(\rho_{n,j}^3\big), \quad (83)$$
$$t_j \triangleq \mathbb{E}_{\tilde{Q}_{n,j}}\left[\left|\log\frac{\tilde{Q}_{n,j}(\tilde{Z}_i)}{\tilde{Q}_{0,j}(\tilde{Z}_i)} - \mu_j\right|^3\right] = O\big(\rho_{n,j}^3\big). \quad (84)$$
Therefore, by the Berry-Esseen Theorem, we have
$$\mathbb{P}_{\tilde{Q}_n^{\otimes n}}\left(\sum_{i=1}^n\sum_{j=1}^m \log\frac{\tilde{Q}_{n,j}(\tilde{Z}_{ij})}{\tilde{Q}_{0,j}(\tilde{Z}_{ij})} > 0\right) \le 1 - Q\left(\sqrt{\frac{n}{8\sigma_w^4}\sum_{j=1}^m \lambda_{w,j}^4\rho_{n,j}^2}\right) + \frac{6\, n\sum_{j=1}^m t_j}{\big(n\sum_{j=1}^m\sigma_j^2\big)^{3/2}} \quad (85)$$
$$= 1 - Q\left(\sqrt{\frac{n}{8\sigma_w^4}\sum_{j=1}^m \lambda_{w,j}^4\rho_{n,j}^2}\right) + O\left(\frac{1}{\sqrt{n}}\right). \quad (86)$$
Similarly, for every $i \in \llbracket 1, n\rrbracket$,
$$\mu_j' \triangleq \mathbb{E}_{\tilde{Q}_{0,j}}\left[\log\frac{\tilde{Q}_{n,j}(\tilde{Z}_i)}{\tilde{Q}_{0,j}(\tilde{Z}_i)}\right] = -\frac{\lambda_{w,j}^4\rho_{n,j}^2}{4\sigma_w^4} + O\big(\rho_{n,j}^3\big), \quad (87)$$
$$\sigma_j'^2 \triangleq \mathrm{Var}_{\tilde{Q}_{0,j}}\left[\log\frac{\tilde{Q}_{n,j}(\tilde{Z}_i)}{\tilde{Q}_{0,j}(\tilde{Z}_i)}\right] = \frac{\lambda_{w,j}^4\rho_{n,j}^2}{2\sigma_w^4} + O\big(\rho_{n,j}^3\big), \quad (88)$$
$$t_j' \triangleq \mathbb{E}_{\tilde{Q}_{0,j}}\left[\left|\log\frac{\tilde{Q}_{n,j}(\tilde{Z}_i)}{\tilde{Q}_{0,j}(\tilde{Z}_i)} - \mu_j'\right|^3\right] = O\big(\rho_{n,j}^3\big). \quad (89)$$
We therefore have
$$\mathbb{P}_{\tilde{Q}_0^{\otimes n}}\left(\sum_{i=1}^n\sum_{j=1}^m \log\frac{\tilde{Q}_{n,j}(\tilde{Z}_{ij})}{\tilde{Q}_{0,j}(\tilde{Z}_{ij})} > 0\right) \ge Q\left(\sqrt{\frac{n}{8\sigma_w^4}\sum_{j=1}^m \lambda_{w,j}^4\rho_{n,j}^2}\right) - \frac{6\, n\sum_{j=1}^m t_j'}{\big(n\sum_{j=1}^m\sigma_j'^2\big)^{3/2}} \quad (90)$$
$$= Q\left(\sqrt{\frac{n}{8\sigma_w^4}\sum_{j=1}^m \lambda_{w,j}^4\rho_{n,j}^2}\right) - O\left(\frac{1}{\sqrt{n}}\right). \quad (91)$$
Eventually, we find an upper bound for (77) as follows:
$$\mathbb{V}(Q_n^{\otimes n}, Q^{\otimes n}) \le 1 - 2Q\left(\sqrt{\frac{n}{8\sigma_w^4}\sum_{j=1}^m \lambda_{w,j}^4\rho_{n,j}^2}\right) + O\left(\frac{1}{\sqrt{n}}\right) \le \delta_n. \quad (92)$$
Equivalently, we impose the covertness constraint
$$\frac{\sqrt{\mathrm{tr}\big(\Lambda_w T\Lambda_w^\intercal \Lambda_w T\Lambda_w^\intercal\big)}}{2\sqrt{2}\,\sigma_w^2} \le 1 - O\left(\frac{1}{\sqrt{n}\,Q^{-1}\!\left(\frac{1-\delta_n}{2}\right)}\right), \quad (93)$$
and the constraint (93) will be the main concern in the power design optimization.
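The constraint (93), together with the Bob-side objective, leads to a simple trace optimization that is solved later in the section. As a numerical sketch under hypothetical channel parameters, one can verify that a diagonal allocation $\tau_j \propto \lambda_{b,j}^2/\lambda_{w,j}^4$ saturates a (93)-type constraint and is optimal among feasible diagonal allocations (a consequence of the Cauchy-Schwarz inequality); the closed forms below are our reconstruction for illustration, not statements from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)
lam_b = np.array([1.5, 1.0, 0.7])     # hypothetical singular values of H_b
lam_w = np.array([0.9, 1.2, 0.6])     # hypothetical singular values of H_w
sb2, sw2 = 1.0, 1.3                   # hypothetical noise variances sigma_b^2, sigma_w^2

def objective(tau):                   # tr(Lambda_b^2 T) / (2 sigma_b^2)
    return (lam_b**2 * tau).sum() / (2 * sb2)

def covert_norm(tau):                 # sqrt(tr((Lambda_w^2 T)^2))
    return np.sqrt(((lam_w**2 * tau)**2).sum())

budget = 2 * np.sqrt(2) * sw2         # boundary of the (93)-type constraint
tau_star = budget * lam_b**2 / lam_w**4 / np.sqrt((lam_b**4 / lam_w**4).sum())
val_star = objective(tau_star)
val_pred = np.sqrt(2) * (sw2 / sb2) * np.sqrt((lam_b**4 / lam_w**4).sum())
assert abs(covert_norm(tau_star) - budget) < 1e-9     # constraint is active
assert abs(val_star - val_pred) < 1e-9                # matches the closed-form value
for _ in range(2000):                                  # no other feasible point does better
    tau = rng.random(3)
    tau *= budget / covert_norm(tau)
    assert objective(tau) <= val_star + 1e-9
```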
Similarly,
$$\mathbb{V}(Q_n^{\otimes n}, Q^{\otimes n}) \ge 1 - 2Q\left(\sqrt{\frac{n}{8\sigma_w^4}\sum_{j=1}^m \lambda_{w,j}^4\rho_{n,j}^2}\right) - O\left(\frac{1}{\sqrt{n}}\right). \quad (94)$$
Note that, as we shall see later, the optimal design point of $T$ actually occurs at the boundary of (93) since the Lagrange multiplier is strictly positive, which implies
$$1 - 2Q\left(\sqrt{\frac{n}{8\sigma_w^4}\sum_{j=1}^m \lambda_{w,j}^4\rho_{n,j}^2}\right) + O\left(\frac{1}{\sqrt{n}}\right) = \delta_n. \quad (95)$$
Additionally, since $Q^{-1}(\alpha + x) = Q^{-1}(\alpha) - O(x)$ for any $x$ close to zero, the assumption $\lim_{n\to\infty}\sqrt{n}\,Q^{-1}\!\left(\frac{1-\delta_n}{2}\right) = \infty$ implies $\lim_{n\to\infty}\sqrt{n}\,\delta_n = \infty$. By using (94), we then have, for $n$ large enough,
$$\big|\mathbb{V}(Q_n^{\otimes n}, Q^{\otimes n}) - \delta_n\big| \le \frac{\xi_3}{\sqrt{n}}, \quad (96)$$
where $\xi_3 > 0$ is some constant. Combining (74) with (96), we therefore have, for $n$ large enough,
$$\big|\mathbb{V}(\hat{Q}^n, Q^{\otimes n}) - \delta_n\big| \le \big|\mathbb{V}(\hat{Q}^n, Q^{\otimes n}) - \mathbb{V}(Q_n^{\otimes n}, Q^{\otimes n})\big| + \big|\mathbb{V}(Q_n^{\otimes n}, Q^{\otimes n}) - \delta_n\big| \le \mathbb{V}(\hat{Q}^n, Q_n^{\otimes n}) + \frac{\xi_3}{\sqrt{n}}. \quad (97)$$
For the term $\mathbb{V}(\hat{Q}^n, Q_n^{\otimes n})$ in (97), our analysis follows [3, Lemma 5], and we recall the following lemma.

Lemma 11.
For any $\theta > 0$,
$$\mathbb{E}\big\{\mathbb{V}(\hat{Q}^n, Q_n^{\otimes n})\big\} \le \mathbb{P}_{\Pi^{\otimes n}_{P_n} W^{\otimes n}_{Z|X}}\left(\log\frac{W^{\otimes n}_{Z|X}(Z^n|X^n)}{Q^{\otimes n}(Z^n)} > \theta\right) + \frac{1}{2}\sqrt{\frac{e^\theta}{M_n K_n}}. \quad (98)$$
Similarly, we apply the orthogonal transform $U_w'^\intercal$ to the observations and decompose them into observations on each sub-channel; we obtain
$$\log\frac{W^{\otimes n}_{Z|X}(Z^n|X^n)}{Q^{\otimes n}(Z^n)} = \sum_{i=1}^n\sum_{j=1}^m \log\frac{W_{\tilde{Z}^{(j)}|\tilde{X}^{(j)}}(\tilde{Z}_{ij}|\tilde{X}_{ij})}{\tilde{Q}_{0,j}(\tilde{Z}_{ij})}. \quad (99)$$
Since $\tilde{X}_{ij} \in \{-a_{n,j}, a_{n,j}\}$, we have
$$\sum_{j=1}^m \log\frac{W_{\tilde{Z}^{(j)}|\tilde{X}^{(j)}}(\tilde{Z}_{ij}|\tilde{X}_{ij}=\tilde{x}_{ij})}{\tilde{Q}_{0,j}(\tilde{Z}_{ij})} \sim \mathcal{N}\left(\sum_{j=1}^m \frac{\lambda_{w,j}^2\rho_{n,j}}{2\sigma_w^2},\ \sum_{j=1}^m \frac{\lambda_{w,j}^2\rho_{n,j}}{\sigma_w^2}\right).$$
Therefore, by setting $\theta = (1+\epsilon)\, n\sum_{j=1}^m \frac{\lambda_{w,j}^2\rho_{n,j}}{2\sigma_w^2}$ and using Hoeffding's inequality, we have
$$\mathbb{P}_{\Pi^{\otimes n}_{P_n} W^{\otimes n}_{Z|X}}\left(\log\frac{W^{\otimes n}_{Z|X}(Z^n|X^n)}{Q^{\otimes n}(Z^n)} > \theta\right) \le \exp\left(-\frac{n\epsilon^2\sum_{j=1}^m\lambda_{w,j}^2\rho_{n,j}}{8\sigma_w^2}\right).$$
Therefore,
$$\mathbb{E}\big\{\mathbb{V}(\hat{Q}^n, Q_n^{\otimes n})\big\} \le \exp\left(-\frac{n\epsilon^2\sum_{j=1}^m\lambda_{w,j}^2\rho_{n,j}}{8\sigma_w^2}\right) + \frac{1}{2}\sqrt{\frac{e^\theta}{M_n K_n}}.$$
Eventually, recalling (56), if we choose
$$\log M_n K_n = (1+\xi_2)\, n\sum_{j=1}^m \frac{\lambda_{w,j}^2\rho_{n,j}}{2\sigma_w^2} = (1+\xi_2)\sum_{j=1}^m \frac{\lambda_{w,j}^2\tau_j\sqrt{n}\,Q^{-1}\!\left(\frac{1-\delta_n}{2}\right)}{2\sigma_w^2}, \quad (100)$$
where $\xi_2 = \omega + \epsilon > \epsilon$, then
$$\mathbb{E}\big\{\mathbb{V}(\hat{Q}^n, Q_n^{\otimes n})\big\} \le e^{-\theta\sqrt{n}\,Q^{-1}\left(\frac{1-\delta_n}{2}\right)} \quad (101)$$
for some appropriate choice of $\theta > 0$. The result follows by combining (97) and (101).

e) Identification of a specific code: Choosing $\xi_1$, $\log M_{n,j}$, and $\log K_n$ to satisfy Lemma 9 and Lemma 10, Markov's inequality allows us to conclude that there exists at least one specific code $\mathcal{C}$ with $n$ large enough and appropriate constants $\zeta_j, \zeta > 0$ such that $\bar{P}_{e,j}^{(n)} \le e^{-\zeta_j\sqrt{n}\,Q^{-1}\left(\frac{1-\delta_n}{2}\right)}$ for all $j$ and $\big|\mathbb{V}(\hat{Q}^n, Q^{\otimes n}) - \delta_n\big| \le e^{-\zeta\sqrt{n}\,Q^{-1}\left(\frac{1-\delta_n}{2}\right)} + \frac{\xi_3}{\sqrt{n}}$. Therefore, by choosing
$$\log M_n = \sum_{j=1}^m \log M_{n,j} = (1-\xi_1)\frac{\sqrt{n}\,Q^{-1}\!\left(\frac{1-\delta_n}{2}\right)}{2\sigma_b^2}\,\mathrm{tr}\big(\Lambda_b T\Lambda_b^\intercal\big) \quad (102)$$
over all the sub-channels, we obtain
$$\bar{P}_e^{(n)} = 1 - \prod_{j=1}^m \big(1 - \bar{P}_{e,j}^{(n)}\big) \le \sum_{j=1}^m e^{-\zeta_j\sqrt{n}\,Q^{-1}\left(\frac{1-\delta_n}{2}\right)}. \quad (103)$$
Although a code $\mathcal{C}$ with vanishing $\bar{P}_e^{(n)}$ does not necessarily satisfy the reliability constraint (3), which requires $P_e^{(n)}$ to vanish as $n$ goes to infinity, the following lemma from [5] provides such a guarantee by merely rearranging the codewords of $\mathcal{C}$.

Lemma 12.
Suppose a code $\mathcal{C}$ contains $K_n$ sub-codes of size $M_n$ such that $\bar{P}_e^{(n)} \le \epsilon_n$ and $\big|\mathbb{V}(\hat{Q}^n, Q^{\otimes n}) - \delta_n\big| \le e^{-\zeta\sqrt{n}\,Q^{-1}\left(\frac{1-\delta_n}{2}\right)} + \frac{\xi_3}{\sqrt{n}}$. Then, there exists a code $\mathcal{C}'$ containing $K_n'$ sub-codes of size $M_n'$ such that $P_e^{(n)} \le \epsilon_n'$ and $\big|\mathbb{V}(\hat{Q}^n, Q^{\otimes n}) - \delta_n\big| \le e^{-\zeta\sqrt{n}\,Q^{-1}\left(\frac{1-\delta_n}{2}\right)} + \frac{\xi_3}{\sqrt{n}}$. In particular, $\lim_{n\to\infty}\epsilon_n = \lim_{n\to\infty}\epsilon_n' = 0$, $\lim_{n\to\infty} M_n'/M_n = 1$, and $\lim_{n\to\infty} K_n'/K_n = 1$.

f) Constellation power design: Next, we design the optimal constellation points, in the sense that we obtain the largest achievable message set size satisfying the covertness constraint. We formalize our optimization program by combining (102) and (71) as follows:
$$\max_{T \succcurlyeq 0}\quad \frac{\mathrm{tr}\big(\Lambda_b T\Lambda_b^\intercal\big)}{2\sigma_b^2}, \quad (104\text{a})$$
$$\text{s.t.}\quad \frac{\sqrt{\mathrm{tr}\big(\Lambda_w T\Lambda_w^\intercal \Lambda_w T\Lambda_w^\intercal\big)}}{2\sqrt{2}\,\sigma_w^2} \le 1 - O\left(\frac{1}{\sqrt{n}\,Q^{-1}\!\left(\frac{1-\delta_n}{2}\right)}\right). \quad (104\text{b})$$
To solve this, we regard the remaining $O\Big(\frac{1}{\sqrt{n}\,Q^{-1}\left(\frac{1-\delta_n}{2}\right)}\Big)$ term as a perturbation. Consider the optimization
$$\max_{T \succcurlyeq 0}\quad \frac{\mathrm{tr}\big(\Lambda_b T\Lambda_b^\intercal\big)}{2\sigma_b^2}, \quad (105\text{a})$$
$$\text{s.t.}\quad \frac{\sqrt{\mathrm{tr}\big(\Lambda_w T\Lambda_w^\intercal \Lambda_w T\Lambda_w^\intercal\big)}}{2\sqrt{2}\,\sigma_w^2} \le 1. \quad (105\text{b})$$
The optimal Lagrange multiplier $\mu^*$ and solution $T^*$ of (105) are
$$\mu^* = \frac{\sqrt{2}\,\sigma_w^2}{\sigma_b^2}\sqrt{\mathrm{tr}\big(\Lambda_b^4(\Lambda_w^{-1})^4\big)}, \qquad T^* = \frac{2\sqrt{2}\,\sigma_w^2\,\Lambda_b^2(\Lambda_w^{-1})^4}{\sqrt{\mathrm{tr}\big(\Lambda_b^4(\Lambda_w^{-1})^4\big)}},$$
respectively. Let $\rho^*$ and $\rho'$ denote the optimal objective values of (105) and (104), respectively. By sensitivity analysis [27, Ch. 8.5], we have
$$\rho^* \ge \rho' \ge \rho^* - O\left(\frac{1}{\sqrt{n}\,Q^{-1}\!\left(\frac{1-\delta_n}{2}\right)}\right), \quad (106)$$
which shows that the perturbation is negligible as $n$ goes to infinity. Consequently,
$$\lim_{n\to\infty} \frac{\log M_n}{\sqrt{n}\,Q^{-1}\!\left(\frac{1-\delta_n}{2}\right)} = (1-\xi_1)\,\frac{\sqrt{2}\,\sigma_w^2}{\sigma_b^2}\sqrt{\mathrm{tr}\big(\Lambda_b^4(\Lambda_w^{-1})^4\big)}, \quad (107)$$
$$\lim_{n\to\infty} \frac{\log M_n K_n}{\sqrt{n}\,Q^{-1}\!\left(\frac{1-\delta_n}{2}\right)} = (1+\xi_2)\,\frac{\sqrt{2}\,\mathrm{tr}\big(\Lambda_b^2(\Lambda_w^{-1})^2\big)}{\sqrt{\mathrm{tr}\big(\Lambda_b^4(\Lambda_w^{-1})^4\big)}}. \quad (108)$$
By (73), we can further normalize the previous results and obtain (53) and (54).

IV. COVERT COMMUNICATION WITH UNKNOWN WARDEN CHANNEL STATE
We now assume that only partial channel state information about Willie's channel is available. Specifically, all parties know $H_b$ exactly, while Alice only knows that $H_w$ belongs to the following uncertainty set:
$$\mathcal{S} \triangleq \big\{H_w = U_w\Lambda_w V^\intercal : \|\Lambda_w\| \le \bar{\lambda},\ m = \mathrm{rank}(H_w) = \mathrm{rank}(H_b) = N_a\big\}, \quad (109)$$
where $U_w$ is known to Willie and hence can be canceled by post-processing [20]. Thus, the set $\mathcal{S}$ contains all channels that are fully aligned with the main channel and whose singular-value matrix satisfies $\Lambda_w \preccurlyeq \Lambda$, where $\Lambda \triangleq \bar{\lambda} I_m$. The channel realization is fixed during the transmission period. This model corresponds to a quasi-static scenario in which the adversary cannot be closer to the transmitter than a certain protection distance [20].

For an $(n, M_n, K_n)$-code $\mathcal{C}$ designed for the compound channel induced by $\mathcal{S}$, the covertness metric at Willie is $\sup_{H_w\in\mathcal{S}} \mathbb{V}(\hat{Q}^n_{H_w}, Q^{\otimes n})$, where $\hat{Q}^n_{H_w}$ is the output distribution when communication occurs over the channel realization $H_w$,
$$\hat{Q}^n_{H_w}(z^n) = \frac{1}{M_n K_n}\sum_{\ell=1}^{M_n}\sum_{k=1}^{K_n} W^{\otimes n}_{Z|X}\big(z^n\,\big|\,x^{(\ell k)}_n\big) \quad \forall z^n \in \mathbb{R}^{N_w\times n}, \quad (110)$$
and $W_{Z|X=x} \sim \mathcal{N}\big(H_w x,\ \sigma_w^2 I_{N_w}\big)$. Here, we only present the achievability proof for the compound covert capacity under the variational distance metric. The converse follows from the fact that the worst-case covert capacity within the uncertainty set $\mathcal{S}$ upper-bounds the compound covert capacity, as in [20, Theorem 3].

Proposition 13.
Consider the compound MIMO-AWGN covert communication channel in (1) with the uncertainty set $\mathcal{S}$ in (109) containing all possible channel realizations of the warden. For an arbitrary $\xi \in (0,1)$, there exists a covert communication scheme such that
$$\lim_{n\to\infty} \frac{\log M_n}{\sqrt{n}\,Q^{-1}\!\left(\frac{1-\sup_{H_w\in\mathcal{S}}\mathbb{V}(\hat{Q}^n_{H_w},\,Q^{\otimes n})}{2}\right)} = (1-\xi)\,\frac{\sqrt{2}\,\sigma_w^2}{\sigma_b^2}\sqrt{\mathrm{tr}\big(\Lambda_b^4(\Lambda^{-1})^4\big)}, \quad (111)$$
$$\lim_{n\to\infty} \frac{\log M_n K_n}{\sqrt{n}\,Q^{-1}\!\left(\frac{1-\sup_{H_w\in\mathcal{S}}\mathbb{V}(\hat{Q}^n_{H_w},\,Q^{\otimes n})}{2}\right)} = (1+\xi)\,\frac{\sqrt{2}\,\mathrm{tr}\big(\Lambda_b^2(\Lambda^{-1})^2\big)}{\sqrt{\mathrm{tr}\big(\Lambda_b^4(\Lambda^{-1})^4\big)}}, \quad (112)$$
$$\lim_{n\to\infty} P_e^{(n)} = 0, \qquad \lim_{n\to\infty}\ \sup_{H_w\in\mathcal{S}} \mathbb{V}(\hat{Q}^n_{H_w}, Q^{\otimes n}) = 0. \quad (113)$$

Proof:
We extend the proof in [20], using ideas from [21]. The idea of the proof in [20] is to extend the result on the compound secrecy capacity of uncountably infinite compound DMCs to continuous alphabets through a sequence of successively finer quantizers applied to the input and output alphabets at all parties. The compound secrecy rate derived from the quantized alphabets can be made arbitrarily close to the compound secrecy capacity with a sufficiently fine quantizer. Unfortunately, this process requires the adversary to obey the quantization rule and implicitly assumes that the adversary cooperates with Alice and Bob. We propose a small correction that circumvents this issue by considering an adversary that directly operates on the channel output without quantization, and by directly analyzing the difference in covertness induced by a code between two close channel states.

a) Discretization: Since the uncertainty set $\mathcal{S}$ described in (109) is uncountable, we first discretize $\mathcal{S}$ to construct a finite uncertainty set $\mathcal{S}_n$ with a suitable choice of discretization level and discretization points. Since the uncertainty set $\mathcal{S}$ is subject to a spectral norm constraint, which results in an $m$-dimensional hypercube with side length $\bar{\lambda}$, a natural way to discretize is to uniformly slice $\mathcal{S}$ into $2^{mn}$ hypercubic regions with side length $\epsilon_n \triangleq \bar{\lambda}\, 2^{-n}$. The discretization points constituting the set $\mathcal{S}_n$ are chosen as follows:
$$\mathcal{S}_n \triangleq \big\{H_J = U\Lambda_J V^\intercal : \Lambda_J = \mathrm{diag}(j_1\epsilon_n, \ldots, j_m\epsilon_n),\ J = (j_1, \ldots, j_m),\ j_\ell \in \llbracket 1, 2^n\rrbracket\ \forall \ell \in \llbracket 1, m\rrbracket\big\}, \quad (114)$$
where $J$ indexes the elements of $\mathcal{S}_n$; since $U$ is known to Willie, we henceforth omit its impact. Each discretization point $H_J$ is associated with a neighborhood
$$\mathcal{S}_{J,n} \triangleq \big\{\tilde{H} = U\tilde{\Lambda}V^\intercal : \Lambda_J \preccurlyeq \tilde{\Lambda},\ \|\Lambda_J - \tilde{\Lambda}\| < \epsilon_n\big\}, \quad (115)$$
which covers a portion of the original uncertainty set. By construction, $\cup_{H_J\in\mathcal{S}_n}\mathcal{S}_{J,n} = \mathcal{S}$.
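A minimal sketch of the discretization step, under our reading that $j_\ell$ ranges over $\llbracket 1, 2^n\rrbracket$ (so that $|\mathcal{S}_n| = 2^{mn}$): every admissible singular value at least $\epsilon_n$ lies within $\epsilon_n$ of the grid point just below it. All concrete values are hypothetical toys.

```python
import numpy as np

rng = np.random.default_rng(2)
lam_bar, m, n = 2.0, 3, 8             # hypothetical norm bound, rank, blocklength
eps_n = lam_bar * 2.0 ** (-n)         # discretization level eps_n = lam_bar * 2^(-n)
# grid of singular values j * eps_n, j = 1..2^n, hence |S_n| = 2^(m n) grid points
assert (2 ** n) ** m == 2 ** (m * n)
for _ in range(1000):
    tilde = rng.uniform(eps_n, lam_bar, size=m)   # a channel state in the set
    j = np.floor(tilde / eps_n).astype(int)       # index of the grid point just below
    grid = j * eps_n
    # the grid point dominates from below within distance eps_n (the neighborhood S_{J,n})
    assert np.all(grid <= tilde) and np.all(tilde - grid < eps_n)
    assert np.all((1 <= j) & (j <= 2 ** n))
```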
As discussed in previous sections, without loss of generality, we directly investigate the parallel sub-channels described by $\Lambda_b$ and $\Lambda_w$.

b) Approximation: For any of the above neighborhoods, the covertness metric at any channel realization $\tilde{H}$ is close to that measured at the corresponding discretization point $H_J$. Precisely, the following lemma shows that the difference between the two vanishes rapidly with the blocklength $n$.

Lemma 14.
For any $\tilde{H} \in \mathcal{S}_{J,n}$ and its associated discretization point $H_J \in \mathcal{S}_n$,
$$\big|\mathbb{V}(\hat{Q}^n_{\tilde{H}}, Q^{\otimes n}) - \mathbb{V}(\hat{Q}^n_{H_J}, Q^{\otimes n})\big| \le O\big(n\, e^{-n\log 2}\big). \quad (116)$$
Thus, the covertness metric at any point of $\mathcal{S}$ can be closely approximated by that at some discretization point in $\mathcal{S}_n$ for sufficiently large $n$.

Proof:
We start by using $\mathbb{V}(\hat{Q}^n_{H_J}, Q^{\otimes n})$ to approximate $\mathbb{V}(\hat{Q}^n_{\tilde{H}}, Q^{\otimes n})$. Let $P_n = \frac{Q^{-1}\left(\frac{1-\delta_n}{2}\right)}{\sqrt{n}}\, T$, where $\{\delta_n\}_{n=1}^\infty$ is a sequence of positive real numbers such that $\lim_{n\to\infty}\sqrt{n}\,Q^{-1}\!\left(\frac{1-\delta_n}{2}\right) = \infty$ and $\lim_{n\to\infty}\delta_n = 0$. Then, for a fixed $P_n$, by the triangle inequality, we have
$$\big|\mathbb{V}(\hat{Q}^n_{\tilde{H}}, Q^{\otimes n}) - \mathbb{V}(\hat{Q}^n_{H_J}, Q^{\otimes n})\big| \le \mathbb{V}(\hat{Q}^n_{\tilde{H}}, \hat{Q}^n_{H_J}). \quad (117)$$
Note that, for a given code $\mathcal{C}$ and $x^{(\ell k)}_n \in \mathcal{C}$,
$$\mathbb{V}(\hat{Q}^n_{\tilde{H}}, \hat{Q}^n_{H_J}) \le \sum_{\ell=1}^{M_n}\sum_{k=1}^{K_n}\frac{1}{M_nK_n}\,\mathbb{V}\Big(\widetilde{W}^{\otimes n}_{Z|X}\big(Z^n|x^{(\ell k)}_n\big),\ W^{\otimes n}_{Z|X}\big(Z^n|x^{(\ell k)}_n\big)\Big) \quad (118)$$
$$\stackrel{(a)}{=} \sum_{\ell=1}^{M_n}\sum_{k=1}^{K_n}\frac{1}{M_nK_n}\,\mathbb{V}\Big(\widetilde{W}^{\otimes n}_{\tilde{Z}|\tilde{X}}\big(\tilde{Z}^n|\tilde{x}^{(\ell k)}_n\big),\ W^{\otimes n}_{\tilde{Z}|\tilde{X}}\big(\tilde{Z}^n|\tilde{x}^{(\ell k)}_n\big)\Big), \quad (119)$$
where $(a)$ follows since the orthogonal transformation preserves the variational distance. To characterize the behavior of $\mathbb{V}\big(\widetilde{W}^{\otimes n}_{\tilde{Z}|\tilde{X}}(\tilde{Z}^n|\tilde{x}^{(\ell k)}_n),\ W^{\otimes n}_{\tilde{Z}|\tilde{X}}(\tilde{Z}^n|\tilde{x}^{(\ell k)}_n)\big)$, we follow the proof of [21, Lemma 5] and consider a specific codeword $\tilde{x}^n$:
$$\mathbb{V}\Big(\widetilde{W}^{\otimes n}_{\tilde{Z}|\tilde{X}}(\tilde{Z}^n|\tilde{x}^n),\ W^{\otimes n}_{\tilde{Z}|\tilde{X}}(\tilde{Z}^n|\tilde{x}^n)\Big) = \frac{1}{2}\int_{\tilde{z}^n}\Big|\widetilde{W}^{\otimes n}_{\tilde{Z}|\tilde{X}}(\tilde{z}^n|\tilde{x}^n) - W^{\otimes n}_{\tilde{Z}|\tilde{X}}(\tilde{z}^n|\tilde{x}^n)\Big|\,\mathrm{d}\tilde{z}^n \quad (120)$$
$$= \frac{1}{2}\int_{\sum_{i=1}^n\|\tilde{z}_i - \tilde{\Lambda}\tilde{x}_i\|^2 > r_n}\Big|\widetilde{W}^{\otimes n}_{\tilde{Z}|\tilde{X}}(\tilde{z}^n|\tilde{x}^n) - W^{\otimes n}_{\tilde{Z}|\tilde{X}}(\tilde{z}^n|\tilde{x}^n)\Big|\,\mathrm{d}\tilde{z}^n + \frac{1}{2}\int_{\sum_{i=1}^n\|\tilde{z}_i - \tilde{\Lambda}\tilde{x}_i\|^2 \le r_n}\cdots$$
For any generated code, define the event
$$\mathcal{E} \triangleq \left\{\max_{H_w\in\mathcal{S}_n} \mathbb{V}\big(\hat{Q}^n_{H_w}, Q^{\otimes n}_{H_w}\big) \le \max_{H_w\in\mathcal{S}_n} \mathbb{E}_{\mathcal{C}}\big\{\mathbb{V}\big(\hat{Q}^n_{H_w}, Q^{\otimes n}_{H_w}\big)\big\} + \alpha_n\right\}, \quad (139)$$
where $\alpha_n = o\big(\frac{1}{\sqrt{n}}\big)$. Then, $\mathbb{P}(\mathcal{E}) \ge 1 - |\mathcal{S}_n|\exp\big(-\frac{M_nK_n\alpha_n^2}{2}\big)$.

Proof:
We have
$$\mathbb{P}(\mathcal{E}) \stackrel{(a)}{\ge} 1 - \sum_{H\in\mathcal{S}_n}\mathbb{P}\left(\mathbb{V}\big(\hat{Q}^n_H, Q^{\otimes n}_H\big) > \max_{H_w\in\mathcal{S}_n}\mathbb{E}_{\mathcal{C}}\big\{\mathbb{V}\big(\hat{Q}^n_{H_w}, Q^{\otimes n}_{H_w}\big)\big\} + \alpha_n\right) \quad (140)$$
$$\stackrel{(b)}{\ge} 1 - \sum_{H\in\mathcal{S}_n}\mathbb{P}\Big(\mathbb{V}\big(\hat{Q}^n_H, Q^{\otimes n}_H\big) > \mathbb{E}_{\mathcal{C}}\big\{\mathbb{V}\big(\hat{Q}^n_H, Q^{\otimes n}_H\big)\big\} + \alpha_n\Big) \quad (141)$$
$$\stackrel{(c)}{\ge} 1 - |\mathcal{S}_n|\exp\left(-\frac{M_nK_n\alpha_n^2}{2}\right), \quad (142)$$
where $(a)$ follows from the union bound, $(b)$ follows since $\max_{H_w\in\mathcal{S}_n}\mathbb{E}_{\mathcal{C}}\{\mathbb{V}(\hat{Q}^n_{H_w}, Q^{\otimes n}_{H_w})\} \ge \mathbb{E}_{\mathcal{C}}\{\mathbb{V}(\hat{Q}^n_H, Q^{\otimes n}_H)\}$, and $(c)$ follows from McDiarmid's inequality [4, Lemma 2].

Hence, $\mathbb{P}(\mathcal{E}) \to 1$ as $n \to \infty$ since, with our choice in (138), $\exp\big(-\frac{M_nK_n\alpha_n^2}{2}\big) = \exp\big(-\exp\big(O\big(\sqrt{n}\,Q^{-1}\big(\frac{1-\delta_n}{2}\big)\big)\big)\big)$ and $|\mathcal{S}_n| = \exp(O(n))$. If $n$ is large enough, with overwhelming probability, we can rewrite (135) as follows:
$$\left|\max_{H_w\in\mathcal{S}_n}\mathbb{V}\big(\hat{Q}^n_{H_w}, Q^{\otimes n}\big) - \max_{H_w\in\mathcal{S}_n}\mathbb{V}\big(Q^{\otimes n}_{H_w}, Q^{\otimes n}\big)\right| \le \max_{H_w\in\mathcal{S}_n}\mathbb{E}_{\mathcal{C}}\big\{\mathbb{V}\big(\hat{Q}^n_{H_w}, Q^{\otimes n}_{H_w}\big)\big\} + \alpha_n \quad (143)$$
$$\le \exp\left(-O\left(\sqrt{n}\,Q^{-1}\!\left(\frac{1-\delta_n}{2}\right)\right)\right) + \alpha_n \le 2\alpha_n. \quad (144)$$
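The interplay between the union bound over $|\mathcal{S}_n| = 2^{mn}$ discretization points and the McDiarmid-type exponential concentration can be illustrated numerically. The scaling constants below are hypothetical toys, and the $\alpha_n^2$ exponent follows our reconstruction of (142), not a value stated in the paper.

```python
import math

# log of the failure bound |S_n| * exp(-M_n K_n * alpha_n^2 / 2) under a toy scaling:
# |S_n| grows only exponentially in n, while M_n K_n grows like exp(c sqrt(n)),
# so the doubly exponential concentration term eventually dominates.
m, c = 2, 0.5                          # hypothetical rank and rate constant

def log_bound(n):
    log_Sn = m * n * math.log(2)       # log |S_n| = m n log 2
    MnKn = math.exp(c * math.sqrt(n))  # M_n K_n = exp(c sqrt(n)) (toy stand-in)
    alpha_n = 1.0 / n                  # alpha_n = o(1/sqrt(n))
    return log_Sn - 0.5 * MnKn * alpha_n**2

# the log-bound is eventually negative and decreasing: the union bound wins
assert log_bound(40_000) < log_bound(10_000) < 0
```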
As a result, we have shown that, for $n$ large enough, there exists a random code $\mathcal{C}_c$ generated according to the constraints (137) and (138) that is also a compound covert code for the whole discretized uncertainty set $\mathcal{S}_n$, i.e.,
$$\left|\max_{H_w\in\mathcal{S}_n}\mathbb{V}\big(\hat{Q}^n_{H_w}, Q^{\otimes n}\big) - \delta_n\right| \le \left|\max_{H_w\in\mathcal{S}_n}\mathbb{V}\big(\hat{Q}^n_{H_w}, Q^{\otimes n}\big) - \max_{H_w\in\mathcal{S}_n}\mathbb{V}\big(Q^{\otimes n}_{H_w}, Q^{\otimes n}\big)\right| + \left|\max_{H_w\in\mathcal{S}_n}\mathbb{V}\big(Q^{\otimes n}_{H_w}, Q^{\otimes n}\big) - \delta_n\right| \quad (145)$$
$$\le O\left(\frac{1}{\sqrt{n}}\right) + \alpha_n = O\left(\frac{1}{\sqrt{n}}\right) \quad (146)$$
for sufficiently large $n$. Eventually, combining the above with the triangle inequality and (134), for $n$ large enough, we can develop a bound similar to (73):
$$\left|\sup_{H_w\in\mathcal{S}}\mathbb{V}\big(\hat{Q}^n_{H_w}, Q^{\otimes n}\big) - \delta_n\right| \le \frac{\xi_3}{\sqrt{n}}, \quad (147)$$
where $\xi_3 > 0$ is some constant. Therefore, $\mathcal{C}_c$ is also a compound covert code for the entire uncertainty set $\mathcal{S}$.

d) Constellation power design: The power design follows the same steps as in (104)-(108). To find the optimal design point of $T$, we first ignore the $O\Big(\frac{1}{\sqrt{n}\,Q^{-1}\left(\frac{1-\delta_n}{2}\right)}\Big)$ term in (137) and include it later as a perturbation, as in (106). By solving an optimization program similar to (105), we obtain
$$\lim_{n\to\infty}\frac{\log M_n}{\sqrt{n}\,Q^{-1}\!\left(\frac{1-\delta_n}{2}\right)} = (1-\xi)\,\frac{\sqrt{2}\,\sigma_w^2}{\sigma_b^2}\sqrt{\mathrm{tr}\big(\Lambda_b^4(\Lambda^{-1})^4\big)}. \quad (148)$$
Therefore, by using (147), we can further normalize (148) and obtain (111); (112) follows similarly.

REFERENCES
[1] B. A. Bash, D. Goeckel, and D. Towsley, "Limits of Reliable Communication with Low Probability of Detection on AWGN Channels,"
IEEE Journal on Selected Areas in Communications, vol. 31, no. 9, pp. 1921–1930, Sep. 2013.
[2] L. Wang, G. W. Wornell, and L. Zheng, "Fundamental Limits of Communication With Low Probability of Detection," IEEE Transactions on Information Theory, vol. 62, no. 6, pp. 3493–3503, Jun. 2016.
[3] M. R. Bloch, "Covert Communication over Noisy Channels: A Resolvability Perspective," IEEE Transactions on Information Theory, vol. 62, no. 5, pp. 2334–2354, May 2016.
[4] M. Tahmasbi and M. R. Bloch, "First- and Second-Order Asymptotics in Covert Communication," IEEE Transactions on Information Theory, vol. 65, no. 4, pp. 2190–2212, Apr. 2019.
[5] Q. E. Zhang, M. R. Bloch, M. Bakshi, and S. Jaggi, "Undetectable Radios: Covert Communication under Spectral Mask Constraints," in Proc. of IEEE International Symposium on Information Theory, Paris, France, Jul. 2019, pp. 992–996.
[6] P. H. Che, M. Bakshi, and S. Jaggi, "Reliable deniable communication: Hiding messages in noise," in Proc. of IEEE International Symposium on Information Theory, Apr. 2013, pp. 2945–2949.
[7] K. S. K. Arumugam and M. R. Bloch, "Covert Communication Over a K-User Multiple-Access Channel," IEEE Transactions on Information Theory, vol. 65, no. 11, pp. 7020–7044, Nov. 2019.
[8] ——, "Embedding Covert Information in Broadcast Communications," IEEE Transactions on Information Forensics and Security, vol. 14, no. 10, pp. 2787–2801, Oct. 2019.
[9] V. Y. F. Tan and S.-H. Lee, "Time-Division is Optimal for Covert Communication Over Some Broadcast Channels," IEEE Transactions on Information Forensics and Security, vol. 14, no. 5, pp. 1377–1389, May 2019.
[10] B. A. Bash, A. H. Gheorghe, M. Patel, J. L. Habif, D. Goeckel, D. Towsley, and S. Guha, "Quantum-secure covert communication on bosonic channels," Nature Communications, vol. 6, no. 1, p. 8626, Dec. 2015.
[11] L. Wang, "Optimal throughput for covert communication over a classical-quantum channel," in Proc. of IEEE Information Theory Workshop (ITW), Sep. 2016, pp. 364–368.
[12] G. Frèche, M. Bloch, and M. Barret, "Polar Codes for Covert Communications over Asynchronous Discrete Memoryless Channels," Entropy, vol. 20, no. 1, p. 3, Dec. 2017.
[13] I. A. Kadampot, M. Tahmasbi, and M. R. Bloch, "Multilevel-Coded Pulse-Position Modulation for Covert Communications," in Proc. of IEEE International Symposium on Information Theory, Vail, CO, Jun. 2018, pp. 1864–1868.
[14] ——, "Codes for Covert Communication over Additive White Gaussian Noise Channels," in Proc. of IEEE International Symposium on Information Theory, Paris, France, Jul. 2019, pp. 977–981.
[15] M. Lamarca and D. Matas, "A non-linear channel code for covert communications," in Proc. of IEEE Wireless Communications and Networking Conference (WCNC), Marrakech, Morocco, Apr. 2019, pp. 1–7.
[16] Q. Zhang, M. Bakshi, and S. Jaggi, "Covert Communication With Polynomial Computational Complexity," IEEE Transactions on Information Theory, vol. 66, no. 3, pp. 1354–1384, Mar. 2020.
[17] A. Abdelaziz and C. E. Koksal, "Fundamental limits of covert communication over MIMO AWGN channel," in Proc. of IEEE Conference on Communications and Network Security (CNS), Las Vegas, NV, Oct. 2017, pp. 1–9.
[18] A. Bendary, A. Abdelaziz, and C. E. Koksal, "Positive Covert Capacity of the MIMO AWGN Channels," arXiv preprint arXiv:1910.13652, Oct. 2019.
[19] A. Bendary and C. E. Koksal, "Order-Optimal Scaling of Covert Communication over MIMO AWGN Channels," in Proc. of 2020 IEEE Conference on Communications and Network Security (CNS), Avignon, France, Jun. 2020, pp. 1–9.
[20] R. F. Schaefer and S. Loyka, "The Secrecy Capacity of Compound Gaussian MIMO Wiretap Channels," IEEE Transactions on Information Theory, vol. 61, no. 10, pp. 5535–5552, Oct. 2015.
[21] X. He and A. Yener, "MIMO Wiretap Channels With Unknown and Varying Eavesdropper Channel States," IEEE Transactions on Information Theory, vol. 60, no. 11, pp. 6844–6869, Nov. 2014.
[22] S.-Y. Wang and M. R. Bloch, "Covert MIMO Communications under Variational Distance Constraint," in Proc. of 2020 IEEE International Symposium on Information Theory, Los Angeles, CA, Jun. 2020, pp. 828–833.
[23] A. Khisti and G. W. Wornell, "Secure Transmission With Multiple Antennas—Part II: The MIMOME Wiretap Channel," IEEE Transactions on Information Theory, vol. 56, no. 11, pp. 5515–5532, Nov. 2010.
[24] C. C. Paige and M. A. Saunders, "Towards a Generalized Singular Value Decomposition," SIAM Journal on Numerical Analysis, vol. 18, no. 3, pp. 398–405, 1981.
[25] E. L. Lehmann and J. P. Romano, Testing Statistical Hypotheses, ser. Springer Texts in Statistics. New York: Springer, 2006.
[26] S. Yan, Y. Cong, S. V. Hanly, and X. Zhou, "Gaussian Signalling for Covert Communications," IEEE Transactions on Wireless Communications, vol. 18, no. 7, pp. 3542–3553, Jul. 2019.
[27] D. G. Luenberger, Optimization by Vector Space Methods. New York: Wiley, 1969.