Physical Layer Security in Massive MIMO

Abstract

We consider a single-cell downlink massive MIMO communication system in the presence of an adversary capable of jamming and eavesdropping simultaneously. We show that massive MIMO communication is naturally resilient to a no training-phase jamming attack, in which the adversary jams only the data communication and eavesdrops on both the data communication and the training. Specifically, we show that the secure degrees of freedom (DoF) attained in the presence of such an attack is identical to the maximum DoF attained under no attack. Further, we evaluate the number of antennas that the base station (BS) requires in order to establish information theoretic security without even a need for Wyner encoding. Next, we show that the situation is completely different once the adversary starts jamming the training phase. Specifically, we consider an attack, called training-phase jamming, in which the adversary jams and eavesdrops on both the training and the data communication. We show that under such an attack, the maximum secure DoF is equal to zero. Furthermore, the maximum achievable rates of users vanish even in the asymptotic regime in the number of BS antennas. To counter this attack, we develop a defense strategy in which we use a secret key to encrypt the pilot sequence assignments to hide them from the adversary, rather than encrypt the data. We show that, if the cardinality of the set of pilot signals is scaled appropriately, hiding the pilot signal assignments from the adversary enables the users to achieve a secure DoF identical to the maximum achievable DoF under no attack. Finally, we discuss how computational cryptography is a legitimate candidate to hide the pilot signal assignments. Indeed, while information theoretic security is not achieved with cryptography, the computational power necessary for the adversary to achieve a non-zero mutual information leakage rate goes to infinity.


Y. Ozan Basciftci C. Emre Koksal Alexei Ashikhmin


I. INTRODUCTION

Massive MIMO is one of the highlights of the envisioned 5G communication systems. In the massive MIMO paradigm, the base station is equipped with a number of antennas typically much larger than the number of users served. Combined with TDD-based transmission, this solves many of the issues pertaining to channel state information. In particular, the base station exploits law-of-large-numbers-like certainties as it serves each user over a combination of a large number of channels.

While many issues behind the design of multicellular massive MIMO systems have been studied thoroughly, the security of massive MIMO has not been actively addressed. Part of the reason for this may be that there is a vast literature on the security of MIMO systems in general, and a common perspective is that massive MIMO is merely an extension of MIMO as it pertains to security. However, we demonstrate that massive MIMO has unique vulnerabilities, and standard approaches to MIMO security do not address them directly. Instead, these approaches focus on issues that massive MIMO is naturally immune to. Furthermore, we argue that common models used in MIMO security eliminate the need to think about various components of the system that are critical to understanding the vulnerabilities in security. In particular, in massive MIMO, merely making assumptions on the available channel state information (CSI) is not sufficient, since the actual technique the system uses to obtain CSI may be the leading cause of some major security issues. For all these reasons, the security of massive MIMO calls for a separate treatment of its own.

(This work was presented in part in the IEEE Computer and Network Security Conference (CNS), Florence, Italy, September 2015. Y. Ozan Basciftci and C. Emre Koksal are with the Department of Electrical and Computer Engineering, The Ohio State University, Columbus, OH 43210, USA. Email: {basciftci.1, koksal.2}.osu.edu. Alexei Ashikhmin is with Bell Laboratories, Alcatel-Lucent, Murray Hill, NJ 07974, USA. Email: [email protected]. This publication was made possible by NPRP grant 5-559-2-227 from the Qatar National Research Fund (a member of Qatar Foundation) and ONR grant N00014-16-1-2253.)

To that end, we consider the TDD-based single-cell downlink massive MIMO system developed in [3] and later readdressed in [1]. The adversary is hybrid, capable of jamming and eavesdropping at the same time with its multiple antennas. We call our system secure if secrecy, measured in full equivocation, is achieved at the adversary and an arbitrarily low probability of decoding error is achieved at the legitimate receiver. We refer to these requirements as security constraints.

We first show how massive MIMO is naturally resilient to standard jamming and eavesdropping attacks, unless jamming is performed during the training phase, when pilot signals are transmitted by the mobile users. We prove that, without pilot jamming, the achievable secure degrees of freedom (DoF) is identical to the maximum DoF attained under no attack, even without the need to use a stochastic (e.g., Wyner) secrecy encoder in the massive MIMO limit. On the other hand, as we will show, the adversary can reduce the maximum secure DoF and rate to zero by contaminating the pilot signal of the targeted user via another correlated pilot signal. To address this attack, we develop a defense strategy in which the base station (BS) keeps the assignment of pilot signals to the users hidden from the adversary and reliably informs the users of the assignments. Thus, in our approach, we use computational cryptography for encrypting the pilot assignments in the training phase. We also discuss how the consequences of encrypting the pilot assignment are fundamentally different from the consequences of data encryption. In particular, we argue that, even if we use non-information theoretic methods (e.g., Diffie-Hellman) to encrypt the pilot assignments, the level of security we achieve can be as strong as information theoretic secrecy for all practical purposes. Note that most of our results are not asymptotic in the number of antennas, and we specify the number of antennas necessary to achieve a certain level of security.

(Our definition of degrees of freedom is different from the standard definition. Our definition specifies how the achievable rate scales with the log of the number of base station antennas, rather than the log of the transmission power as in the standard definition.)

The major ideas developed and demonstrated in this paper include:

• In the information-theoretic secrecy literature, it is often the case that assumptions are made on the CSI available at the adversary. Typically, it is assumed that the adversary has access to the CSI for all channels in the system, with the motivation of making the achievable security robust with respect to the availability of CSI at the adversary. However, we show that, with massive MIMO, it is not important whether the adversary has full CSI or not. Indeed, we show that massive MIMO is naturally immune to attacks during the data communication phase. Instead, we demonstrate that the major question is how the adversary obtains CSI. In particular, we show that if the adversary is active during the training phase, it substantially degrades the security of data communication.

• Security in computational cryptography is based on assumptions about the computational power of the attackers. Once data is encrypted, it takes an unreasonable amount of time for a typical adversary to decrypt it without the key. Making such an assumption on the adversary poses a problem for security, since a sophisticated adversary can use various tools and techniques to cut down the time for cryptanalysis applied to recorded encrypted data. We eliminate this shortcoming by encrypting the pilot assignments, not the transmitted data, using keys that are shared via standard Diffie-Hellman. In our scheme, to make an impact, the adversary needs to decrypt the pilot assignment before the training phase starts. Note that the training phase can start immediately after the assignments are made, leaving an arbitrarily small amount of time for the adversary to crack the assignment (i.e., pushing the necessary computational power to infinity). Without knowledge of the pilot assignment, our scheme achieves perfect secrecy of the information transmitted in the data communication phase, even without the use of a secrecy encoder. Thus, it is useless for the adversary to record the received signal for future cryptanalysis, since it is indistinguishable from noise.

Next, we summarize the technical contributions of our paper. Throughout the paper, we assume that the adversary is full-duplex, i.e., it is capable of eavesdropping and jamming the BS-to-user communication simultaneously. In the first part of the paper, we study an attack model in which the adversary eavesdrops on the entire communication between the BS and the users and jams only the downlink data communication (the adversary keeps silent during the training). Under this attack:

• We show that the maximum secure DoF is identical to the maximum DoF achieved in the presence of no adversary.

• We provide a novel encoding strategy, δ-conjugate beamforming, that provides full security without the need for Wyner encoding [10].

• We evaluate the number of antennas that the BS requires in order to satisfy the security constraints.

The proposed encoding, δ-conjugate beamforming, utilizes the fact that the correlation between the estimated BS-to-user channel gains and the BS-to-adversary channel gains becomes zero when the adversary does not jam during the training phase. We observe that in order to cause a non-zero correlation between the estimated BS-to-user channel gains and the BS-to-adversary channel gains, the adversary has to jam the pilots of users.

In the second part of the paper, we consider an attack model in which the adversary eavesdrops on and jams the entire communication (including the training) between the BS and the users. Under this attack:

• We show that, if the adversary jams the training such that there exists a non-zero correlation between the BS-to-adversary channel gain and the estimated gain of the channel from the BS to a user, the adversary reduces the maximum secure DoF to zero. Further, we show that, if the amount of correlation is sufficiently large, the maximum achievable rate of the user also vanishes as the number of antennas at the BS grows.

• We propose a counter strategy against the adversary. We show that, if the cardinality of the set of pilot signals scales with the number of antennas at the BS and the BS is able to keep the pilot signal assignments hidden from the adversary, the attained secure DoF is arbitrarily close to the maximum DoF attained under no attack.

Related Work: The massive MIMO concept was first proposed in [3], [4]. Since then, there has been a flurry of studies focusing on different aspects of massive MIMO (see survey [5]) such as channel estimation, energy efficiency, and pilot contamination. However, while MIMO security has been an active area of research [6], [7], [8], issues specific to massive MIMO have not been considered. Among the very few, in [9], the authors consider a downlink multi-cell massive MIMO system in the presence of an adversary that only eavesdrops. In order to confuse the adversary, the BS transmits artificial noise from a set of its antennas. The authors conclude that, if the adversary has a sufficiently large number of antennas, it is impossible to operate at a positive rate with artificial noise generation at the BS. In our earlier work [2], which sets up the main results in this paper, we have focused on a fairly different model and addressed other questions. For instance, our attack model considers both jamming and eavesdropping, possibly simultaneously, by the adversary.

II. SYSTEM MODEL AND PROBLEM STATEMENT

We consider a multi-user MIMO downlink communication system, depicted in Figure 1, including a base station (BS), $K$ single-antenna users, and an adversary. The BS, equipped with $M$ antennas, wishes to broadcast $K$ distinct messages $[W_1, \dots, W_K]$, each of which is intended for a different user. The adversary is equipped with $M_e$ antennas.

Fig. 1. System model: a base station ($M$ antennas), single-antenna users 1 through $K$, and an adversary ($M_e$ antennas).

A. Channel Model

We assume all the channels in our system, illustrated in Figure 1, are block fading. In the block fading channel model, time is divided into discrete blocks, each of which contains $T$ channel uses. The channel gains remain constant within a block, and the channel gains on different blocks are independent and identically distributed. Furthermore, we assume the channels are reciprocal: the instantaneous gain of the channel connecting the BS to a user is the same as the gain of the channel connecting the same user to the BS.

We follow a TDD-based two-phase transmission scheme introduced in [11] and readdressed in [1]. The signal transmission in a block is separated into two phases: a training phase and a data communication phase. On the first $T_r$ channel uses of every block, each user sends a pilot signal to the BS. The BS estimates each BS-to-user channel from the observed pilot signals. On the last $T_d$ channel uses of each block ($T_d \triangleq T - T_r$), the BS transmits data to the users.

The observed signals during a data communication phase at the $k$-th user and at the adversary, at a particular channel use of the $i$-th block, are as follows:

$$Y_k = H_k(i) X + H_{jam,k}(i) V_{jam} + V_k \tag{1}$$

$$Z = H_e(i) X + V_e, \tag{2}$$

where $Y_k$ is the received complex signal at the $k$-th user, $Z$ is the received $M_e \times 1$ complex vector at the adversary, and $X$ denotes the $M \times 1$ complex vector of transmitted data symbols. Signals $V_k$ and $V_e$ are additive Gaussian noise components, distributed as $\mathcal{CN}(0, 1)$ and $\mathcal{CN}(\mathbf{0}, I_{M_e})$, respectively. Signal $V_{jam}$ denotes the $M_e \times 1$ complex vector of the jamming signal. Further, $H_k(i)$ and $H_{jam,k}(i)$ denote the $1 \times M$ complex gain vector of the channel connecting the base station to the $k$-th user and the $1 \times M_e$ complex gain vector of the channel connecting the adversary to the $k$-th user, respectively, at the $i$-th block. Similarly, $H_e(i)$ is the $M_e \times M$ complex gain matrix of the MIMO channel connecting the base station to the adversary at the $i$-th block. We assume that all channel gains $H_e(i), H_1(i), \dots, H_K(i), H_{jam,1}(i), \dots, H_{jam,K}(i)$ are mutually independent for any $i \geq 1$.

(Except for the channel gains, we avoid the block and channel use indices in (1) and (2) and the block indices in (3) and (4) for the sake of simplicity.)

The users send pilots in the first $T_r$ channel uses of each block. The received signals at the BS and at the adversary in the training phase of the $i$-th block are as follows:

$$Y^{T_r} = \sum_{k=1}^{K} H_k^T(i) \phi_k + H_e^T(i) W_{jam} + W, \tag{3}$$

$$Z^{T_r} = \sum_{k=1}^{K} H_{jam,k}^T(i) \phi_k + W_e, \tag{4}$$

where $Y^{T_r}$ and $Z^{T_r}$ denote the $M \times T_r$ and $M_e \times T_r$ complex matrices of the received signals over $T_r$ channel uses at the BS and at the adversary, respectively. Signals $W$ and $W_e$ are $M \times T_r$ and $M_e \times T_r$ complex matrices denoting the additive Gaussian noise. The elements of $W$ and $W_e$ are i.i.d. $\mathcal{CN}(0, 1)$. Signal $W_{jam}$ denotes the $M_e \times T_r$ complex matrix of the jamming signal. Signal $\phi_k$ is the $1 \times T_r$ complex vector denoting the pilot signal associated with the $k$-th user. The power of the pilot signals, $\rho_r$, i.e., $\frac{1}{T_r} \mathrm{tr}(\phi_k^* \phi_k) = \rho_r$, is identical for all users $k \in \{1, \dots, K\}$.

We assume that the users do not have knowledge of the BS-to-user channel gains. Note that the BS, the users, and the adversary know the pilot signal set $[\phi_1, \dots, \phi_K]$. The adversary is assumed to be aware of which pilot signal is assigned to which user. Utilizing the pilot signals, the BS estimates the BS-to-user channel gains. Define $\hat{H}_k(i)$ as the $1 \times M$ complex vector of the estimated BS-to-$k$-th-user channel gain. Further, for any $B \geq 1$, define $H^B$, $\hat{H}^B$, $H_e^B$, and $H_{jam}^B$ as the gains of the BS-to-user channels, the estimated gains of the BS-to-user channels, the gains of the BS-to-adversary channel, and the gains of the adversary-to-user channels over $B$ blocks, respectively, i.e., $H^B \triangleq [H_1^B, \dots, H_K^B]$, $\hat{H}^B \triangleq [\hat{H}_1^B, \dots, \hat{H}_K^B]$, and $H_{jam}^B \triangleq [H_{jam,1}^B, \dots, H_{jam,K}^B]$.

For any $B \geq 1$, the joint probability density function of $(H^B, \hat{H}^B, H_e^B, H_{jam}^B)$ is

$$p_{H^B, \hat{H}^B, H_e^B, H_{jam}^B}\left(h^B, \hat{h}^B, h_e^B, h_{jam}^B\right) = \prod_{i=1}^{B} p_{H, \hat{H}, H_e, H_{jam}}\left(h(i), \hat{h}(i), h_e(i), h_{jam}(i)\right) \tag{5}$$

where $H \triangleq [H_1, \dots, H_K]$, $\hat{H} \triangleq [\hat{H}_1, \dots, \hat{H}_K]$, and $H_{jam} \triangleq [H_{jam,1}, \dots, H_{jam,K}]$. For any $k \in \{1, \dots, K\}$, $H_k$ and $H_{jam,k}$ are distributed as $\mathcal{CN}(\mathbf{0}, I_M)$ and $\mathcal{CN}(\mathbf{0}, I_{M_e})$, respectively, and the elements of matrix $H_e$ are i.i.d. $\mathcal{CN}(0, 1)$.

The adversary has perfect knowledge of the BS-to-user channel gains $H$ and the estimated BS-to-user channel gains $\hat{H}$. Define $H_{km}$ and $\hat{H}_{km}$ as the gain and the estimated gain of the channel connecting the $m$-th BS antenna to the $k$-th user. We assume that for any $k \in \{1, \dots, K\}$, $\{H_{km} \hat{H}_{km}\}_{m \geq 1}$ forms an i.i.d. process. We also assume that $\hat{H}_k$ is independent of $H_l$ and that $E[\hat{H}_k \hat{H}_l^*] = 0$ for $k \neq l$ and $k, l \in \{1, \dots, K\}$. Note that we do not impose these assumptions on the BS-to-adversary channels.

Remark 1. When an MMSE estimator and mutually orthogonal pilot signals are employed at the BS for channel estimation, these assumptions are satisfied.

B. Attack Model

We consider a full-duplex adversary that is capable of eavesdropping and jamming simultaneously. In the sequel, we consider two attack models that differ only in the adversary's jamming activity in the training phase.

In Section III, we consider an attack model in which the adversary jams only during the data communication phase and eavesdrops on both the training and the data communication phases. We call this attack model no training-phase jamming. Under no training-phase jamming, the adversary jams during the data communication phase using a Gaussian jamming signal and keeps silent during the training phase. Specifically, signal $W_{jam}$ in (3) is identical to zero, and jamming signal $V_{jam}$ in (1) is distributed as $\mathcal{CN}(\mathbf{0}, \rho_{jam} I_{M_e})$, where $\rho_{jam}$ is the jamming power.

In Sections IV and V, we consider an attack model in which the adversary jams and eavesdrops on both the training and the data communication phases. We call this attack model training-phase jamming. The adversary's strategy during the data communication phase in this attack model is the same as that described in the previous attack model (i.e., no training-phase jamming). During the training phase, instead of jamming with random signals, the adversary jams with structured signals. We provide a detailed description of the signals used for jamming the training phase in Sections IV and V.

C. Code Definition

The BS aims to send message $w_k \in \mathcal{W}_k$, $k = 1, \dots, K$, to the $k$-th user over $B$ blocks with rate $R_k$, while keeping $w_k$ secret from the adversary. The BS and the users employ a code $(2^{BTR_1}, \dots, 2^{BTR_K}, BT_d)$ of length $BT_d$ that contains:

1) $K$ message sets, $\mathcal{W}_k \triangleq \{1, \dots, 2^{BTR_k}\}$, $k = 1, \dots, K$.

2) $K$ injective encoding functions, $f_k$, $k = 1, \dots, K$, where $f_k$ maps $w_k \in \mathcal{W}_k$ to a data signal sequence $s_k^{BT_d} \in \mathbb{C}^{BT_d}$ satisfying an average power constraint such that

$$\frac{1}{BT_d} \sum_{i=1}^{B} \sum_{j=T_r+1}^{T} |s_k(i,j)|^2 \leq \rho_k, \quad k = 1, \dots, K \tag{6}$$

for all $w_k \in \mathcal{W}_k$, where notation $(i, j)$ indicates the $j$-th channel use of the $i$-th block, $\rho_k$ denotes the power constraint for the $k$-th user, and $s_k(i,j)$ is the complex data signal of the $k$-th user. Note that $\rho_f \triangleq \sum_{k=1}^{K} \rho_k$ is the cumulative average transmission power. Further, note that the encoding functions $f_k$, $k = 1, \dots, K$, can be deterministic or stochastic. Codes using stochastic encoding functions are referred to as stochastic codes, and the ones using deterministic encoding functions are referred to as deterministic codes.

3) Linear beamforming that maps data signals $s_1^{BT_d} \times \cdots \times s_K^{BT_d}$ to channel input $X^{BT_d}$. Two beamforming strategies are used throughout the paper:

• Conjugate beamforming: When the BS employs conjugate beamforming, the channel input at the $j$-th channel use of the $i$-th block can be written as

$$X(i,j) = \sum_{k=1}^{K} s_k(i,j) \frac{\hat{H}_k^*(i)}{\sqrt{M \alpha_k}}, \tag{8}$$

for any $i \in \{1, \dots, B\}$ and $j \in \{T_r + 1, \dots, T\}$, where $\alpha_k \triangleq E\left[|\hat{H}_{km}|^2\right]$.

• δ-conjugate beamforming: We introduce a new beamforming strategy, called δ-conjugate beamforming, that is a slightly modified version of conjugate beamforming. Let $\delta$ be a positive real number. When the BS employs δ-conjugate beamforming, the channel input at the $j$-th channel use of the $i$-th block can be written as

$$X(i,j) = \sum_{k=1}^{K} s_k(i,j) \frac{\hat{H}_k^*(i)}{\sqrt{M^{1+\delta} \alpha_k}}. \tag{9}$$

Note that, when $\delta = 0$, δ-conjugate beamforming becomes identical to conjugate beamforming in (8).

4) Decoding functions, $g_k$, $k = 1, \dots, K$, where $g_k$ maps $Y_k^{BT_d}$ to $\hat{w}_k \in \mathcal{W}_k$.

D. Figures of Merit

We define the average error probability of code $(2^{BTR_1}, \dots, 2^{BTR_K}, BT_d)$ as

$$P_e \triangleq P\left( \bigcup_{k=1}^{K} \left\{ g_k\left(Y_k^{BT_d}\right) \neq W_k \right\} \right),$$

where $W_k$ is uniformly distributed on $\mathcal{W}_k$. We assume that the adversary targets a single user during communication. The secrecy of the transmitted message for the $k$-th user is measured by the equivocation rate at the adversary, which is equal to the entropy rate of the transmitted message $w_k$ conditioned on the adversary's observations.

Definition 1. A secure rate tuple $R_1, \dots, R_K$ is said to be achievable if, for any $\epsilon > 0$, there exists $B(\epsilon) > 0$ and a sequence of codes $(2^{BTR_1}, \dots, 2^{BTR_K}, BT_d)$ that satisfy the following:

$$P_e \leq \epsilon, \tag{10}$$

$$\frac{1}{BT} H\left(W_k \mid Z^{BT}, H^B, \hat{H}^B, H_e^B\right) \geq R_k - \epsilon \tag{11}$$

for all $B \geq B(\epsilon)$ and $k \in \{1, \dots, K\}$, where $Z^{BT}$ is the received signal sequence at the adversary over $BT$ channel uses.

(Regarding the code definition in Section II-C: $s_k^{BT_d} \triangleq \{s_k(i,j)\}_{i=1:B,\, j=T_r+1:T}$, and notation $(\cdot)^{BT_d}$ applied to any variable has the same meaning. Note also that the channel input sequence satisfies the following average power constraint

$$\frac{1}{BT_d} \sum_{i=1}^{B} \sum_{j=T_r+1}^{T} E\left[\|X(i,j)\|^2\right] \leq \rho_f \tag{7}$$

for all $w_1 \times \cdots \times w_K \in \mathcal{W}_1 \times \cdots \times \mathcal{W}_K$, where the expectation is over the estimated channel gains $\hat{H}$. The inequality (7) follows from the individual power constraint (6) and from the fact that $E[\hat{H}_k \hat{H}_l^*] = 0$ for $k \neq l$.)

We refer to the constraints in (10) and (11) as the decodability and secrecy constraints, respectively. We also refer to both constraints as security constraints. We call the communication system information theoretically secure if both constraints are satisfied. Notice that the achievable rate tuple definition above is presented for a given $M$, i.e., $M$ remains constant for a sequence of codes $(2^{BTR_1}, \dots, 2^{BTR_K}, BT_d)$, $B \geq B(\epsilon)$.

In this paper, we mainly focus on the massive MIMO limit. Specifically, we study how the achievable rate tuple $R_1, \dots, R_K$ behaves as $M$ goes to infinity. To that end, we use the following notion of degrees of freedom for each user.

Definition 2. A secure degrees of freedom tuple $d_1, \dots, d_K$ is said to be achievable if there exists an achievable rate tuple $R_1, \dots, R_K$ such that

$$d_k = \lim_{M \to \infty} \frac{R_k}{\log M}, \quad k = 1, \dots, K. \tag{12}$$

In the literature, degrees of freedom is typically defined as the limit $\lim_{\rho_k \to \infty} \frac{R_k}{\log \rho_k}$. Since we aim to understand how $R_k$ changes with $M$ under constant $\rho_k$, the degrees of freedom definition in (12) is more relevant for our interest.

For a given achievable secure degrees of freedom tuple $d_1, \dots, d_K$, we define the secure degrees of freedom of the downlink communication as the minimum value in the tuple, i.e., secure DoF $\triangleq \min_{k \in \{1, \dots, K\}} d_k$. In the rest of the paper, when we use secure DoF, we mean the secure degrees of freedom attained in the presence of an adversary, and when we use DoF, we mean the degrees of freedom attained under no adversary.

In this paper, we characterize the maximum secure DoF in the presence of the various security attacks described in Section II-B. Furthermore, we aim to develop defense strategies that achieve the maximum secure DoF against the security attacks that would otherwise limit the maximum secure DoF to zero.

III. ADVERSARY NOT JAMMING THE TRAINING PHASE

In this section, we show that downlink communication in a single-cell massive MIMO system is resilient to an adversary that jams only the data communication phase and eavesdrops on both the communication and training phases. We show that the maximum secure DoF attained under no training-phase jamming is identical to the maximum DoF attained under no adversary. Then, we show that we can establish information theoretic security without using stochastic encoding, e.g., Wyner encoding. Finally, we evaluate the number of antennas that the BS needs to satisfy the security constraints without a need for Wyner encoding.

A. Resilience of massive MIMO

In this subsection, we evaluate the maximum secure DoF of the downlink communication in the presence of no training-phase jamming. Then, we show that the maximum secure DoF attained in the presence of no training-phase jamming is the same as the maximum DoF attained without an adversary. This result demonstrates the weakness of no training-phase jamming in the massive MIMO limit.

Theorem 1. (Maximum secure DoF) For given block length $T$ and data transmission phase length $T_d$, the maximum secure DoF under no training-phase jamming is given by $\frac{T_d}{T}$. □

The complete proof is available in Appendix A, where we first provide an upper bound on the secure DoF and then present a strategy to achieve the upper bound. Here, we provide a proof sketch. In order to find an upper bound on the secure DoF, we consider a multiple-input single-output (MISO) communication system without an adversary, in which the BS communicates with a single user under power constraint $\rho_f$. Further, we assume that the BS and the user have perfect information of the channel gains. We show that the supremum of achievable rates leads to a secure DoF of $\frac{T_d}{T}$. Hence, we conclude that $\frac{T_d}{T}$ is an upper bound on the secure DoF attained in the multi-user downlink communication model in Section II.

We now describe a strategy to attain the maximum secure DoF in Theorem 1. On the first $T_r$ channel uses of each block, the users send pilot signals that are mutually orthogonal. The BS uses a minimum mean square error (MMSE) estimator to estimate the BS-to-user channel gains. The BS constructs $K$ codebooks, $c_k$, $k = 1, \dots, K$, where codebook $c_k$ contains $2^{BT\hat{R}_k}$ independently and identically generated codewords $s_k^{BT_d}$ of length $BT_d$, and $\hat{R}_k > R_k$. The BS maps the $k$-th user's message to a codeword with a stochastic mapping function $f_k$. Specifically, the BS maps message $w_k \in \{1, \dots, 2^{BTR_k}\}$ to a randomized message $m_k \in \{1, \dots, 2^{BT\hat{R}_k}\}$ as in [10] and then maps the randomized message $m_k$ to one of the codewords in $c_k$, $k = 1, \dots, K$. Utilizing the conjugate beamforming in (8), the BS maps the $K$ codewords $s_k^{BT_d}$, $k = 1, \dots, K$, to the channel input sequence $X^{BT_d}$. Each user employs typical set decoding [12]. In order to show that the secrecy constraint (11) for a particular user is satisfied, we give the adversary the other users' transmitted codewords. □

In the next couple of remarks, we emphasize the robustness of the downlink communication system against no training-phase jamming.
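The effect that the achievability strategy relies on can be illustrated with a small Monte Carlo sketch. It is not part of the proof: it assumes a single user, a single-antenna adversary, unit symbol power, and a perfect channel estimate (so $\hat{H}_k = H_k$ and $\alpha_k = 1$), and the function names are hypothetical. Under conjugate beamforming as in (8), the power gain seen by the user grows roughly linearly in $M$, while the gain seen by an adversary with an independent channel stays near one.

```python
import math
import random


def cn(rng):
    """Draw one CN(0, 1) sample (unit-variance circular complex Gaussian)."""
    sigma = math.sqrt(0.5)
    return complex(rng.gauss(0.0, sigma), rng.gauss(0.0, sigma))


def beamforming_gains(M, trials=200, seed=1):
    """Average power gain at the user and at the adversary.

    Assumptions (illustrative only): one user, M_e = 1, symbol s = 1,
    perfect channel estimate, alpha_k = 1.
    """
    rng = random.Random(seed)
    user_gain = 0.0
    adv_gain = 0.0
    for _ in range(trials):
        h_k = [cn(rng) for _ in range(M)]  # BS-to-user channel
        h_e = [cn(rng) for _ in range(M)]  # BS-to-adversary channel
        # Conjugate beamforming: X = s * conj(h_hat) / sqrt(M * alpha_k)
        x = [h.conjugate() / math.sqrt(M) for h in h_k]
        user_gain += abs(sum(h * xi for h, xi in zip(h_k, x))) ** 2
        adv_gain += abs(sum(h * xi for h, xi in zip(h_e, x))) ** 2
    return user_gain / trials, adv_gain / trials


user, adv = beamforming_gains(M=64)
print(f"user gain ~ {user:.1f}, adversary gain ~ {adv:.2f}")
```

With these assumptions the user gain concentrates around $M + 1$ and the adversary gain around 1, which is the asymmetry that lets the rate at the user scale with $\log M$ while the leakage term does not.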

Remark 2. (The weakness of the adversary not jamming the training phase) In the proof of Theorem 1, we show that $\frac{T_d}{T}$ is indeed an upper bound on the DoF of a downlink communication without the presence of an adversary. Hence, by also showing that a secure DoF of $\frac{T_d}{T}$ is attained in the presence of the adversary, we conclude that the no training-phase jamming attack does not degrade the performance of the communication in terms of DoF. The reason that a secure DoF of $\frac{T_d}{T}$ is achieved is that the adversary keeps silent during the training phase; hence the estimated BS-to-user channel gains are independent of $H_e$.

Fig. 2. The variation of $R_k$ with $M$ and $M_e$.

In the next section, we consider an adversary jamming the training phase. In the presence of such an adversary, the estimated BS-to-user channel gains become correlated with $H_e$ and the maximum secure DoF is reduced to zero.
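The contrast between a silent and a jamming adversary during training can be sketched numerically. The sketch below is an illustration under simplifying assumptions that are not taken from the paper: a single user, a single-antenna adversary, a constant pilot with per-symbol power $\rho_r$, a BS that applies the scaling $a = \frac{\rho_r T_r}{\rho_r T_r + 1}$ without knowing about the jammer, and an adversary that simply repeats the user's pilot. The function name and parameters are hypothetical. It returns the magnitude of the sample correlation between the channel estimate and the BS-to-adversary channel.

```python
import math
import random


def cn(rng):
    """Draw one CN(0, 1) sample."""
    sigma = math.sqrt(0.5)
    return complex(rng.gauss(0.0, sigma), rng.gauss(0.0, sigma))


def pilot_correlation(M, T_r, rho_r, jam_power, seed=0):
    """|sample correlation| between the channel estimate and H_e.

    jam_power = 0 models a silent adversary; jam_power > 0 models an
    adversary repeating the user's pilot at that per-symbol power.
    """
    rng = random.Random(seed)
    a = rho_r * T_r / (rho_r * T_r + 1.0)  # scaling used when nobody jams
    amp = math.sqrt(rho_r)                 # real, constant pilot symbol
    jam_amp = math.sqrt(jam_power)
    num = 0j
    e_norm = 0.0
    hat_norm = 0.0
    for _ in range(M):
        h_k, h_e = cn(rng), cn(rng)
        # Correlate the T_r received training samples with the pilot.
        proj = 0j
        for _ in range(T_r):
            y = h_k * amp + h_e * jam_amp + cn(rng)
            proj += y * amp
        h_hat = a * proj / (rho_r * T_r)
        num += h_e.conjugate() * h_hat
        e_norm += abs(h_e) ** 2
        hat_norm += abs(h_hat) ** 2
    return abs(num) / math.sqrt(e_norm * hat_norm)


silent = pilot_correlation(M=5000, T_r=10, rho_r=1.0, jam_power=0.0)
jammed = pilot_correlation(M=5000, T_r=10, rho_r=1.0, jam_power=1.0)
print(f"silent: {silent:.3f}, pilot jamming: {jammed:.3f}")
```

In this toy setting the correlation is close to zero when the adversary is silent and clearly bounded away from zero when it repeats the pilot, which is the mechanism the remark points to.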

Remark 3. (Resource race between the BS and the adversary) In Appendix A, we show that the achievable rate tuple that leads to a secure DoF of $\frac{T_d}{T}$ is

$$R_k = \frac{T_d}{T} \log\left( \frac{M \rho_k a}{\rho_f + \rho_{jam} + 1} \right) - \frac{T_d}{T} \log\left(1 + M_e \rho_k\right), \quad k = 1, \dots, K,$$

where $a \triangleq \frac{\rho_r T_r}{\rho_r T_r + 1}$.

We next investigate how $R_k$ varies with $M_e$ and $M$. Figure 2 illustrates this variation when $\rho_k = 1$, $\rho_f = 10$, $\frac{T_d}{T} = 0.$, $\rho_{jam} = 1$, and $a = 0.$ As seen in Figure 2, in the presence of an adversary not jamming the training phase, the achievable secure rates are determined by the arms race between the adversary and the BS. Specifically, we can observe that if $M_e$ remains constant, the achievable rate $R_k$ grows unboundedly as $M$ increases. Moreover, for a fixed value of $M$, the achievable rates decrease as a function of $M_e$.

In the next section, we consider an adversary jamming the training phase instead of keeping silent during the training phase. We will show that, armed with only a single antenna, the adversary is capable of limiting the maximum achievable rate for any user to zero as $M \to \infty$. Hence, by jamming the training phase, the adversary converts the arms race between the BS and itself into one between a user and itself.
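The trade-off described in Remark 3 can be explored with a short sketch. It assumes the rate expression is read as $R_k = \frac{T_d}{T}\left[\log\frac{M \rho_k a}{\rho_f + \rho_{jam} + 1} - \log(1 + M_e \rho_k)\right]$ with base-2 logarithms, and it clips negative values at zero; the clipping, the function name, and the parameter values below are illustrative assumptions and are not the settings of Figure 2.

```python
import math


def secure_rate(M, M_e, rho_k, rho_f, rho_jam, a, td_over_t):
    """Secure rate of user k under no training-phase jamming (sketch).

    Assumes base-2 logs and clips negative values at zero.
    """
    user_term = math.log2(M * rho_k * a / (rho_f + rho_jam + 1.0))
    adversary_term = math.log2(1.0 + M_e * rho_k)
    return max(0.0, td_over_t * (user_term - adversary_term))


# Hypothetical parameter values chosen only for illustration.
params = dict(rho_k=1.0, rho_f=10.0, rho_jam=1.0, a=0.9, td_over_t=0.8)

for M in (1e2, 1e4, 1e6):
    for M_e in (1, 4, 16):
        r = secure_rate(M, M_e, **params)
        print(f"M={M:.0e}  M_e={M_e:2d}  R_k={r:.2f}")

# The ratio R_k / log2(M) approaches T_d / T as M grows (Definition 2).
M_big = 1e200
print(secure_rate(M_big, 1, **params) / math.log2(M_big))
```

For a fixed $M_e$ the rate grows with $M$, for a fixed $M$ it shrinks with $M_e$, and the ratio $R_k / \log M$ drifts toward $\frac{T_d}{T}$, in line with Theorem 1.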

B. Establishing security without Wyner encoding

In the achievability strategy given in the proof sketch ofTheorem 1, we use a stochastic encoding, a randomizedmapping of each message to a codeword with stochasticfunctions, at the BS. In fact, stochastic encoding, e.g., Wynerencoding [10], is a standard technique in the literature forestablishing information theoretic security against the eaves-dropping attacks. In this section, we show that the BS utilizing determin-istic encoding, a nonrandom mapping of each message to acodeword with deterministic functions, instead of stochasticencoding is capable of satisfying the security constraints if itis equipped with sufﬁciently large number antennas. In order tosatisfy the security constraints without using stochastic encod-ing, the BS employs novel beamforming strategy introducedin (9).The following theorem shows that, when code (cid:0) BT R , . . . , BT R K , BT d (cid:1) of length BT d utilizes δ -conjugate beamforming instead of conjugate beamformingin (8), the code satisﬁes the secrecy constraint in (11) forany k ∈ { , . . . , K } and for any (cid:15) > without a need forstochastic encoding. Theorem 2. (Establishing secrecy with no stochastic en-coding)

Let $\delta > 0$. Under no training-phase jamming, for any $\epsilon > 0$, if $M \geq S(\epsilon)$, then any deterministic code $\left(2^{BTR_1}, \dots, 2^{BTR_K}, BT_d\right)$ employing $\delta$-conjugate beamforming satisfies

\[ \frac{1}{BT} H\left(W_k \mid Z^{BT_d}, H^B, \hat{H}^B, H_e^B\right) \geq R_k - \epsilon \tag{13} \]

for all $B \geq 1$ and for all $k \in \{1, \dots, K\}$, where

\[ S(\epsilon) \triangleq \left(\frac{M_e \rho_{\max}}{2^{\frac{T}{T_d}\epsilon} - 1}\right)^{\frac{1}{\delta}} \]

and $\rho_{\max} \triangleq \max_{k \in \{1, \dots, K\}} \rho_k$. □

We can consider $S(\epsilon)$ in Theorem 2 as the number of antennas the BS needs in order to make the conditional entropy $\epsilon$-close to $R_k$ for all $k \in \{1, \dots, K\}$. Hence, a BS equipped with at least $S(\epsilon)$ antennas can satisfy (13) with any code $\left(2^{BTR_1}, \dots, 2^{BTR_K}, BT_d\right)$ that employs deterministic encoding functions and $\delta$-conjugate beamforming. The proof is available in Appendix B-A.

The BS constructs $K$ codebooks, $c_k$, $k = 1, \dots, K$, where codebook $c_k$ contains $2^{BTR_k}$ codewords $s_k^{BT_d}$ of length $BT_d$. The BS maps message $w_k$ to codeword $s_k^{BT_d}$ with a deterministic function $f_k$, $k = 1, \dots, K$. Utilizing the $\delta$-conjugate beamforming in (9), the BS maps the $K$ codewords $s_k^{BT_d}$, $k = 1, \dots, K$, to the channel input sequence $X^{BT_d}$.

In Figure 3, we illustrate the variation of $S(\epsilon)$ with $\epsilon$ when $\rho_k = 1$ and $M_e = 1$, with $\delta$ and $T/T_d$ held fixed. Figure 3 gives the number of BS antennas that is sufficient to keep the equivocation rate above $R_k - \epsilon$ for any choice of $R_k$, $k = 1, \dots, K$.

Theorem 2 evaluates the number of antennas needed in order to satisfy only the secrecy constraint. The following corollary takes both the secrecy and decodability constraints into account.

Corollary 1. (Any rate tuple is achievable with no need for stochastic encoding)

Let $0 < \delta < 1$. In the presence of no training-phase jamming, for any $\epsilon > 0$ and any rate tuple $R \triangleq [R_1, \dots, R_K]$, if $M \geq \max\left(V(R), S(\epsilon)\right)$, there exists $B(\epsilon) > 0$ and a sequence of codes $\left(2^{BTR_1}, \dots, 2^{BTR_K}, BT_d\right)$, $B \geq B(\epsilon)$, that satisfy the constraints in (10) and (11) without the use of stochastic encoding, where

\[ V(R) \triangleq \max_{k \in \{1, \dots, K\}} \left(\left(2^{R_k \frac{T}{T_d}} - 1\right) \times \frac{\rho_f + \rho_{\mathrm{jam}} + 1}{a\rho_k}\right)^{\frac{1}{1-\delta}}. \] □

Fig. 3. The variation of $S(\epsilon)$ with $\epsilon$ when $\rho_k = 1$ and $M_e = 1$. As long as $M \geq S(\epsilon)$, $\frac{1}{BT} H\left(W_k \mid Z^{BT_d}, H^B, \hat{H}^B, H_e^B\right)$ remains in the $\epsilon$-neighborhood of $R_k$ for any $k \in \{1, \dots, K\}$.

The proof of Corollary 1 can be found in Appendix B-B. The sequence of codes in Corollary 1 utilizes $\delta$-conjugate beamforming. Figure 4 illustrates the variation of $\max\left(V(R), S(\epsilon)\right)$ with $\delta$ when $M_e = 1$ and $\rho_k = 1$, with $\epsilon$, $T/T_d$, and $R_k$ held fixed and identical for all $k \in \{1, \dots, K\}$. For these parameters, $\max\left(V(R), S(\epsilon)\right)$ is minimized at a particular value of $\delta$, shown in the figure. When the BS utilizes $\delta$-conjugate beamforming with that value of $\delta$, it requires the fewest antennas to satisfy the constraints in (10) and (11) without the need for stochastic encoding (e.g., Wyner encoding).

Remark 4. (Achieving secure DoF arbitrarily close to the maximum DoF with no Wyner encoding)

Theorem 2 and Corollary 1 show that it is possible to establish information-theoretic security without using stochastic encoding. We next measure the amount of DoF sacrificed as a result of not utilizing stochastic encoding. To that end, we evaluate how the number of antennas at the BS, $\max(V(R), S(\epsilon))$, scales with $R_k$ for given $\epsilon > 0$ and $\{R_l\}_{l \neq k}$. Specifically, we calculate $\lim_{R_k \to \infty} \frac{R_k}{\log \max(V(R), S(\epsilon))}$ as

\[ \lim_{R_k \to \infty} \frac{R_k}{\log \max(V(R), S(\epsilon))} = \lim_{R_k \to \infty} \frac{R_k}{\log V(R)} \tag{14} \]

\[ = \lim_{R_k \to \infty} \frac{R_k}{\log \left(\left(2^{R_k \frac{T}{T_d}} - 1\right) \times \frac{\rho_f + \rho_{\mathrm{jam}} + 1}{a\rho_k}\right)^{\frac{1}{1-\delta}}} \tag{15} \]

\[ = \lim_{R_k \to \infty} \frac{(1 - \delta) R_k}{\log \left(2^{R_k \frac{T}{T_d}} - 1\right)} = (1 - \delta)\frac{T_d}{T} \tag{16} \]

for all $k \in \{1, \dots, K\}$, for any $\epsilon > 0$, and for any $\{R_l\}_{l \neq k}$. The equalities in (14) and (15) follow from the fact that $\max(V(R), S(\epsilon))$ and $V(R)$ are increasing functions of $R_k$. We observe from (16) that, by choosing $\delta$ close to $0$, we can make the difference between (16) and the maximum secure DoF provided in Theorem 1 arbitrarily small.

Fig. 4. The variation of $\max\left(V(R), S(\epsilon)\right)$ with $\delta$ when $M_e = 1$ and $\rho_k = 1$, with $R_k$ identical for all $k \in \{1, \dots, K\}$. As long as $M \geq \max(S(\epsilon), V(R))$, the constraints in (10) and (11) are satisfied for a given $\epsilon$ and $R$ without a need for stochastic encoding.
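The scaling in (14)-(16) can be checked numerically. The sketch below assumes $S(\epsilon) = \left(M_e\rho_{\max}/(2^{T\epsilon/T_d} - 1)\right)^{1/\delta}$ and $V(R) = \max_k\left((2^{R_k T/T_d} - 1)(\rho_f + \rho_{\mathrm{jam}} + 1)/(a\rho_k)\right)^{1/(1-\delta)}$ with base-2 logarithms. All default parameter values (`a`, `t_over_td`, `delta`, and so on) are illustrative placeholders rather than the values behind Figures 3 and 4, and $V(R)$ is evaluated in the log domain so that large rates do not overflow.

```python
import math

def S(eps, M_e=1.0, rho_max=1.0, t_over_td=1.25, delta=0.5):
    """Antenna threshold for the secrecy constraint (no training-phase jamming)."""
    return (M_e * rho_max / (2.0 ** (t_over_td * eps) - 1.0)) ** (1.0 / delta)

def log2_V(R, rho, a=0.5, rho_f=10.0, rho_jam=1.0, t_over_td=1.25, delta=0.5):
    """log2 of the decodability threshold V(R); every R_k must be positive."""
    values = []
    for R_k, rho_k in zip(R, rho):
        x = R_k * t_over_td
        # log2(2^x - 1) = x + log2(1 - 2^-x), which stays finite for large x
        log_term = x + math.log2(1.0 - 2.0 ** (-x))
        scale = math.log2((rho_f + rho_jam + 1.0) / (a * rho_k))
        values.append((log_term + scale) / (1.0 - delta))
    return max(values)

def dof_ratio(R_k, eps=0.1, delta=0.5, t_over_td=1.25):
    """R_k / log2 max(V(R), S(eps)) for a single user with rho_k = 1."""
    log_v = log2_V([R_k], [1.0], delta=delta, t_over_td=t_over_td)
    log_s = math.log2(S(eps, delta=delta, t_over_td=t_over_td))
    return R_k / max(log_v, log_s)

for R_k in (1.0, 10.0, 100.0, 10000.0):
    print(R_k, round(dof_ratio(R_k), 5))
```

With these placeholder values, the printed ratio should approach $(1 - \delta)T_d/T = 0.5/1.25 = 0.4$ as $R_k$ grows, in line with (16).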

IV. ADVERSARY JAMMING THE TRAINING PHASE

In the previous section, we showed that an adversary that does not jam the training phase does not degrade the performance of the multi-user communication when the BS has a sufficiently large number of antennas. In this section, we aim to find an attack model that does degrade the performance. Specifically, we focus on finding an attack strategy capable of limiting the secure DoF to an arbitrarily small value. The next theorem sheds light on finding such an attack strategy.

Theorem 3. (A non-zero correlation between the estimateduser channel and the adversary channel gains limits themaximum secure DoF to zero)

Assume that there exists user $k$ such that

• $\left\{\hat{H}_{k_m} H_{e_m}\right\}_{m \geq 1}$ is an i.i.d. random process.

• For any $B \geq 1$, there exists a random vector $\tilde{H}_k^B$ that satisfies the following: 1) the joint probability distribution of $\left(H_e^B, \hat{H}^B\right)$ is identical to that of $\left(H_k^B, \tilde{H}^B\right)$, where $\tilde{H}^B \triangleq \left[\hat{H}_1^B, \dots, \tilde{H}_k^B, \dots, \hat{H}_K^B\right]$, and 2) the joint probability distribution of $\left(H(i), \tilde{H}(i)\right)$ is identical for any $i \in [1 : B]$.

Then, the maximum secure DoF is zero if $E\left[H_{e_m}^* \hat{H}_{k_m}\right] \neq 0$. □

Here, $H_{e_m}$ is the gain of the channel connecting the $m$-th antenna at the BS to the adversary. Note that the random vector $\tilde{H}^B$ is created by replacing $\hat{H}_k^B$ in $\hat{H}^B$ with $\tilde{H}_k^B$. The proof of Theorem 3 can be found in Appendix C-A. In the example given at the end of this section, we show that the assumptions on the random variables listed in Theorem 3 hold when MMSE and mutually orthogonal pilot signals are used as the channel estimation strategy. Note that such an estimation strategy is quite popular in multi-user communication [4].

We next give a proof sketch. Assume that conjugate beamforming is used at the BS and $M_e = 1$. Note that Theorem 3 is also valid when the BS uses $\delta$-conjugate beamforming and $M_e > 1$. We can convert the communication set-up explained in Section II to an identical set-up containing a BS equipped with $K$ antennas, where the channel input signal at the $l$-th antenna in the new set-up represents the data signal for the $l$-th user, $S_l$, $l = 1, \dots, K$. Since conjugate beamforming is used, the gain of the channel connecting the $l$-th antenna to the $i$-th user in the new set-up is $\frac{H_i \hat{H}_l^*}{\sqrt{M\alpha_l}}$ and the gain of the channel connecting the $l$-th antenna to the adversary in the new set-up is $\frac{H_e \hat{H}_l^*}{\sqrt{M\alpha_l}}$, $i, l = 1, \dots, K$. Following the assumptions in Theorem 3, we show that the gain of the channel connecting the BS to the adversary can be replaced with $\left[\frac{H_k \hat{H}_1^*}{\sqrt{M\alpha_1}}, \dots, \frac{H_k \tilde{H}_k^*}{\sqrt{M\alpha_k}}, \dots, \frac{H_k \hat{H}_K^*}{\sqrt{M\alpha_K}}\right]$.

In Appendix C-A, we bound $R_k$ as follows:

\[ R_k \leq E\left[\left[\max_{\Sigma \in \mathcal{S}} \left(\log\left(1 + A_k \Sigma A_k^*\right) - \log\left(1 + A_e \Sigma A_e^*\right)\right)\right]^+\right], \tag{17} \]

where $A_k \triangleq \left[\frac{H_k \hat{H}_1^*}{\sqrt{M\alpha_1}}, \dots, \frac{H_k \hat{H}_k^*}{\sqrt{M\alpha_k}}, \dots, \frac{H_k \hat{H}_K^*}{\sqrt{M\alpha_K}}\right]$ is the $1 \times K$ complex gain vector of the channels connecting the BS to the $k$-th user, and $A_e \triangleq \left[\frac{H_k \hat{H}_1^*}{\sqrt{M\alpha_1}}, \dots, \frac{H_k \tilde{H}_k^*}{\sqrt{M\alpha_k}}, \dots, \frac{H_k \hat{H}_K^*}{\sqrt{M\alpha_K}}\right]$ is the $1 \times K$ complex gain vector of the channels connecting the BS to the adversary. Let $\Sigma$ be the covariance matrix of the input signal $S = [S_1, \dots, S_K]$ and $\mathcal{S}$ be the feasible set for the maximization problem in (17). Every matrix $\Sigma$ in the set $\mathcal{S}$ is diagonal due to the fact that $S_1, \dots, S_K$ are independent, and satisfies $\Sigma \preceq \mathrm{diag}(\rho_1, \dots, \rho_K)$ due to the power constraint in (6).

We show that, if $E\left[\hat{H}_{k_m}^* H_{e_m}\right] \neq 0$, then the right-hand side (RHS) of (17) divided by $\log M$ goes to zero as $M \to \infty$. Hence, the maximum secure DoF becomes zero. □

Remark 5. (Adversary has to jam the training phase) When the adversary does not jam the training phase, $\hat{H}_k$ and $H_e$ are independent and consequently $E\left[\hat{H}_{k_m} H_{e_m}^*\right] = E\left[\hat{H}_{k_m}\right] E\left[H_{e_m}^*\right] = 0$ for all $k \in \{1, \dots, K\}$. In order to have a non-zero correlation between $H_e$, the gain of the channel connecting itself to the BS, and $\hat{H}_k$ for any $k \in \{1, \dots, K\}$, the adversary has to jam the training phase. Hence, training-phase jamming is capable of limiting the maximum secure DoF to zero. □

In addition to limiting the maximum secure DoF to zero, the adversary can make the maximum achievable rate of the $k$-th user arbitrarily small as $M \to \infty$. We next provide the conditions under which the maximum achievable rate of the $k$-th user goes to a finite value as $M \to \infty$.

Corollary 2. (A user's maximum achievable rate is bounded as $M \to \infty$) In addition to the assumptions given in Theorem 3, assume that there exists a finite non-negative $r$ such that $p_{K_M}(x) \leq r$ for all $M \geq 1$ and $x \in \mathcal{K}_M$, where $p_{K_M}$ is the probability density function of $K_M \triangleq \frac{1}{M}\left\|H_e \hat{H}_k^*\right\|$ and $\mathcal{K}_M$ is the sample space of $K_M$. Then, the achievable rate of the $k$-th user is bounded as

\[ \lim_{M \to \infty} R_k \leq \left[\log \frac{\left|E\left[H_{k_m} \hat{H}_{k_m}^*\right]\right|}{\left|E\left[H_{e_m} \hat{H}_{k_m}^*\right]\right|}\right]^+. \] □

The proof of Corollary 2 can be found in Appendix C-B. As seen in Corollary 2, if the correlation between the BS-to-$k$-th-user channel gain and the estimated BS-to-$k$-th-user channel gain, $\left|E\left[H_{k_m} \hat{H}_{k_m}^*\right]\right|$, is smaller than that between the BS-to-adversary channel gain and the estimated BS-to-$k$-th-user channel gain, $\left|E\left[H_{e_m} \hat{H}_{k_m}^*\right]\right|$, the maximum achievable rate of the $k$-th user vanishes as $M \to \infty$.

Remark 6. (Resource race between the adversary and the user)

We showed that if there exists a non-zero correlation between the estimated BS-to-$k$-th-user channel gain and the BS-to-adversary channel gain, then the maximum secure DoF is constrained to zero. Furthermore, we showed that if this correlation is higher than the correlation between the BS-to-$k$-th-user channel gain and its estimate, the maximum achievable rate of the $k$-th user goes to zero as $M \to \infty$. Hence, in the presence of training-phase jamming, the achievable rates and the maximum secure DoF are determined as a result of the arms race between the adversary and the users. □

Example 1. (Using MMSE and mutually orthogonal pilot signals for channel estimation)

We study an adversary that chooses to match the $k$-th user's pilot signal in the training phase with one of its antennas when MMSE and mutually orthogonal pilot signals are used for channel estimation. We show that the assumptions given in Theorem 3 are valid under such a jamming attack and channel estimation strategy. Then, we show that the maximum secure DoF is zero.

We consider mutually orthogonal pilot signals $\{\phi_l\}_{l \in [1:K]}$, i.e.,

\[ \phi_k \times \phi_l^* = \begin{cases} T_r \rho_r & \text{if } k = l \\ 0 & \text{if } k \neq l \end{cases} \]

for any $k, l \in \{1, \dots, K\}$. The received signal at the BS in the training phase of the $i$-th block is as follows:

\[ Y^{T_r} = \sqrt{\frac{\rho_{\mathrm{jam}}}{\rho_r}} H_e^T(i)\phi_k + \sum_{l=1}^{K} H_l^T(i)\phi_l + W, \]

where $\rho_{\mathrm{jam}}$ is the jamming power. Note that we assume that the adversary jams the data communication phase and the training phase with the same power, $\rho_{\mathrm{jam}}$.

In order to validate the assumptions listed in Theorem 3, we next present the estimated gain of the channel connecting the BS to the $l$-th user at the $i$-th block as

\[ \hat{H}_l(i) = \begin{cases} aH_l(i) + bH_e(i) + cV_l & \text{if } l = k \\ dH_l(i) + eV_l & \text{if } l \neq k, \end{cases} \]

where $V_l$ is distributed as $\mathcal{CN}(0, I_M)$ for any $l \in \{1, \dots, K\}$, $a \triangleq \frac{T_r\rho_r}{T_r\rho_r + 1 + T_r\rho_{\mathrm{jam}}}$, $b \triangleq \frac{T_r\sqrt{\rho_r\rho_{\mathrm{jam}}}}{T_r\rho_r + 1 + T_r\rho_{\mathrm{jam}}}$, $c \triangleq \frac{\sqrt{T_r\rho_r}}{T_r\rho_r + 1 + T_r\rho_{\mathrm{jam}}}$, $d \triangleq \frac{T_r\rho_r}{T_r\rho_r + 1}$, and $e \triangleq \frac{\sqrt{T_r\rho_r}}{T_r\rho_r + 1}$.

Define $\tilde{H}_k^B$ stated in Theorem 3 as $\tilde{H}_k(i) \triangleq bH_k(i) + aH_e(i) + cV_k$, $i = 1, \dots, B$. Further, define $\hat{H}_l \triangleq aH_l + bH_e + cV_l$ if $k = l$, and otherwise $\hat{H}_l \triangleq dH_l + eV_l$. Note that $\left(\tilde{H}_k(i), H(i), H_e(i), \hat{H}(i)\right)$ is an i.i.d. process due to (5) and the associated joint distribution is identical to that of $\left(\tilde{H}_k, H, H_e, \hat{H}\right)$, where $\tilde{H}_k \triangleq bH_k + aH_e + cV_k$. Hence, we conclude that the joint probability distribution of $\left(H(i), \tilde{H}(i)\right)$ is identical for any $i \in \{1, \dots, B\}$.

We next show that the probability distribution of $\left(H_e^B, \hat{H}^B\right)$ is identical to that of $\left(H_k^B, \tilde{H}^B\right)$. Note that both $(H_e, \hat{H}_k)$ and $(H_k, \tilde{H}_k)$ are independent of $\{\hat{H}_l\}_{l \neq k}$. Hence, noting that $H_e$ and $H_k$ have the same probability distribution, it is sufficient to show that $\tilde{H}_k \mid H_k = h_k$ has the same probability distribution as $\hat{H}_k \mid H_e = h_k$ for any $h_k \in \mathbb{R}^M$:

\[ P\left(\tilde{H}_k \leq x \mid H_k = h_k\right) = P\left(bh_k + aH_e + cV_k \leq x \mid H_k = h_k\right) \]
\[ = P\left(bh_k + aH_e + cV_k \leq x\right) \tag{18} \]
\[ = P\left(bh_k + aH_k + cV_k \leq x\right) \tag{19} \]
\[ = P\left(bH_e + aH_k + cV_k \leq x \mid H_e = h_k\right) \tag{20} \]
\[ = P\left(\hat{H}_k \leq x \mid H_e = h_k\right) \]

for any $x \in \mathbb{R}^M$, where (18) and (20) follow from the fact that $H_e$, $H_k$, and $V_k$ are mutually independent and (19) follows from the fact that $(H_e, V_k)$ and $(H_k, V_k)$ are identically distributed.

Finally, note that $\{H_{e_m}, \hat{H}_{k_m}\}_{m \geq 1}$ forms an i.i.d. process due to the fact that $H_k$, $H_e$, and $V_k$ are mutually independent random vectors and each is composed of $M$ i.i.d. complex Gaussian random variables.

Note that $E\left[\hat{H}_{k_m}^* H_{e_m}\right] = b$. Since $E\left[\hat{H}_{k_m}^* H_{e_m}\right]$ is non-zero, we conclude that the maximum secure DoF is zero by Theorem 3. □
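The identity $E\left[\hat{H}_{k_m}^* H_{e_m}\right] = b$ can be checked with a short Monte Carlo experiment. The sketch below simulates a single antenna index $m$, assuming $H_{k_m}$, $H_{e_m}$, and $V_{k_m}$ are independent $\mathcal{CN}(0, 1)$ variables and that the contaminated estimate takes the form $\hat{H}_{k_m} = aH_{k_m} + bH_{e_m} + cV_{k_m}$ with the coefficients of Example 1. The function names and the parameter values in the demo are illustrative choices.

```python
import math
import random

def cn(rng):
    """One CN(0, 1) sample: real and imaginary parts are each N(0, 1/2)."""
    s = math.sqrt(0.5)
    return complex(rng.gauss(0.0, s), rng.gauss(0.0, s))

def coefficients(T_r, rho_r, rho_jam):
    """Coefficients a, b, c of the contaminated MMSE estimate in Example 1."""
    den = T_r * rho_r + 1.0 + T_r * rho_jam
    a = T_r * rho_r / den
    b = T_r * math.sqrt(rho_r * rho_jam) / den
    c = math.sqrt(T_r * rho_r) / den
    return a, b, c

def empirical_correlation(T_r, rho_r, rho_jam, n=50_000, seed=0):
    """Sample mean of conj(H_hat_k) * H_e, returned together with b."""
    rng = random.Random(seed)
    a, b, c = coefficients(T_r, rho_r, rho_jam)
    acc = 0j
    for _ in range(n):
        h_k, h_e, v = cn(rng), cn(rng), cn(rng)
        h_hat = a * h_k + b * h_e + c * v
        acc += h_hat.conjugate() * h_e
    return acc / n, b

estimate, b = empirical_correlation(T_r=10, rho_r=1.0, rho_jam=1.0)
print("empirical:", estimate, "predicted b:", b)
```

Setting `rho_jam=0.0` gives $b = 0$ and an empirical correlation near zero, which is the situation described in Remark 5.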

V. SECURE COMMUNICATION UNDER TRAINING-PHASE JAMMING

In the previous section, we showed that massive MIMO systems are vulnerable to training-phase jamming. In this section, we first provide a defense strategy against training-phase jamming that expands the cardinality of the set of pilot signals and keeps the pilot signal assignments hidden from the adversary. Then, we show that, utilizing the defense strategy and $\delta$-conjugate beamforming, the BS can satisfy the security constraints without using Wyner encoding in the presence of training-phase jamming. Finally, we discuss how, relying only on computational cryptography, we can secure the communication of the pilot signal assignments and hence the entire massive MIMO communication.

A. Counter strategy against training-phase jamming

We first describe our defense strategy against the training-phase jamming attack. Then, in Theorem 4, we show that, with the proposed defense strategy, the ratio of the achieved rate to the logarithm of the number of antennas can be brought arbitrarily close to the maximum achievable secure DoF of $T_d/T$.

The BS constructs a pilot signal set $\Phi$ containing $L$ mutually orthogonal pilot signals, i.e., $\Phi = \{\phi_1, \dots, \phi_L\}$, where $L$ is at least the number of users in the system, $L \geq K$. Thus, the number of pilot signals is increased. At the beginning of each block, the BS draws $K$ pilot signals from the set $\Phi$ uniformly at random and assigns each of them to a different user. Let $\Phi_K(i) = [\phi_1(i), \dots, \phi_K(i)]$ be the $K$ pilot signals that the BS picks at the beginning of the $i$-th block, where $\phi_k(i) \in \Phi$ is the pilot signal assigned to the $k$-th user in the $i$-th block.

Throughout Sections V-A and V-B, we assume that the BS communicates the assignments of pilot signals to the users reliably while keeping the assignments hidden from the adversary. In Section V-C, we discuss how this can be achieved. In particular, we consider computational cryptography as a way to communicate the pilot signal assignments and discuss the notion of security achieved.

We next describe the attack model in detail, under the lack of knowledge of the pilot signal assignments. Suppose, without loss of generality, that the adversary targets the $k$-th user. The adversary eavesdrops on the entire communication between user $k$ and the BS and simultaneously jams the data communication phase with Gaussian noise, as in the no training-phase jamming attack model. Furthermore, the adversary, without knowing which pilot signal is assigned to which user, picks $J \leq L$ pilot signals uniformly at random from the set $\Phi$ at the beginning of a block and subsequently jams these pilot signals with equal power during the training phase. The adversary repeats this process independently at the beginning of each block. In particular, the adversary divides its jamming power and transmits an equally weighted combination of the $J$ randomly selected pilot signals using all of its $M_e$ antennas, with a transmission power of $\rho_{\mathrm{jam}}/J$ on each selected pilot signal.
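The effect of hiding the assignments can be illustrated with a small simulation of the random pilot selection described above. This is a minimal sketch under the stated model only: the BS draws $K$ distinct pilots out of $L$ per block, the adversary independently draws $J$ distinct pilots without knowing the assignment, and the targeted user is taken to be user 0. The function name `hit_probability` and the parameter values are hypothetical.

```python
import random

def hit_probability(L, K, J, blocks=50_000, seed=0):
    """Fraction of blocks in which the targeted user's pilot is among the jammed ones."""
    rng = random.Random(seed)
    pilots = range(L)
    hits = 0
    for _ in range(blocks):
        assigned = rng.sample(pilots, K)     # BS: K distinct pilots, one per user
        jammed = set(rng.sample(pilots, J))  # adversary: J distinct pilots, chosen blindly
        if assigned[0] in jammed:            # user 0 plays the role of the targeted user
            hits += 1
    return hits / blocks

for L in (8, 64, 512):
    print(L, hit_probability(L, K=5, J=8), "expected", 8 / L)
```

Because the two draws are independent and uniform, the empirical frequency should be close to $J/L$. Jamming more pilots raises this probability but, in the attack model above, lowers the power $\rho_{\mathrm{jam}}/J$ placed on each of them.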
The signal received by the BS during the training phase under this attack model can be written as follows:

\[ Y^{T_r} = \sum_{l=1}^{K} H_l^T\phi_l + \sum_{l \in \mathcal{J}} \sum_{n=1}^{M_e} \sqrt{\frac{\rho_{\mathrm{jam}}}{M_e J \rho_r}} H_{e_n}^T\phi_l + W, \tag{21} \]

where $Y^{T_r}$ denotes the $M \times T_r$ complex matrix of the signals received over $T_r$ channel uses at the BS, $H_{e_n}$ is the $1 \times M$ complex gain vector of the channel connecting the $n$-th antenna at the adversary to the BS, and $\mathcal{J}$ is the set of pilot signals that are selected and transmitted by the adversary in the corresponding block. Note that $\mathcal{J}$ is a random set that can change in each block and $|\mathcal{J}| = J$.

The next theorem shows that, when the cardinality $L$ of the pilot signal set is increased as a function of the number of BS antennas in a certain way, the ratio of the attained secure rate to $\log M$ for any user can be arbitrarily close to the maximum DoF attained in the presence of no adversary.

Theorem 4. (Achievable rate under training-phase jamming)

For given block length $T$ and data transmission phase length $T_d$, the achievable secure rate $R_k$ under training-phase jamming satisfies

\[ \frac{R_k}{\log M} \geq \frac{T_d}{T}\min(1, \gamma) - \epsilon \tag{22} \]

for any $k \in \{1, \dots, K\}$, $J \in \{1, \dots, T_r\}$, $\epsilon > 0$, and $\gamma > 0$ if $\max(M^\gamma, K) \leq T_r$ and $M \geq G(\epsilon)$, where

\[ G(\epsilon) \triangleq \left(\left(M_e\rho_{\max} + \frac{M_e\rho_{\max}\rho_{\mathrm{jam}}}{\rho_r}\right) \times (\rho_f + \rho_{\mathrm{jam}} + 1) \times \frac{\rho_r + \rho_{\mathrm{jam}} + 1}{\rho_{\min}\rho_r}\right)^{\frac{T_d}{T\epsilon}}, \tag{23} \]

$\rho_{\max} \triangleq \max_{k \in \{1, \dots, K\}} \rho_k$, and $\rho_{\min} \triangleq \min_{k \in \{1, \dots, K\}} \rho_k$. □

Note that the lower bound on $\frac{R_k}{\log M}$ in (22) does not depend on how many pilot signals the adversary chooses to contaminate. The proof of Theorem 4 can be found in Appendix D.

Remark 7. (Attained $\frac{R_k}{\log M}$ is arbitrarily close to the maximum DoF under no attack)

We can observe from the statement of Theorem 4 that, when $\gamma = 1$, a value of $\frac{R_k}{\log M}$ that is arbitrarily close to the maximum DoF attained under no attack can be achieved. In order to attain that value of $\frac{R_k}{\log M}$, the length of the training phase $T_r$ is expanded so that $T_r \geq \max(K, G(\epsilon))$ for given $\epsilon > 0$, and the size of the pilot signal set is set to $T_r$ instead of $K$. Hence, we sacrifice some of the secure throughput by increasing the training overhead. However, as illustrated in the next example, the typical block lengths for mobile wireless communication systems are sufficiently large to keep the overhead ratio $\frac{T_r}{T}$ reasonably low.

Example 2.

In this example, we consider massive MIMO downlink transmission to users moving at a speed of 10 m/s, with a transmitted signal bandwidth of 10 MHz centered at 1 GHz. The associated coherence time determines the block length $T$ in channel uses. We first evaluate the number of antennas required to keep $\frac{R_k}{\log M}$ in the $\epsilon$-neighborhood of $\frac{T_d}{T}$ for a given training phase length $T_r$. To that end, we plot the variation of $G(\epsilon)$ with $\epsilon$ in Figure 5 for a fixed $T_d$, when $\gamma = 1$, $\rho_{\mathrm{jam}} = 1$, $K = 5$, $\rho_f = 5$, $M_e = 1$, $\rho_r = 10$, and $\rho_k = 1$ for all $k \in \{1, \dots, K\}$. For this set of parameters, 200 antennas are sufficient to keep $\frac{R_k}{\log M}$ larger than $\frac{T_d}{T} - \epsilon$ at the value of $\epsilon$ read from Figure 5.

Fig. 5. The change of $G(\epsilon)$ with $\epsilon$.

Next, we study the trade-off between $\epsilon$ and $\frac{T_d}{T}$ for given $M$, where $\epsilon$ is the deviation of the achieved $\frac{R_k}{\log M}$ from $\frac{T_d}{T}$ as in (22). To that end, we plot the variation of $\epsilon$ and $\frac{T_d}{T}$ with $\frac{T_r}{T}$ for $M = 200$. The values of the parameters $\rho_{\mathrm{jam}}$, $K$, $M_e$, $\rho_k$, and $\rho_f$ are kept the same as stated above and we set $\rho_r = \frac{T_d}{T_r}\rho_f$. As seen in Figure 6, $\epsilon$ vanishes as $T_r$ goes to $T$ and hence $\frac{R_k}{\log M}$ also gets closer to $\frac{T_d}{T}$. However, as $T_r$ increases, the training overhead increases and hence the maximum DoF $\frac{T_d}{T}$ decreases.

Fig. 6. The change of $\epsilon$ in (22) and $\frac{T_d}{T}$ with $\frac{T_r}{T}$.

Remark 8. (Resource race between the adversary and the BS)

By keeping the pilot assignments hidden from the adversary and using a pilot signal set that scales with $M$, the BS converts the arms race between the adversary and the target user (which was the case with known pilot assignments) back to one between the adversary and itself. Indeed, the power of the adversary needs to scale with $L$ for it to make an impact.

B. Establishing security without Wyner encoding

In this subsection, we show that the BS, when utilizing deterministic encoding instead of stochastic encoding, is still capable of satisfying the secrecy and decodability constraints in the presence of training-phase jamming. Hence, this subsection can be considered the counterpart of Section III-B. There, we assumed no training-phase jamming, whereas here we mitigate training-phase jamming by other means.

In order to satisfy the security constraints without using stochastic encoding, the BS employs the $\delta$-conjugate beamforming given in (9) and the strategy explained in Section V-A. Specifically, Theorem 5 and Corollary 3 provide the number of antennas that the BS requires in order to satisfy only the secrecy constraint and both the secrecy and decodability constraints, respectively. Note that Theorem 5 and Corollary 3 are the counterparts of Theorem 2 and Corollary 1.

Theorem 5. (Establishing secrecy with no stochastic encoding)

Let $\delta, \gamma > 0$ and $\gamma + \delta > 1$. Let the block length be $T$ and the length of the data transmission phase be $T_d$. In the presence of training-phase jamming, for any $\epsilon > 0$ and any rate tuple $R \triangleq [R_1, \dots, R_K]$, if $M \geq S(\epsilon)$ and $T_r \geq \max(M^\gamma, K)$, then any deterministic code $\left(2^{BTR_1}, \dots, 2^{BTR_K}, BT_d\right)$ employing $\delta$-conjugate beamforming satisfies

\[ \frac{1}{BT} H\left(W_k \mid Z^{BT_d}, H^B, \hat{H}^B, H_e^B\right) \geq R_k - \epsilon \tag{24} \]

for any $J \in \{1, \dots, T_r\}$, $B \geq 1$, and $k \in [1 : K]$, where

\[ S(\epsilon) \triangleq \left(\frac{\rho_{\max} M_e \max\left(1, \frac{\rho_{\mathrm{jam}}}{\rho_r}\right)}{2^{\frac{T}{T_d}\epsilon} - 1}\right)^{\frac{1}{\min(\delta,\, \delta + \gamma - 1)}} \]

and $\rho_{\max} \triangleq \max_{k \in \{1, \dots, K\}} \rho_k$. □

The proof of Theorem 5 can be found in Appendix E-A. Note that when $\gamma = 1$ and $1 \geq \frac{\rho_{\mathrm{jam}}}{\rho_r}$, the number of antennas necessary to meet the secrecy constraint under the training-phase attack becomes identical to that under no attack. This result demonstrates the effectiveness of the defense strategy of hiding the pilot signal assignments from the adversary and expanding the pilot signal set.

There is a tradeoff between the number $M$ of antennas and the length $T_r$ of the training period necessary to satisfy the constraints $M \geq S(\epsilon)$ and $T_r \geq \max(M^\gamma, K)$. This tradeoff is controlled by the parameter $\gamma$. While choosing $\gamma$ close to $1$ minimizes $S(\epsilon)$ for any $\epsilon > 0$, it increases the length of the training period, i.e., the overhead. To observe this: first, $S(\epsilon)$ is minimum at $\gamma = 1$ due to the fact that $\min(\delta, \delta + \gamma - 1) \leq \delta$, with equality when $\gamma = 1$. Second, increasing $\gamma$ to $1$ also increases the training overhead, as $T_r$ has to be larger than $S(\epsilon)^\gamma$.

In Theorem 5, we provide the number of antennas required to satisfy only the secrecy constraint. The next corollary presents the number of antennas that the BS needs in order to satisfy both the secrecy and the decodability constraints without a need for stochastic encoding.

Corollary 3. (Any rate tuple is achievable with no need for stochastic encoding)

Let $0 < \delta < 1$ and $\gamma + \delta > 1$. Let the block length be $T$, the length of the data transmission phase be $T_d$, and $J$ be any integer in $\{1, \dots, T_r\}$. In the presence of training-phase jamming, for any $\epsilon > 0$ and any rate tuple $R \triangleq [R_1, \dots, R_K]$, if $M \geq \max\left(V(R), S(\epsilon)\right)$ and $T_r \geq M^\gamma$, then there exists $B(\epsilon) > 0$ and a sequence of codes $\left(2^{BTR_1}, \dots, 2^{BTR_K}, BT_d\right)$, $B \geq B(\epsilon)$, that satisfy the constraints in (10) and (11) without a need for stochastic encoding, where

\[ V(R) \triangleq \max_{k \in \{1, \dots, K\}} \left(\left(2^{R_k \frac{T}{T_d}} - 1\right) \times \frac{(\rho_f + \rho_{\mathrm{jam}} + 1) \times (\rho_r + \rho_{\mathrm{jam}} + 1)}{\rho_r\rho_k}\right)^{\frac{1}{1-\delta}}. \] □

The proof of Corollary 3 can be found in Appendix E-B.
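The tradeoff controlled by $\gamma$ can be explored numerically. The sketch below assumes the threshold of Theorem 5 has the form $S(\epsilon) = \left(\rho_{\max}M_e\max(1, \rho_{\mathrm{jam}}/\rho_r)/(2^{T\epsilon/T_d} - 1)\right)^{1/\min(\delta, \delta + \gamma - 1)}$ and compares it with the no-jamming threshold of Theorem 2. The function names and the parameter values in the demo are illustrative choices, and `min_training_length` simply evaluates the condition $T_r \geq \max(M^\gamma, K)$.

```python
import math

def S_no_jam(eps, M_e, rho_max, t_over_td, delta):
    """Secrecy threshold without training-phase jamming (form of Theorem 2)."""
    return (M_e * rho_max / (2.0 ** (t_over_td * eps) - 1.0)) ** (1.0 / delta)

def S_jam(eps, M_e, rho_max, rho_jam, rho_r, t_over_td, delta, gamma):
    """Secrecy threshold under training-phase jamming (form of Theorem 5)."""
    if not (delta > 0 and gamma > 0 and gamma + delta > 1):
        raise ValueError("need delta > 0, gamma > 0 and gamma + delta > 1")
    base = rho_max * M_e * max(1.0, rho_jam / rho_r) / (2.0 ** (t_over_td * eps) - 1.0)
    return base ** (1.0 / min(delta, delta + gamma - 1.0))

def min_training_length(M, gamma, K):
    """Smallest integer T_r with T_r >= max(M^gamma, K)."""
    return max(math.ceil(M ** gamma), K)

params = dict(eps=0.1, M_e=1.0, rho_max=1.0, rho_jam=1.0, rho_r=10.0,
              t_over_td=1.25, delta=0.5)
for gamma in (0.6, 0.8, 1.0):
    M = math.ceil(S_jam(gamma=gamma, **params))
    print(gamma, M, min_training_length(M, gamma, K=5))
```

For $\gamma = 1$ and $\rho_{\mathrm{jam}} \leq \rho_r$, `S_jam` coincides with `S_no_jam`, matching the observation after Theorem 5. Smaller values of $\gamma$ shorten the exponent on the training length but raise the antenna threshold.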

C. How do we hide the pilot signal assignments?

So far, we have demonstrated that, if the pilot signal assignments can be kept secret from the adversary, the impact of training-phase jamming can be mitigated by increasing the cardinality of the pilot signal set at the expense of some increase in training overhead. Next, we discuss how to keep the assignments secret from the adversary.

In order to communicate the pilot signal assignments securely, at the beginning of each block, the BS shares with each user a secret key of size $\log L$ bits that is unknown to the adversary. In the literature, by far the most popular way to generate an information-theoretically secure secret key across a wireless channel is via the use of reciprocal channel gains [17], [18], [19]. However, we cannot use such channel-gain-based methods, since they require observing the channel gains, and our objective in generating the keys is to secure the training phase, whose sole purpose is to observe the channel gains in the first place. This leaves us with a "chicken or the egg" dilemma.

With this observation, let us consider methods in which these keys are generated and shared by standard key-agreement methods (e.g., Diffie-Hellman [15]) or public-key methods (e.g., RSA [16]). Thus, the scheme relies only on existing standard computational cryptographic techniques and does not rely on information-theoretic techniques for secure key sharing. Note that a shared key between the BS and a user is used to encrypt the pilot signal assigned to that user, and the encrypted assignment is communicated to the user immediately after key sharing.

Despite the use of computational cryptographic methods for key generation, the security we provide has the "same flavor" as information-theoretic secrecy, as we clarify next. The main drawback of computational cryptographic methods such as Diffie-Hellman is that they make assumptions on the computational power of the adversaries. This kind of security is based on the supposition that, given that the key is hidden from an adversary via a difficult puzzle (for example, RSA is based on an NP problem: prime factorization of a large number), it takes an unreasonable amount of time for an adversary to crack it. Nevertheless, given enough time, the adversary will eventually decrypt the message (possibly quickly, given a quantum computer, for instance). This constitutes the main motivation for information-theoretic security, which makes no assumptions on the computational power of the attackers.

In our approach, we have a hybrid scheme, combining information-theoretic security and computational cryptography. We use cryptography to hide the pilot sequence assignments, not the message. Encrypting the pilot signal assignments is fundamentally different from encrypting the message. In message encryption, the signal received by the adversary remains vulnerable to cryptanalysis long after the message is transmitted. On the other hand, with pilot signal assignment encryption, this window of time for cryptanalysis can be arbitrarily small: unless the adversary figures out the pilot sequence assigned to the targeted user before the training phase starts, the knowledge of the assignment becomes useless. But we know that the training phase starts immediately after the encrypted assignment is communicated to the users. If we define the computational power required of the adversary as the ratio of the amount of computation needed to decrypt the key via cryptanalysis to the time available to solve the problem, the computational power necessary for the adversary to damage the targeted user goes to infinity. This addresses the shortcoming of existing cryptographic methods due to their assumptions on the computational power of adversaries.
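The assignment-hiding step can be sketched in a few lines. The paper does not specify the cipher, so the following is only one possible instance under stated assumptions: the key is uniform over $\{0, \dots, L-1\}$ (about $\log L$ bits), a fresh key is used in every block, and the pilot index is masked by addition modulo $L$. The function `share_key` is a hypothetical stand-in for the key-agreement step (e.g., Diffie-Hellman) and does not implement it.

```python
import secrets

def share_key(L):
    """Hypothetical stand-in for key agreement: a uniform key in {0, ..., L-1}."""
    return secrets.randbelow(L)

def encrypt_assignment(pilot_index, key, L):
    """BS side: mask the pilot index assigned to one user."""
    return (pilot_index + key) % L

def decrypt_assignment(ciphertext, key, L):
    """User side: recover the pilot index with the shared key."""
    return (ciphertext - key) % L

L = 64                                   # size of the pilot signal set (illustrative)
pilot_index = secrets.randbelow(L)       # pilot drawn by the BS for this block
key = share_key(L)                       # refreshed at the beginning of each block
sent = encrypt_assignment(pilot_index, key, L)
assert decrypt_assignment(sent, key, L) == pilot_index
```

With a uniform key that is never reused, the transmitted value is uniform over the $L$ indices regardless of the pilot chosen, so in this sketch the remaining computational assumption sits entirely in how the key is agreed upon.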
Note that, if the adversary cannot act during the training phase, the message transmission is "perfectly secure" as shown in Theorem 4.

It is important to emphasize that, in the above discussion, we did not show that the aforementioned defense strategy achieves information-theoretic security. Instead, we argued that, utilizing our defense strategy of encrypting training signals, we can avoid one of the main drawbacks of the existing computational-cryptographic methods, i.e., assumptions on the computational power of adversaries.

VI. CONCLUSION

In this work, we study the physical-layer security of massive MIMO downlink communication. We first consider the no training-phase jamming attack, in which the adversary jams only the data communication and eavesdrops on both the data communication and the training. We show that the secure DoF attained in the presence of no training-phase jamming is the same as the DoF attained under no attack. This result shows the resilience of massive MIMO against adversaries that do not jam the training phase. Further, we propose a joint power allocation and beamforming strategy, called $\delta$-conjugate beamforming, with which we can establish information-theoretic security without even a need for Wyner encoding as long as the number of antennas is above a certain threshold, which we evaluate.

We next show the vulnerability of massive MIMO systems to the attack called training-phase jamming, in which the adversary jams and eavesdrops on both the training and the data communication. We show that the maximum secure DoF attained in the presence of training-phase jamming is zero. We then develop a defense strategy against training-phase jamming. We show that, if the BS keeps the pilot signal assignments hidden from the adversary and extends the cardinality of the pilot signal set, a secure DoF equal to the maximum DoF attained under no attack can be achieved. We finally discuss why standard computational-cryptographic key sharing methods can be considered strong candidates to encrypt the pilot signal assignments and how they achieve a level of security that is comparable to information-theoretically secure key-generation methods.

APPENDIX A
PROOF OF THEOREM 1

\[ C = \max_{p(x^{T_d} \mid h, y^{T_r}),\; E[\mathrm{tr}(X^{T_d} X^{T_d *})] \leq \rho_f T_d} \frac{1}{T} I\left(X^{T_d}; Y^{T_d} \mid Y^{T_r}, H\right) \tag{25} \]

\[ = \max_{p(x^{T_d} \mid h),\; E[\mathrm{tr}(X^{T_d} X^{T_d *})] \leq \rho_f T_d} \frac{1}{T} I\left(X^{T_d}; Y^{T_d} \mid H\right) \tag{26} \]

\[ = \max_{p(x \mid h),\; E[\mathrm{tr}(XX^*)] \leq \rho_f} \frac{T_d}{T} I(X; Y \mid H) \tag{27} \]

\[ = \max_{E[P(H)] \leq \rho_f} \frac{T_d}{T} E\left[\log\left(1 + P(H)\|H\|^2\right)\right], \tag{28} \]

where $X^{T_d}$ is a complex $T_d \times M$ matrix and $P(\cdot) : \mathbb{C}^M \to \mathbb{R}^+ \cup \{0\}$ is a power allocation function. In the derivation above, (25) follows from Section 7.4.1 of [20], where the capacity of a communication system in which the channel gains are available at both the encoder and the decoder is stated. The equality in (26) follows from the fact that $Y^{T_r} \to X^{T_d}, H \to Y^{T_d}$ forms a Markov chain, and the equality in (27) follows from the fact that

\[ I\left(X^{T_d}; Y^{T_d} \mid H\right) \leq \sum_{i=1}^{T_d} I(X_i; Y_i \mid H) \tag{29} \]

and from the fact that equality is attained in (29) if $p_{X^{T_d} \mid H}\left(x^{T_d} \mid h\right) = \prod_{i=1}^{T_d} p_{X \mid H}(x_i \mid h)$. Then, the RHS and the LHS of (29) become $T_d I(X; Y \mid H)$.

In (28), the equality follows from [21], where the capacity of the MISO system is evaluated. In [21], the power allocation function maximizing (28) is given as

\[ P(h) = \left(\lambda_M - \frac{1}{\|h\|^2}\right)^+, \]

where $\lambda_M$ is a non-negative real number chosen such that $E[P(H)] = \rho_f$. We next find an upper bound on $\lambda_M$ with the following analysis:

\[ \rho_f = E\left[\left(\lambda_M - \frac{1}{\|H\|^2}\right)^+\right] \geq \lambda_M - E\left[\frac{1}{\|H\|^2}\right] \tag{30} \]

\[ = \lambda_M - \frac{1}{M - 1}, \tag{31} \]

where (31) follows from the fact that $\frac{1}{\|H\|^2}$ has an inverse Gamma distribution with a mean of $\frac{1}{M - 1}$. Hence, we have $\lambda_M \leq \frac{1}{M - 1} + \rho_f$ for $M > 1$.
We next bound the DoF of the MISO communication system as

\begin{align}
\lim_{M \to \infty} \frac{T_d}{T} \frac{E\left[\log\left(1 + P(H)\|H\|^2\right)\right]}{\log M}
&\le \lim_{M \to \infty} \frac{T_d}{T} \frac{E\left[\log\left(1 + \lambda_M \|H\|^2\right)\right]}{\log M} \tag{32} \\
&\le \lim_{M \to \infty} \frac{T_d}{T} \frac{\log\left(1 + \lambda_M E\left[\|H\|^2\right]\right)}{\log M} \tag{33} \\
&= \lim_{M \to \infty} \frac{T_d}{T} \frac{\log(1 + \lambda_M M)}{\log M} \nonumber \\
&\le \lim_{M \to \infty} \frac{T_d}{T} \frac{\log\left(1 + \frac{M}{M-1} + M \rho_f\right)}{\log M} \tag{34} \\
&= \frac{T_d}{T} + \lim_{M \to \infty} \frac{T_d}{T} \frac{\log\left(\frac{1}{M} + \frac{1}{M-1} + \rho_f\right)}{\log M} = \frac{T_d}{T}, \tag{35}
\end{align}

where (32) follows from the fact that $P(\cdot) \le \lambda_M$ for all realizations of $H$, (33) follows from Jensen's inequality, and (34) follows from (31). In (35), we show that the secure DoF can be at most $\frac{T_d}{T}$. □

Next, we describe an achievability strategy to attain a secure DoF of $\frac{T_d}{T}$.

Channel estimation: Pilot signals are mutually orthogonal, i.e.,

$$\phi_k \phi_l^* = \begin{cases} T_r \rho_r & \text{if } k = l \\ 0 & \text{if } k \ne l \end{cases}$$

for any $k, l \in \{1, \ldots, K\}$. The BS employs MMSE for channel estimation. The estimated gain of the channel connecting the BS to the $k$-th user is as follows:

\begin{equation}
\hat{H}_k = a H_k + b V_k \tag{36}
\end{equation}

for $k \in \{1, \ldots, K\}$, where $a \triangleq \frac{\rho_r T_r}{\rho_r T_r + 1}$, $b \triangleq \frac{\sqrt{\rho_r T_r}}{\rho_r T_r + 1}$, and $V_k$ is additive Gaussian noise distributed as $CN(0, I_M)$. Note that $E[\hat{H}_k] = 0_{1 \times M}$ and $E[\|\hat{H}_k\|^2] = M a$. Further, for any $k \in [1:K]$ and for any $m, n \in [1:M]$,

$$E\left[\left|\hat{H}_{k_n}^* H_{k_m}\right|^2\right] = \begin{cases} a^2 + a & \text{if } m = n \\ a & \text{otherwise.} \end{cases}$$

Codebook generation:

Pick $R_k = \frac{T_d}{T} \log\left(1 + \frac{M \rho_k a}{\rho_f + \rho_{jam} + 1}\right) - \frac{T_d}{T} \log(1 + M_e \rho_k)$ and $\hat{R}_k = \frac{T_d}{T} \log\left(1 + \frac{M \rho_k a}{\rho_f + \rho_{jam} + 1}\right) - \epsilon_1$ for some $\epsilon_1 > 0$ and for $k = 1, \ldots, K$. Generate $K$ codebooks $c_k$, $k = 1, \ldots, K$, where $K$ is the number of users. Codebook $c_k$ contains $2^{BT\hat{R}_k}$ independently and identically generated codewords $s_{kl}^{BT_d}$, $l \in \{1, \ldots, 2^{BT\hat{R}_k}\}$, each drawn from $CN(0, \rho_k I_{BT_d})$.

Encoding: In order to send the $k$-th user's message $w_k \in W_k$, the encoder draws index $l_k$ from the uniform distribution with sample space

$$\left\{(w_k - 1)\, 2^{BT(\hat{R}_k - R_k)} + 1, \ldots, w_k\, 2^{BT(\hat{R}_k - R_k)}\right\}.$$

Note that this mapping makes the encoder stochastic. The encoder then maps index $l_k$ to the corresponding codeword $s_{kl_k}^{BT_d}$ in codebook $c_k$.

The encoder employs conjugate beamforming to map codewords to the channel input sequence $X^{BT_d}$. The channel input at the $j$-th channel use of the $i$-th block can be written as follows:

$$X(i, j) = \sum_{k=1}^{K} s_{kl_k}(i, j) \frac{1}{\sqrt{M \alpha_k}} \hat{H}_k^*(i)$$

where $\alpha_k = a$ for all $k \in \{1, \ldots, K\}$ due to the fact that $E\left[|\hat{H}_{k_m}|^2\right] = a$ for all $k \in \{1, \ldots, K\}$.

Decoding: Each user employs typical set decoding [12]. Let $y_k^{BT_d}$ be the received signal at the $k$-th user over $BT_d$ channel uses. The decoder at the $k$-th user looks for a unique index $l_k \in \{1, \ldots, 2^{BT\hat{R}_k}\}$ such that $\left(s_{kl_k}^{BT_d}, y_k^{BT_d}\right) \in A_\epsilon^{BT_d}\left(S_k^{T_d}, Y_k^{T_d}\right)$, where $A_\epsilon^{BT_d}\left(S_k^{T_d}, Y_k^{T_d}\right)$ is the set of jointly typical sequences $\left(s_k^{BT_d}, y_k^{BT_d}\right)$ with

$$Y_k^{T_d} = \frac{1}{\sqrt{M a}} H_k \hat{H}_k^* S_k^{T_d} + \frac{1}{\sqrt{M a}} \sum_{j=1, j \ne k}^{K} H_k \hat{H}_j^* S_j^{T_d} + H_{jam,k} V_{jam} + V_k$$

where $S_j^{T_d}$ is distributed as $CN(0, \rho_j I_{T_d})$, $j = 1, \ldots, K$, and $V_k$ is distributed as $CN(0, I_{T_d})$.

Probability of error and equivocation analysis:

By the channel coding theorem [12], $E[P_e] \to 0$ as $B \to \infty$ if $\hat{R}_k < \frac{T_d}{T} I\left(S_k^{T_d}; Y_k^{T_d}\right)$, $k = 1, \ldots, K$, where the expectation is over random codebooks $C_1, \ldots, C_K$. Note that codebook $c_k$ is the realization of $C_k$. Define

\begin{align*}
T_1 &\triangleq \frac{1}{\sqrt{M a}} S_k E\left[H_k \hat{H}_k^*\right] \\
T_2 &\triangleq \frac{1}{\sqrt{M a}} S_k \left(E\left[H_k \hat{H}_k^*\right] - H_k \hat{H}_k^*\right) \\
T_3 &\triangleq \frac{1}{\sqrt{M a}} \sum_{j=1, j \ne k}^{K} H_k \hat{H}_j^* S_j \\
T_4 &\triangleq H_{jam,k} V_{jam} + V_k.
\end{align*}

Note that $E[T_1] = E[T_2] = E[T_3] = E[T_4] = 0$ and $E[T_1 T_2^*] = E[T_1 T_3^*] = E[T_1 T_4^*] = 0$. We can bound $\frac{T_d}{T} I\left(S_k^{T_d}; Y_k^{T_d}\right)$ as

\begin{align}
\frac{T_d}{T} I\left(S_k^{T_d}; Y_k^{T_d}\right) &\ge \frac{T_d}{T} \log\left(1 + \frac{Var[T_1]}{Var[T_2 + T_3 + T_4]}\right) \tag{37} \\
&= \frac{T_d}{T} \log\left(1 + \frac{Var[T_1]}{Var[T_2] + Var[T_3] + Var[T_4]}\right) \tag{38} \\
&= \frac{T_d}{T} \log\left(1 + \frac{M \rho_k a}{\rho_f + \rho_{jam} + 1}\right), \tag{39}
\end{align}

where (37) follows from Theorem 1 of [13] and (38) follows from the fact that $T_2$, $T_3$, and $T_4$ are uncorrelated random variables. The equality in (39) follows from the fact that $Var[T_1] = M \rho_k a$, $Var[T_2] = \rho_k$, $Var[T_3] = \sum_{j \ne k} \rho_j$, and $Var[T_4] = \rho_{jam} + 1$. From (39), we conclude that $\hat{R}_k \le \frac{T_d}{T} I\left(S_k^{T_d}; Y_k^{T_d}\right)$. Hence, $E[P_e] \to 0$ as $B \to \infty$.

We next analyze the secrecy constraint in (11). Let $H\left(W_k \mid Z^{BT}, H^B, \hat{H}^B, H_e^B, C\right)$ be the expectation of the conditional entropy in (11) over random codebooks $C \triangleq [C_1, \ldots, C_K]$.
We show that the expectation satisfies the constraint in (11) for the $k$-th user with the following analysis:

\begin{align}
H\left(W_k \mid Z^{BT}, G^B, C\right) &\ge H\left(W_k \mid Z^{BT}, S^{BT_d}, G^B, C\right) \nonumber \\
&= H\left(W_k \mid Z^{BT_d}, S^{BT_d}, G^B, C\right) \tag{40} \\
&= H\left(W_k, S_k^{BT_d} \mid Z^{BT_d}, S^{BT_d}, G^B, C\right) - H\left(S_k^{BT_d} \mid W_k, Z^{BT_d}, S^{BT_d}, G^B, C\right) \nonumber \\
&\ge H\left(S_k^{BT_d} \mid Z^{BT_d}, S^{BT_d}, G^B, C\right) - H\left(S_k^{BT_d} \mid W_k, Z^{BT_d}, S^{BT_d}, G^B, C\right) \nonumber \\
&= H\left(S_k^{BT_d} \mid S^{BT_d}, G^B, C\right) - I\left(S_k^{BT_d}; Z^{BT_d} \mid S^{BT_d}, G^B, C\right) - H\left(S_k^{BT_d} \mid W_k, Z^{BT_d}, S^{BT_d}, G^B, C\right) \nonumber \\
&= H\left(S_k^{BT_d} \mid C_k\right) - I\left(S_k^{BT_d}; Z^{BT_d} \mid S^{BT_d}, G^B, C\right) - H\left(S_k^{BT_d} \mid W_k, Z^{BT_d}, S^{BT_d}, G^B, C\right) \tag{41} \\
&= BT\hat{R}_k - I\left(S_k^{BT_d}; Z^{BT_d} \mid S^{BT_d}, G^B, C\right) - H\left(S_k^{BT_d} \mid W_k, Z^{BT_d}, S^{BT_d}, G^B, C\right) \tag{42} \\
&\ge BT\hat{R}_k - I\left(S_k^{BT_d}, C; Z^{BT_d} \mid S^{BT_d}, G^B\right) - H\left(S_k^{BT_d} \mid W_k, Z^{BT_d}, S^{BT_d}, G^B, C\right) \tag{43}
\end{align}

where $G^B \triangleq \left[H^B, \hat{H}^B, H_e^B\right]$. Signal set $S^{BT_d} \triangleq \left\{S_i^{BT_d}\right\}_{i \ne k}$ is defined to be the transmitted codewords of the users other than the $k$-th user. Signals $Z^{BT_r}$ and $Z^{BT_d}$ are the received signals at the adversary over the training phases and the data communication phases, respectively. Note that $Z^{BT} \triangleq \left[Z^{BT_r}, Z^{BT_d}\right]$.

In the above derivation, (40) follows from the fact that $Z^{BT_r}$ and $\left(G^B, W_k, S^{BT_d}, Z^{BT_d}, C\right)$ are independent, (41) follows from the fact that $\left(S_k^{BT_d}, C_k\right)$ is independent of $\left(G^B, \{C_i\}_{i \ne k}\right)$, and (42) follows from the fact that $S_k^{BT_d}$ is uniformly distributed on a set of size $2^{BT\hat{R}_k}$. We continue the derivation as

\begin{align}
(43) &= BT\hat{R}_k - I\left(S_k^{BT_d}; Z^{BT_d} \mid S^{BT_d}, G^B\right) - H\left(S_k^{BT_d} \mid W_k, Z^{BT_d}, S^{BT_d}, G^B, C\right) \tag{44} \\
&\ge BT\hat{R}_k - \sum_{i=1}^{B} \sum_{j=T_r+1}^{T} I\left(S_k(i,j); Z(i,j) \mid S(i,j), G(i)\right) - H\left(S_k^{BT_d} \mid W_k, Z^{BT_d}, S^{BT_d}, G^B, C\right) \nonumber \\
&\ge BT\hat{R}_k - \sum_{i=1}^{B} \sum_{j=T_r+1}^{T} E\left[\log\left(1 + \frac{\rho_k}{M a} \sum_{m=1}^{M_e} \left|\hat{H}_k H_{e_m}^*\right|^2\right)\right] - H\left(S_k^{BT_d} \mid W_k, Z^{BT}, S^{BT_d}, G^B, C\right) \tag{45} \\
&\ge BT\hat{R}_k - BT_d \log\left(1 + \frac{\rho_k}{M a} \sum_{m=1}^{M_e} E\left[\left|\hat{H}_k H_{e_m}^*\right|^2\right]\right) - H\left(S_k^{BT_d} \mid W_k, Z^{BT}, S^{BT_d}, G^B, C\right) \nonumber \\
&= BT\hat{R}_k - BT_d \log(1 + M_e \rho_k) - H\left(S_k^{BT_d} \mid W_k, Z^{BT}, S^{BT_d}, G^B, C\right) \tag{46} \\
&\ge BT\left(\hat{R}_k - \frac{T_d}{T} \log(1 + M_e \rho_k)\right) - BT\epsilon_2 \tag{47} \\
&= BT(R_k - \epsilon) \tag{48}
\end{align}

for any $\epsilon_2 > 0$ and sufficiently large $B$, where $\epsilon \triangleq \epsilon_1 + \epsilon_2$ and $H_{e_m}$ in (45) denotes the gain of the channel connecting the BS to the $m$-th antenna of the adversary. The equality in (44) follows from the fact that $C \to S_k^{BT_d}, S^{BT_d}, G^B \to Z^{BT_d}$ forms a Markov chain. The equality in (46) is due to the fact that $E\left[\left|\hat{H}_k^* H_{e_m}\right|^2\right] = M a$, $m = 1, \ldots, M_e$.

To get the inequality in (47), we need to bound $\frac{1}{BT} H\left(S_k^{BT_d} \mid W_k, Z^{BT}, S^{BT_d}, G^B, C\right)$. Define $R_e \triangleq \hat{R}_k - R_k$. Note that $R_e < \frac{1}{T} I\left(S_k^{T_d}; Z^{T_d} \mid S^{T_d}, G\right) = \frac{T_d}{T} \log(1 + M_e \rho_k a)$. Hence, as in (52) of [14], utilizing Fano's inequality and the channel coding theorem, we show that $\lim_{B \to \infty} \frac{1}{BT} H\left(S_k^{BT_d} \mid W_k, Z^{BT}, S^{BT_d}, G^B, C\right) = 0$.

From the fact that $E[P_e] \to 0$ as $B \to \infty$ and from (48), we conclude that there exists a sequence of codes satisfying constraints (10) and (11). We now evaluate the degrees of freedom $d_k$ associated with $R_k$ as

$$d_k = \lim_{M \to \infty} \frac{R_k}{\log M} = \frac{T_d}{T} - \lim_{M \to \infty} \frac{T_d}{T} \frac{\log(1 + M_e \rho_k)}{\log M} = \frac{T_d}{T}$$

for $k = 1, \ldots, K$. Hence, the attained secure DoF is equal to $\frac{T_d}{T}$. □

APPENDIX B

A. Proof of Theorem 2

Note that since the adversary keeps silent during the training phases, the received signals at the BS over the training phases are independent of $H_e^B$. Hence, we conclude that $\hat{H}^B \triangleq \left[\hat{H}_1^B, \ldots, \hat{H}_K^B\right]$ and $H_e^B$ are independent.

The BS picks message rates $R_k > 0$, $k = 1, \ldots, K$. The equivocation rate for a code $\left(2^{BTR_1}, \ldots, 2^{BTR_K}, BT_d\right)$ utilizing deterministic encoding mapping functions $f_k$, $k = 1, \ldots, K$, and $\delta$-conjugate beamforming is as follows:

\begin{align}
\frac{1}{BT} H\left(W_k \mid Z^{BT}, G^B\right) &= \frac{1}{BT} H\left(W_k \mid Z^{BT_d}, G^B\right) \tag{49} \\
&\ge \frac{1}{BT} H\left(W_k \mid Z^{BT_d}, S^{BT_d}, G^B\right) \nonumber \\
&= \frac{1}{BT} H\left(W_k \mid S^{BT_d}, G^B\right) - \frac{1}{BT} I\left(W_k; Z^{BT_d} \mid S^{BT_d}, G^B\right) \nonumber \\
&= R_k - \frac{1}{BT} I\left(W_k; Z^{BT_d} \mid S^{BT_d}, G^B\right) \tag{50} \\
&\ge R_k - \frac{1}{BT} I\left(S_k^{BT_d}; Z^{BT_d} \mid S^{BT_d}, G^B\right) \tag{51} \\
&\ge R_k - \frac{1}{BT} \sum_{i=1}^{B} \sum_{j=T_r+1}^{T} I\left(S_k(i,j); Z(i,j) \mid S(i,j), G(i)\right) \nonumber \\
&\ge R_k - \frac{1}{BT} \sum_{i=1}^{B} \sum_{j=T_r+1}^{T} \log\left(1 + \frac{P_k(i,j)}{M^{1+\delta} \alpha_k} \sum_{m=1}^{M_e} E\left[\left|\hat{H}_k^* H_{e_m}\right|^2\right]\right) \tag{52} \\
&\ge R_k - \frac{T_d}{T} \log\left(1 + \frac{\rho_k}{M^{1+\delta} \alpha_k} \sum_{m=1}^{M_e} E\left[\left|\hat{H}_k^* H_{e_m}\right|^2\right]\right) \tag{53} \\
&= R_k - \frac{T_d}{T} \log\left(1 + \frac{M_e \rho_k}{M^{\delta}}\right) \tag{54} \\
&\ge R_k - \epsilon \nonumber
\end{align}

for any $\epsilon > 0$ and for sufficiently large $M$, where $G^B \triangleq \left[H^B, \hat{H}^B, H_e^B\right]$. In particular, for a given $\epsilon > 0$, if

$$M \ge \left(\frac{M_e \rho_k}{2^{\frac{T}{T_d}\epsilon} - 1}\right)^{\frac{1}{\delta}},$$

then there exists a code that satisfies the constraint in (12).

In the above derivation, (49) follows from the fact that $Z^{BT_r}$ and $\left(Z^{BT_d}, G^B, W_k\right)$ are independent, and (50) follows from the facts that $W_k$ is independent of $\left(S^{BT_d}, G^B\right)$ and uniformly distributed on $[1 : 2^{BTR_k}]$.
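The antenna threshold stated above can be evaluated directly by inverting (54). The following is a minimal sketch under stated assumptions: logarithms are taken to base 2 (rates in bits), and the function names and all parameter values are hypothetical, chosen only to illustrate how the leakage term $\frac{T_d}{T}\log\left(1 + M_e\rho_k/M^{\delta}\right)$ falls below a target $\epsilon$ once $M$ is large enough.

```python
import math

def leakage_bound(M, M_e, rho_k, delta, T_d, T):
    """Leakage term of (54): (T_d/T) * log2(1 + M_e * rho_k / M**delta)."""
    return (T_d / T) * math.log2(1.0 + M_e * rho_k / M ** delta)

def min_antennas(eps, M_e, rho_k, delta, T_d, T):
    """Real-valued threshold on M obtained by solving leakage_bound(M) = eps for M."""
    return (M_e * rho_k / (2.0 ** (T * eps / T_d) - 1.0)) ** (1.0 / delta)

# Hypothetical parameters, for illustration only.
M_e, rho_k, delta, T_d, T, eps = 4, 1.0, 0.5, 80, 100, 0.1
M_min = math.ceil(min_antennas(eps, M_e, rho_k, delta, T_d, T))
print("antennas needed:", M_min)
print("leakage at M_min:", leakage_bound(M_min, M_e, rho_k, delta, T_d, T))
print("leakage at M_min/2:", leakage_bound(M_min // 2, M_e, rho_k, delta, T_d, T))
```

Because the leakage term decreases monotonically in $M$, any antenna count at or above the threshold meets the target, and a smaller $\delta$ raises the threshold sharply through the exponent $1/\delta$.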
In (51), the inequality follows from the fact that $W_k \to S_k^{BT_d} \to Z^{BT_d}, S^{BT_d}, G^B$ forms a Markov chain. In (52), $P_k(i,j) \triangleq E\left[|S_k(i,j)|^2\right]$, where the expectation is over $W_k$. In (53), the inequality follows from Jensen's inequality and from the fact that $\frac{1}{BT_d} \sum_{i=1}^{B} \sum_{j=T_r+1}^{T} P_k(i,j) \le \rho_k$. In (54), the equality follows from the fact that $\hat{H}_k$ and $H_{e_m}$ are independent and $E\left[\left|\hat{H}_k^* H_{e_m}\right|^2\right] = M \alpha_k$, $k = 1, \ldots, K$.

B. Proof of Corollary 1

Pick $0 < \delta < 1$. Pick an arbitrary $\epsilon > 0$ and rate tuple $R = [R_1, \ldots, R_K]$. Let $M \ge \max(V(R), S(\epsilon))$. Note that the inequality $M > V(R)$ implies that $R_k < \frac{T_d}{T} \log\left(1 + \frac{M^{1-\delta} a \rho_k}{M^{-\delta} \rho_f + \rho_{jam} + 1}\right)$, $k = 1, \ldots, K$. We first show that there exist $B(\epsilon) > 0$ and a sequence of codes $\left(2^{BTR_1}, \ldots, 2^{BTR_K}, BT_d\right)$, utilizing $\delta$-beamforming and deterministic mapping, that satisfy the decodability constraint in (10) for $B \ge B(\epsilon)$.

The same channel estimation strategy as in Appendix A is used. Codebook generation is the same as in Appendix A. The BS generates $K$ codebooks $c_k$, $k = 1, \ldots, K$, where $c_k$ contains $2^{BTR_k}$ codewords $s_{kl}^{BT_d}$, $l \in \{1, \ldots, 2^{BTR_k}\}$.

To send the $k$-th user's message $w_k \in W_k = \{1, \ldots, 2^{BTR_k}\}$, the BS maps message $w_k$ to the corresponding codeword $s_{kw_k}^{BT_d}$ in codebook $c_k$. Note that there is no randomization in this mapping, as opposed to the encoding in Appendix A, where the codeword is a stochastic function of the message. The BS employs $\delta$-conjugate beamforming to map codewords to the channel input sequence $X^{BT_d}$. The channel input at the $j$-th channel use of the $i$-th block can be written as

\begin{equation}
X(i, j) = \sum_{k=1}^{K} s_{kw_k}(i, j) \frac{1}{\sqrt{M^{1+\delta} \alpha_k}} \hat{H}_k^*(i) \tag{55}
\end{equation}

where $\alpha_k = a$ and $a$ is defined in (36).

Typical set decoding is used at each user as in the proof of Theorem 1 in Appendix A. Hence, since $R_k < \frac{1}{T} I\left(S_k^{BT_d}; Y_k^{BT_d}\right)$, $k = 1, \ldots, K$, by the channel coding theorem, there exists a sequence of codes that satisfies constraint (10).

In addition, since $M \ge S(\epsilon)$, the sequence of codes mentioned above satisfies the secrecy constraint in (11) due to Theorem 2. Hence, the proof of Corollary 1 follows. □

APPENDIX C

A. Proof of Theorem 3

Throughout the proof, we assume that the BS employs conjugate beamforming without loss of generality. Suppose that $R_k$ is an achievable rate. From the constraints (10)-(11) and Fano's inequality, we have

\begin{align}
\frac{1}{BT} H\left(W_k \mid Z^{BT_d}, H^B, \hat{H}^B, H_e^B\right) &\ge R_k - \delta_B \tag{56} \\
\frac{1}{BT} H\left(W_k \mid Y_k^{BT_d}\right) &\le \epsilon_B \tag{57}
\end{align}

where $\epsilon_B$ and $\delta_B$ go to zero as $B \to \infty$.

The LHS of (56) can be written as follows:

\begin{align}
\frac{1}{BT} H\left(W_k \mid Z^{BT_d}, H^B, \hat{H}^B, H_e^B\right) &= \frac{1}{BT} H\left(W_k \mid Z^{BT_d}, \hat{H}^B, H_e^B\right) \tag{58} \\
&= \frac{1}{BT} H\left(W_k \mid \tilde{Z}^{BT_d}, \tilde{H}^B, H_k^B\right) \tag{59} \\
&= \frac{1}{BT} H\left(W_k \mid \tilde{Z}^{BT_d}, \tilde{H}^B, H^B\right) \tag{60}
\end{align}

where $\tilde{H}(i) \triangleq \left[\hat{H}_1(i), \ldots, \tilde{H}_k(i), \ldots, \hat{H}_K(i)\right]$ and

\begin{equation}
\tilde{Z}(i,j) \triangleq \frac{1}{\sqrt{M \alpha_k}} H_k(i) \tilde{H}_k^*(i) S_k(i,j) + \sum_{l=1, l \ne k}^{K} \frac{1}{\sqrt{M \alpha_l}} H_k(i) \tilde{H}_l^*(i) S_l(i,j) + W \tag{61}
\end{equation}

for $1 \le i \le B$ and $T_r + 1 \le j \le T$. The equality in (58) follows from the fact that $H^B \to Z^{BT_d}, \hat{H}^B, H_e^B \to W_k$ forms a Markov chain, and the equality in (59) follows from the fact that the joint distribution of $\left(W_k, Z^{BT_d}, H_e^B, \hat{H}_1^B, \ldots, \hat{H}_k^B, \ldots, \hat{H}_K^B\right)$ is identical to that of $\left(W_k, \tilde{Z}^{BT_d}, H_k^B, \hat{H}_1^B, \ldots, \tilde{H}_k^B, \ldots, \hat{H}_K^B\right)$. The equality in (60) follows from the fact that $H^B \setminus H_k^B \to \tilde{Z}^{BT_d}, \hat{H}^B, \tilde{H}^B \to W_k$ forms a Markov chain.

The upper bound on $R_k$ can be derived with the following steps:

\begin{align}
R_k &\le \frac{1}{BT} H\left(W_k \mid Z^{BT_d}, H^B, \hat{H}^B, H_e^B\right) - \frac{1}{BT} H\left(W_k \mid Y_k^{BT_d}\right) + \gamma_B \tag{62} \\
&= \frac{1}{BT} H\left(W_k \mid \tilde{Z}^{BT_d}, \tilde{H}^B, H^B\right) - \frac{1}{BT} H\left(W_k \mid Y_k^{BT_d}\right) + \gamma_B \tag{63} \\
&\le \frac{1}{BT} H\left(W_k \mid \tilde{Z}^{BT_d}, \tilde{H}^B, H^B\right) - \frac{1}{BT} H\left(W_k \mid Y_k^{BT_d}, \tilde{Z}^{BT_d}, \tilde{H}^B, H^B\right) + \gamma_B \tag{64} \\
&= \frac{1}{BT} I\left(W_k; Y_k^{BT_d} \mid \tilde{Z}^{BT_d}, \tilde{H}^B, H^B\right) + \gamma_B \nonumber \\
&\le \frac{1}{BT} I\left(S^{BT_d}; Y_k^{BT_d} \mid \tilde{Z}^{BT_d}, G^B\right) + \gamma_B \tag{65} \\
&\le \frac{1}{BT} \sum_{i=1}^{B} \sum_{j=T_r+1}^{T} I\left(S(i,j); Y_k(i,j) \mid \tilde{Z}(i,j), G\right) + \gamma_B \tag{66} \\
&= \int \frac{1}{BT} \sum_{i=1}^{B} \sum_{j=T_r+1}^{T} I\left(S(i,j); Y_k(i,j) \mid \tilde{Z}(i,j), g\right) p_G(g)\, dg + \gamma_B, \tag{67}
\end{align}

where $\gamma_B \triangleq \epsilon_B + \delta_B$, $G^B \triangleq \left[H^B, \tilde{H}^B\right]$, and $G \triangleq \left[H, \tilde{H}\right]$. In the derivation above, (62) follows from (56) and (57), (63) follows from (58)-(60), and (64) follows from the fact that conditioning reduces entropy. The inequality in (65) follows from the fact that $W_k \to S^{BT_d} \to Y_k^{BT_d}, \tilde{Z}^{BT_d}, G^B$ forms a Markov chain. The inequality in (66) follows from the memoryless property of the channel and from the assumption in Theorem 3 stating that $\left(\tilde{H}(i), H(i)\right)$ have an identical probability distribution for any $i \ge 1$.
We continue the upper bound derivation with the following steps:

\begin{align}
(67) &\le \int \frac{1}{BT} \sum_{i=1}^{B} \sum_{j=T_r+1}^{T} I\left(S_G(i,j); Y_k(i,j) \mid \tilde{Z}(i,j), g\right) p_G(g)\, dg + \gamma_B \tag{68} \\
&\le \frac{T_d}{T} \int I\left(S_G; Y_k \mid \tilde{Z}, g\right) p_G(g)\, dg + \gamma_B \tag{69} \\
&\le \frac{T_d}{T} \int \left[\max_{\Sigma \in \mathcal{S}} \left(\log(1 + c_k \Sigma c_k^*) - \log(1 + c_e \Sigma c_e^*)\right)\right]^+ p_G(g)\, dg + \gamma_B \tag{70} \\
&\le \frac{T_d}{T} E\left[\max_{\Sigma \in \mathcal{S}} \left(\left[\log(1 + C_k \Sigma C_k^*) - \log(1 + C_e \Sigma C_e^*)\right]^+\right)\right] + \gamma_B, \tag{71}
\end{align}

where $C_k$ and $C_e$ are $1 \times K$ random vectors defined as

$$C_k \triangleq \left[\frac{H_k \hat{H}_1^*}{\sqrt{M \alpha_1}}, \ldots, \frac{H_k \hat{H}_k^*}{\sqrt{M \alpha_k}}, \ldots, \frac{H_k \hat{H}_K^*}{\sqrt{M \alpha_K}}\right], \qquad C_e \triangleq \left[\frac{H_k \hat{H}_1^*}{\sqrt{M \alpha_1}}, \ldots, \frac{H_k \tilde{H}_k^*}{\sqrt{M \alpha_k}}, \ldots, \frac{H_k \hat{H}_K^*}{\sqrt{M \alpha_K}}\right].$$

Further, $c_k$ and $c_e$ are the realizations of $C_k$ and $C_e$, respectively. Define $\Sigma_{ij}$ as the $K \times K$ covariance matrix of $S(i,j)$. Note that $\Sigma_{ij}$ is a diagonal matrix because the components of $S(i,j)$ are independent. The inequality (68) follows from [23], where $S_G(i,j)$ in (68) is distributed as $CN(0, \Sigma_{ij})$. Define $f(\Sigma_{ij}) \triangleq I\left(S_G(i,j); Y_k(i,j) \mid \tilde{Z}(i,j), g\right)$. The inequality in (69) follows from Jensen's inequality and Proposition 5 of [23], which states that $f(\Sigma_{ij})$ is a concave function of $\Sigma_{ij}$. Note that $S_G$ in (69) is distributed as $CN\left(0, \frac{1}{BT_d} \sum_{i=1}^{B} \sum_{j=T_r+1}^{T} \Sigma_{ij}\right)$. The inequality in (70) follows from (139) of [23], where $\mathcal{S}$ is a set of covariance matrices defined as

\begin{equation}
\mathcal{S} \triangleq \{\Sigma : \Sigma \preceq \mathrm{diag}(\rho_1, \ldots, \rho_K) \text{ and } \Sigma \text{ is a diagonal matrix}\}. \tag{72}
\end{equation}

We can rewrite the random variable inside the expectation as

\begin{align}
&\max_{\Sigma \in \mathcal{S}} \left[\log\left(\frac{1}{M} + \frac{1}{M} C_k \Sigma C_k^*\right) - \log\left(\frac{1}{M} + \frac{1}{M} C_e \Sigma C_e^*\right)\right]^+ \tag{73} \\
&= \max_{\Sigma \in \mathcal{S}} \left[\log\left(\frac{1}{M} + \rho_k(G) v_k(G) + \sum_{l \ne k}^{K} \rho_l(G) v_l(G)\right) - \log\left(\frac{1}{M} + \rho_k(G) w_k(G) + \sum_{l \ne k}^{K} \rho_l(G) v_l(G)\right)\right]^+ \tag{74}
\end{align}

with probability 1, where $\rho_k(G)$ is defined to be the $k$-th element on the diagonal of $\Sigma$, i.e., $\Sigma \triangleq \mathrm{diag}(\rho_1(G), \ldots, \rho_K(G))$. Note that $0 \le \rho_l(G) \le \rho_l$, $l = 1, \ldots, K$, due to (72). In (74), we define $v_l(G) \triangleq \frac{1}{\alpha_l M^2} \left|H_k \hat{H}_l^*\right|^2$ for $l = 1, \ldots, K$ and $w_k(G) \triangleq \frac{1}{\alpha_k M^2} \left|H_k \tilde{H}_k^*\right|^2$. We continue to simplify (73) with the following:

\begin{align}
(74) &= \left[\max_{\rho_k(G):\, 0 \le \rho_k(G) \le \rho_k} \left(\log\left(\frac{1}{M} + \rho_k(G) v_k(G)\right) - \log\left(\frac{1}{M} + \rho_k(G) w_k(G)\right)\right)\right]^+ \tag{75} \\
&= \left[\log\left(\frac{1}{M} + \rho_k v_k(G)\right) - \log\left(\frac{1}{M} + \rho_k w_k(G)\right)\right]^+ \tag{76}
\end{align}

with probability 1, where (75) follows from the fact that $f(x) = [\log(a + x) - \log(b + x)]^+$ is a non-increasing function for $x \ge 0$, where $a$ and $b$ are positive real numbers. The equality in (76) follows from the fact that $g(x) = \left[\log\left(\frac{1}{M} + ax\right) - \log\left(\frac{1}{M} + bx\right)\right]^+$ is non-decreasing for $x \ge 0$, where $a$ and $b$ are non-negative real numbers and $M \ge 1$.

We now bound $R_k$ as follows:

\begin{align}
R_k &\le \frac{T_d}{T} E\left[\left[\log\left(\frac{1}{M} + \rho_k v_k(G)\right) - \log\left(\frac{1}{M} + \rho_k w_k(G)\right)\right]^+\right] + \gamma_B \tag{77} \\
&= \frac{T_d}{T} E\left[\left[\log\left(\frac{1}{M} + \rho_k v_k(G)\right) - \log\left(\frac{1}{M} + \rho_k w_k(G)\right)\right]^+\right] \tag{78}
\end{align}

where (77) follows from (71) and from the fact that (73) = (76) with probability 1, and (78) follows from the fact that $\lim_{B \to \infty} \gamma_B = 0$.

We now bound the secure degrees of freedom of the $k$-th user as follows:

\begin{align}
d_k = \lim_{M \to \infty} \frac{R_k}{\log M} &\le \lim_{M \to \infty} \frac{T_d}{T} E\left[\left[\frac{\log(1 + M \rho_k v_k(G))}{\log M} - \frac{\log(1 + M \rho_k w_k(G))}{\log M}\right]^+\right] \tag{79} \\
&= \frac{T_d}{T} E\left[\lim_{M \to \infty} \left[\frac{\log(1 + M \rho_k v_k(G))}{\log M} - \frac{\log(1 + M \rho_k w_k(G))}{\log M}\right]^+\right], \tag{80}
\end{align}

where (79) follows from (78) and (80) follows from the dominated convergence theorem. To apply the dominated convergence theorem, we need to show that the random variable

\begin{equation}
t(M) \triangleq \left[\frac{\log(1 + M \rho_k v_k(G))}{\log M} - \frac{\log(1 + M \rho_k w_k(G))}{\log M}\right]^+ \tag{81}
\end{equation}

is upper and lower bounded by random variables that have a finite limit for $M > 1$. Note that $t(M)$ is lower bounded by zero and upper bounded by $t^+(M) \triangleq \frac{\log(1 + M \rho_k v_k(G))}{\log M}$ for any $M > 1$, since the second $\log(\cdot)$ term in (81) is non-negative.
We next upper bound $E[t^+(M)]$ as follows:

\begin{align}
E\left[t^+(M)\right] &= E\left[\frac{\log\left(\frac{1}{M} + \rho_k v_k(G)\right)}{\log M}\right] + 1 \tag{82} \\
&\le E\left[\log\left(\frac{1}{M} + \rho_k v_k(G)\right)\right] + 1 \nonumber \\
&\le \log\left(1 + \rho_k E[v_k(G)]\right) + 1 \tag{83} \\
&\le \log\left(1 + \rho_k(\gamma_k + \pi_k)\right) + 1 \tag{84} \\
&< \infty, \tag{85}
\end{align}

where $\gamma_k \triangleq \left|E\left[H_{k_m} \hat{H}_{k_m}^*\right]\right|^2$ and $\pi_k \triangleq E\left[\left|H_{k_m} \hat{H}_{k_m}^*\right|^2\right]$. In the derivation above, (83) follows from Jensen's inequality and (84) follows from the fact that

$$E\left[\left|H_k \hat{H}_k^*\right|^2\right] = \left(M^2 - M\right) \left|E\left[H_{k_m} \hat{H}_{k_m}^*\right]\right|^2 + M E\left[\left|H_{k_m} \hat{H}_{k_m}^*\right|^2\right] = \left(M^2 - M\right) \gamma_k + M \pi_k.$$

We continue the derivation of the upper bound on $d_k$ with the following:

\begin{align}
(80) &= \frac{T_d}{T} E\left[\left[\lim_{M \to \infty} \frac{\log\left(\frac{1}{M} + \rho_k v_k(G)\right)}{\log M} - \lim_{M \to \infty} \frac{\log\left(\frac{1}{M} + \rho_k w_k(G)\right)}{\log M}\right]^+\right] \tag{86} \\
&= 0, \tag{87}
\end{align}

where (86) follows from the fact that $[\cdot]^+$ is a continuous function. In order to show the equality in (87), first note that

\begin{align}
\lim_{M \to \infty} v_k(G) &= \lim_{M \to \infty} \frac{1}{\alpha_k M^2} \left|H_k \hat{H}_k^*\right|^2 \nonumber \\
&= \lim_{M \to \infty} \frac{1}{\alpha_k M} \sum_{m=1}^{M} H_{k_m} \hat{H}_{k_m}^* \times \lim_{M \to \infty} \frac{1}{M} \sum_{m=1}^{M} H_{k_m}^* \hat{H}_{k_m} \nonumber \\
&= \frac{1}{\alpha_k} \left|E\left[H_{k_m} \hat{H}_{k_m}^*\right]\right|^2, \tag{88}
\end{align}

with probability 1, where (88) follows from the strong law of large numbers.
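The almost-sure limit in (88) can be illustrated with a short simulation. This is a sketch under an explicit assumption: for simplicity it uses the uncontaminated MMSE estimate $\hat{H}_k = a H_k + b V_k$ of (36) with $\alpha_k = a$, rather than the estimate obtained under training-phase jamming, so the limit evaluates to $|E[H_{k_m}\hat{H}_{k_m}^*]|^2/\alpha_k = a$. The function names, the seed, and the value of $\rho_r T_r$ are hypothetical.

```python
import math
import random

def cn(rng):
    """One CN(0, 1) sample: independent real and imaginary parts, each of variance 1/2."""
    s = math.sqrt(0.5)
    return complex(rng.gauss(0.0, s), rng.gauss(0.0, s))

def v_k(M, a, b, rng):
    """v_k = |H_k Hhat_k^*|^2 / (alpha_k M^2), with Hhat_k = a H_k + b V_k and alpha_k = a."""
    acc = 0j
    for _ in range(M):
        h = cn(rng)
        h_hat = a * h + b * cn(rng)      # assumed estimator model, as in (36)
        acc += h * h_hat.conjugate()     # running inner product H_k Hhat_k^*
    return abs(acc) ** 2 / (a * M * M)

snr_train = 9.0                                   # hypothetical value of rho_r * T_r
a = snr_train / (snr_train + 1.0)                 # coefficient a of (36)
b = math.sqrt(snr_train) / (snr_train + 1.0)      # coefficient b of (36)
rng = random.Random(1)
for M in (10, 100, 1000, 20000):
    print(M, v_k(M, a, b, rng))                   # fluctuates for small M, settles near a
```

For small $M$ the value of $v_k(G)$ varies noticeably from run to run, while for large $M$ it concentrates around the deterministic limit, which is the behavior the dominated convergence argument relies on.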
In a similar way, we can show that

\begin{equation}
\lim_{M \to \infty} w_k(G) = \frac{1}{\alpha_k} \left|E\left[H_{k_m} \tilde{H}_{k_m}^*\right]\right|^2 = \frac{1}{\alpha_k} \left|E\left[H_{e_m} \hat{H}_{k_m}^*\right]\right|^2 \tag{89}
\end{equation}

with probability 1, where (89) follows from the fact that the joint probability distribution of $\left(H_e, \hat{H}_k\right)$ is identical to that of $\left(H_k, \tilde{H}_k\right)$. Hence, we have

\begin{equation}
\lim_{M \to \infty} \log\left(\frac{1}{M} + \rho_k v_k(G)\right) = \log\left(\lim_{M \to \infty} \rho_k v_k(G)\right) = \log\left(\frac{\rho_k}{\alpha_k} \left|E\left[H_{k_m} \hat{H}_{k_m}^*\right]\right|^2\right) \tag{90}
\end{equation}

with probability 1. Further, we have

\begin{equation}
\lim_{M \to \infty} \log\left(\frac{1}{M} + \rho_k w_k(G)\right) = \log\left(\lim_{M \to \infty} \rho_k w_k(G)\right) = \log\left(\frac{\rho_k}{\alpha_k} \left|E\left[H_{e_m} \hat{H}_{k_m}^*\right]\right|^2\right) \tag{91}
\end{equation}

with probability 1. The equality in (87) follows by combining (90) and (91): both limits are finite, so each term in (86) vanishes after division by $\log M$. Hence, the proof ends.

The proof of Theorem 3 for the case in which the BS employs $\delta$-conjugate beamforming can be done in a similar way. One only needs to replace $c_k \Sigma c_k^*$ and $c_e \Sigma c_e^*$ in (70) with $\frac{1}{M^\delta} c_k \Sigma c_k^*$ and $\frac{1}{M^\delta} c_e \Sigma c_e^*$, respectively, and change the rest of the proof accordingly. □

B. Proof of Corollary 2

Assume that the BS employs conjugate beamforming without loss of generality. Note that from (78), we have the following upper bound:

\begin{align}
\lim_{M \to \infty} R_k &\le \lim_{M \to \infty} \frac{T_d}{T} E\left[\left[\log\left(\frac{1}{M} + \rho_k v_k(G)\right) - \log\left(\frac{1}{M} + \rho_k w_k(G)\right)\right]^+\right] \nonumber \\
&= \frac{T_d}{T} E\left[\lim_{M \to \infty} \left[\log\left(\frac{1}{M} + \rho_k v_k(G)\right) - \log\left(\frac{1}{M} + \rho_k w_k(G)\right)\right]^+\right] \tag{92}
\end{align}

where (92) follows from the dominated convergence theorem. To apply the dominated convergence theorem, we need to show that the random variable

$$g(M) \triangleq \left[\log\left(\frac{1}{M} + \rho_k v_k(G)\right) - \log\left(\frac{1}{M} + \rho_k w_k(G)\right)\right]^+$$

is upper and lower bounded by random variables that have a finite limit for $M > 1$. Note that $g(M)$ is lower bounded by zero and upper bounded as

\begin{align}
g(M) &\le \left[\log\left(\frac{1}{M} + \rho_k v_k(G)\right) - \log\left(\frac{1}{M} + \rho_k w_k(G)\right)\right]^+ \nonumber \\
&\le \log\left(2 + \rho_k v_k(G) + \rho_k w_k(G)\right) - \log\left(\rho_k w_k(G)\right) \tag{93}
\end{align}

with probability 1 for any $M > 1$. Noting the analysis in (82)-(85), in order to show that (93) is upper bounded by a random variable that has a finite expectation, it is sufficient to show that the expectation of the second $\log(\cdot)$ term in (93) has a finite lower bound. Hence,

\begin{align}
E\left[\log\left(\rho_k w_k(G)\right)\right] &= \log \rho_k + E\left[\log\left(w_k(G)\right)\right] \nonumber \\
&= \log \rho_k - \log \alpha_k + E\left[\log(K_M)\right] \tag{94} \\
&\ge \log \rho_k - \log \alpha_k + \int_0^1 \log(x)\, p_{K_M}(x)\, dx \nonumber \\
&\ge \log \rho_k - \log \alpha_k + \int_0^1 \log(x)\, r\, dx \tag{95} \\
&= \log \rho_k - \log \alpha_k - r \log e \nonumber \\
&> -\infty, \nonumber
\end{align}

where $r$ is defined in the statement of Corollary 2. In (94), the equality follows from the definition of $K_M$ in Corollary 2 and from the fact that the joint probability distribution of $\left(H_e, \hat{H}_k\right)$ is identical to that of $\left(H_k, \tilde{H}_k\right)$. In (95), the inequality follows from the assumption in Corollary 2.
The rest of the proof follows from Appendix C-A.

The proof for the case in which the BS employs $\delta$-conjugate beamforming follows from the same argument as at the end of Appendix C-A. □

APPENDIX D
PROOF OF THEOREM 4

The length $T_r$ of the training phase has to be at least the size of the pilot signal set $L$ so that the BS can generate $L \ge K$ mutually orthogonal pilot signals. Let $J$ be any integer in the set $\{1, \ldots, T_r\}$. In order to estimate the $k$-th user's channel, the BS first projects the signal received during the training phase, $Y^{T_r}$ in (21), onto $\phi_k$. Then, the BS normalizes the projected signal and estimates the gain of the channel connecting the BS to the $k$-th user at the $i$-th block as

\begin{equation}
\hat{H}_k(i) = x \left(\sqrt{T_r \rho_r}\, H_k(i) + \Pi_i \sum_{n=1}^{M_e} \sqrt{\frac{T_r \rho_{jam}}{M_e J}}\, H_{e_n}(i) + V_k\right) \tag{96}
\end{equation}

for any $k \in \{1, \ldots, K\}$, where $E\left[\|\hat{H}_k(i)\|^2\right] = 1$, $V_k$ is distributed as $CN(0, I_M)$ for any $k \in \{1, \ldots, K\}$, $x \triangleq \frac{1}{\sqrt{M} \sqrt{T_r \rho_r + 1 + \frac{T_r \rho_{jam}}{J}}}$, and $\{\Pi_i\}_{i \ge 1}$ is an i.i.d. Bernoulli process with $P(\Pi_i = 1) = \frac{J}{L}$. Event $\{\Pi_i = 1\}$ indicates that the set of pilot signals the adversary contaminates at the $i$-th block contains the $k$-th user's pilot signal.

Utilizing stochastic encoding and conjugate beamforming as in the proof of Theorem 1, we can show that the rate

\begin{equation}
R_k = \left[\frac{T_d}{T} \log\left(1 + \frac{M \rho_k a}{\rho_f + \rho_{jam} + 1}\right) - \frac{T_d}{T} \log\left(1 + M_e \rho_k + \frac{M M_e \rho_k \rho_{jam} a}{L \rho_r}\right)\right]^+ \tag{97}
\end{equation}

is achievable for any $k \in \{1, \ldots, K\}$, where $a \triangleq \frac{T_r \rho_r}{T_r \rho_r + 1 + \frac{T_r \rho_{jam}}{L}}$. Notice that the rate in (97) does not depend on $J$. We can rewrite $R_k$ as

\begin{equation}
R_k = \left[\frac{T_d}{T} \log\left(1 + \frac{M \rho_k \rho_r T_r}{(\rho_f + \rho_{jam} + 1)(\rho_r T_r + \rho_{jam} + 1)}\right) - \frac{T_d}{T} \log\left(1 + M_e \rho_k + \frac{M_e M \rho_k \rho_{jam}}{\rho_r T_r + \rho_{jam} + 1}\right)\right]^+ \tag{98}
\end{equation}

due to the fact that $L = T_r$. Suppose $\gamma \le 1$. We bound $R_k$ as follows:

\begin{align}
R_k &\ge \frac{T_d}{T} \left[\log\left(1 + \frac{M \rho_k \rho_r M^\gamma}{(\rho_f + \rho_{jam} + 1)(\rho_r M^\gamma + \rho_{jam} + 1)}\right) - \log\left(1 + M_e \rho_k + \frac{M_e M \rho_k \rho_{jam}}{\rho_r M^\gamma + \rho_{jam} + 1}\right)\right]^+ \tag{99} \\
&\ge \frac{T_d}{T} \left[\log M + \log\left(\frac{\rho_k \rho_r}{(\rho_f + \rho_{jam} + 1)(\rho_r + \rho_{jam} + 1)}\right) - (1 - \gamma) \log M - \log\left(1 + M_e \rho_k + \frac{M_e \rho_k \rho_{jam}}{\rho_r}\right)\right]^+ \nonumber \\
&= \frac{T_d}{T} \left[\gamma \log M - \log\left(\left(1 + M_e \rho_k + \frac{M_e \rho_k \rho_{jam}}{\rho_r}\right) \times (\rho_f + \rho_{jam} + 1) \frac{\rho_r + \rho_{jam} + 1}{\rho_k \rho_r}\right)\right]^+ \tag{100}
\end{align}

where (99) follows from the fact that $T_r \ge M^\gamma$. Notice that the second logarithm term in (100) does not depend on $M$. Hence, we observe that $\frac{R_k}{\log M} \ge \frac{T_d}{T} \gamma - \epsilon$ if $M \ge G(\epsilon)$. In a similar way, for $\gamma > 1$, we can show that $\frac{R_k}{\log M} \ge \frac{T_d}{T} - \epsilon$ if $M \ge G(\epsilon)$.

APPENDIX E

A. Proof of Theorem 5

We set the size of the pilot signal set $L$ to $T_r$. Let $J$ be any integer in the set $\{1, \ldots, T_r\}$. The BS uses the same strategy explained in the proof of Theorem 4 in order to estimate the gains of the channels connecting the BS to the users.

The BS picks arbitrary message rates $R_k > 0$, $k = 1, \ldots, K$. The equivocation rate for a code $\left(2^{BTR_1}, \ldots, 2^{BTR_K}, BT_d\right)$ utilizing deterministic encoding mapping functions $f_k$, $k = 1, \ldots, K$, and $\delta$-conjugate beamforming is as follows:

\begin{align}
\frac{1}{BT} H\left(W_k \mid Z^{BT_d}, H^B, \hat{H}^B, H_e^B\right) &\ge R_k - \frac{T_d}{T} \log\left(1 + \frac{M_e \rho_k}{M^\delta} + \frac{M^{1-\delta} M_e \rho_k \rho_{jam} a}{L \rho_r}\right) \tag{101} \\
&= R_k - \frac{T_d}{T} \log\left(1 + \frac{M_e \rho_k}{M^\delta} + \frac{M^{1-\delta} M_e \rho_k \rho_{jam}}{\rho_r T_r + \rho_{jam} + 1}\right) \tag{102} \\
&\ge R_k - \frac{T_d}{T} \log\left(1 + \frac{M_e \rho_k}{M^\delta} + \frac{M^{1-\delta-\gamma} M_e \rho_k \rho_{jam}}{\rho_r}\right) \tag{103}
\end{align}

for all $k \in \{1, \ldots, K\}$, where $a$ is defined in (97) and $\hat{H}_k(i)$ for any $k \in \{1, \ldots, K\}$ and $i \in \{1, \ldots, B\}$ is given in (96). In the above derivation, (101) follows from a derivation that is similar to (49)-(54) in Appendix B-A, (102) follows from the fact that the cardinality of the pilot signal set $L$ is chosen as $T_r$, and (103) follows from the fact that $T_r \ge M^\gamma$.

As $\delta + \gamma > 1$, the logarithmic term on the RHS of (103) goes to zero as $M \to \infty$. For any $\epsilon > 0$, $M \ge S(\epsilon)$ implies that this term is smaller than $\epsilon$, completing the proof.

B. Proof of Corollary 3

We set the size of the pilot signal set $L$ to $T_r$. Let $J$ be any integer in the set $\{1, \ldots, T_r\}$. The BS uses the same strategy explained in the proof of Theorem 4 in order to estimate the gains of the channels connecting the BS to the users.

Pick $\delta$ and $\gamma$ such that $0 < \delta < 1$ and $\gamma + \delta > 1$. Pick an arbitrary $\epsilon > 0$ and an arbitrary rate tuple $R = [R_1, \ldots, R_K]$. Choose $M$ such that $M \ge \max(V(R), S(\epsilon))$. Note that the inequality $M \ge V(R)$ implies that $R_k \le \frac{T_d}{T} \log\left(1 + \frac{M^{1-\delta} \rho_k a}{M^{-\delta} \rho_f + \rho_{jam} + 1}\right)$ for all $k \in \{1, \ldots, K\}$, where $a$ is defined in (97). As in the proof of Corollary 1, we can show that there exist $B(\epsilon) > 0$ and a sequence of codes $\left(2^{BTR_1}, \ldots, 2^{BTR_K}, BT_d\right)$ that satisfy the decodability constraint in (10) for $B \ge B(\epsilon)$ when $\delta$-beamforming, combined with deterministic mapping, is used. In addition, since $M \ge S(\epsilon)$ and $T_r \ge M^\gamma$, following Theorem 5, the sequence of codes mentioned above satisfies the constraint in (11), completing the proof.

REFERENCES

[1] H. Yang and T. L. Marzetta, “Performance of conjugate and zero-forcing beamforming in large-scale antenna systems,” IEEE Journal on Selected Areas in Communications, vol. 31, no. 2, pp. 172–179, Feb. 2013.
[2] Y. Ozan Basciftci and C. Emre Koksal, “How different is security with massive MIMO?”, in UCSD Information Theory Applications (ITA), 2015, http://ita.ucsd.edu/workshop/15/files/abstract/abstract 1300.txt, Feb. 2015.
[3] T. L. Marzetta, “Multi-cellular wireless with base stations employing unlimited numbers of antennas,” in Proc. UCSD Inf. Theory Applications Workshop, Feb. 2010.
[4] T. L. Marzetta, “Noncooperative cellular wireless with unlimited numbers of base station antennas,” IEEE Trans. Wireless Commun., vol. 9, no. 11, pp. 3590–3600, Nov. 2010.
[5] L. Lu, G. Li, A. Swindlehurst, A. Ashikhmin, and R. Zhang, “An overview of massive MIMO: benefits and challenges,” IEEE Journal of Selected Topics in Signal Processing, vol. 8, no. 5, pp. 742–758, Oct. 2014.
[6] J. Jose, A. Ashikhmin, T. L. Marzetta, and S. Vishwanath, “Pilot contamination and precoding in multi-cell TDD systems,” IEEE Trans. Wireless Commun., vol. 10, no. 8, pp. 2640–2651, Aug. 2011.
[7] H. Q. Ngo, E. G. Larsson, and T. L. Marzetta, “Energy and spectral efficiency of very large multiuser MIMO systems,” IEEE Trans. Commun., vol. 61, no. 4, pp. 1436–1449, April 2013.
[8] R. R. Muller, L. Cottatellucci, and M. Vehkapera, “Blind pilot decontamination,” IEEE Journal of Selected Topics in Signal Processing, vol. 8, no. 5, pp. 773–786, October 2014.
[9] J. Zhu, R. Schober, and V. Bhargava, “Secure transmission in multicell massive MIMO systems,” IEEE Trans. Wireless Commun., vol. 13, no. 9, pp. 4766–4781, Sept. 2014.
[10] A. D. Wyner, “The wire-tap channel,” Bell Syst. Tech. J., vol. 54, no. 8, pp. 1355–1387.
[11] T. L. Marzetta, “How much training is required for multiuser MIMO?”, Fortieth Asilomar Conf. on Signals, Systems, & Computers, Pacific Grove, CA, Oct. 2006.
[12] T. M. Cover and J. A. Thomas, Elements of Information Theory. New York: Wiley, 1991.
[13] B. Hassibi and B. Hochwald, “How much training is needed in multiple-antenna wireless links?”, IEEE Trans. Inf. Theory, vol. 49, pp. 951–963, Apr. 2003.
[14] Y. O. Basciftci, O. Gungor, C. E. Koksal, and F. Ozguner, “On the secrecy capacity of block fading channels with a hybrid adversary,” IEEE Trans. Inf. Theory, vol. 61, no. 3, pp. 1325–1343, March 2015.
[15] W. Diffie and M. Hellman, “New directions in cryptography,” IEEE Trans. Inf. Theory, vol. 22, no. 6, pp. 644–654, Nov. 1976.
[16] R. L. Rivest, A. Shamir, and L. Adleman, “A method for obtaining signatures and public-key cryptosystems,” Commun. ACM, vol. 21, no. 2, pp. 120–126, Feb. 1978.
[17] R. Wilson, D. Tse, and R. A. Scholtz, “Channel identification: Secret sharing using reciprocity in ultrawideband channels,” IEEE Trans. Inform. Forensics and Security, vol. 2, pp. 364–375, Sept. 2007.
[18] C. Ye, S. Mathur, A. Reznik, W. Trappe, and N. Mandayam, “Information-theoretic key generation from wireless channels,”

IEEETrans. Inform. Forensics and Security , vol. 5, pp. 240254, Jun. 2010.[19] L. Lai, Y. Liang, and H. V. Poor, “A uniﬁed framework for keyagreement over wireless fading channels,”

IEEE Trans. Inform. Forensicsand Security , vol. 7, pp. 480–490, Apr. 2012.[20] A. El Gamal and Y.-H. Kim, Network Information Theory. Cambridge,U.K.: Cambridge University Press, 2012[21] D. N. C. Tse and P. Viswanath, Fundamentals of Wireless Communica-tions. Cambridge, UK: Cambridge University Press, 2005[22] A. Khisti and G. W. Wornell, “Secure transmission with multipleantennas II: The MIMOME wiretap channel,

IEEE Trans. Inform. Theory ,vol. 56, no. 11, pp. 5515-5532, Nov. 2010.[23] F. Oggier and B. Hassibi, “The secrecy capacity of the MIMO wiretapchannel,