[PDF] Dual MINE-based Neural Secure Communications under Gaussian Wiretap Channel

Abstract

Recently, some researches are devoted to the topic of end-to-end learning a physical layer secure communication system based on autoencoder under Gaussian wiretap channel. However, in those works, the reliability and security of the encoder model were learned through necessary decoding outputs of not only legitimate receiver but also the eavesdropper. In fact, the assumption of known eavesdropper's decoder or its output is not practical. To address this issue, in this paper we propose a dual mutual information neural estimation (MINE) based neural secure communications model. The security constraints of this method is constructed only with the input and output signal samples of the legal and eavesdropper channels and benefit that training the encoder is completely independent of the decoder. Moreover, since the design of secure coding does not rely on the eavesdropper's decoding results, the security performance would not be affected by the eavesdropper's decoding means. Numerical results show that the performance of our model is guaranteed whether the eavesdropper learns the decoder himself or uses the legal decoder.

Full PDF

XXXX-X-XXXX-XXXX-X/XX/$XX.00 ©20XX IEEE

Dual MINE-based Neural Secure Communications under Gaussian Wiretap Channel

Jingjing Li*, Zhuo Sun*, Lei Zhang † , Hongyu Zhu ‡ * Key Laboratory of Universal Wireless Communications, Beijing University of Posts and Telecommunications Communications † State Grid Fujian Electric Power Co.LTD, Fuzhou, China ‡ State Grid Hunan Electric Power Co.LTD, Changsha, China Email:{jingjing, zhuosun}@bupt.edu.cn, {jingjing, zhuosun}, [email protected], [email protected]

Abstract —Recently, some researches are devoted to the topic of end-to-end learning a physical layer secure communication system based on autoencoder under Gaussian wiretap channel. However, in those works, the reliability and security of the encoder model were learned through necessary decoding outputs of not only legitimate receiver but also the eavesdropper. In fact, the assumption of known eavesdropper's decoder or its output is not practical. To address this issue, in this paper we propose a dual mutual information neural estimation (MINE) based neural secure communications model. The security constraints of this method is constructed only with the input and output signal samples of the legal and eavesdropper channels and benefit that training the encoder is completely independent of the decoder. Moreover, since the design of secure coding does not rely on the eavesdropper's decoding results, the security performance would not be affected by the eavesdropper's decoding means. Numerical results show that the performance of our model is guaranteed whether the eavesdropper learns the decoder himself or uses the legal decoder.

Keywords—wiretap channel, autoencoder, security constraint, mutual information neural estimation I. I NTRODUCTION

With the development of 5G technology, the application of wireless communication technology goes deep into all fields of society. A variety of confidential and sensitive data in wireless networks are growing rapidly, which leads to prominent information security issues. Compared with traditional wired networks, wireless communication is vulnerable to eavesdropping, attacks, and interference due to the openness of the transmission medium and the characteristics of broadcasting. Most of the traditional security mechanisms are based on cryptography. On the one hand, with the increasing performance of processors, the rapid improvement of computer computing capacity, and the maturity of distributed computing theory, the classical encryption algorithm system based on complexity is becoming increasingly unstable. On the other hand, the appearance of quantum computer makes it more vulnerable and easier to be cracked. Therefore, physical layer security technology is triggering wide interest in the academic field, which process transmitted signals or implement corresponding security policies using channel information with the characteristics of lightweight, high security, and can communicate safely without using the pre-shared key. A considerable research on physical layer security is based on information theory. Researchers make a profound study on various wireless wiretap channel models, and seek ways to make the security transmission rate as close as possible to security capacity. For example, some typical methods include directional modulation [1]-[3], polarization modulation [4] and cooperative interference [5]-[7]. Through beamforming, artificial noise, depolarization effect or the introduction of cooperative jammer to send artificial interference, the eavesdropper cannot demodulate useful information due to the serious deterioration of the received signal, thus improving the security of the system in the physical layer. In this paper, we also start from the perspective of information theory, maximizing the secure transmission rate of the system based on the wiretap channel model. Deep neural networks (DNNs) have been successfully applied in many fields, such as computer vision [1], natural language processing [2], and speech recognition [3]. When it comes to using DNNs to learn the physical layer security communication method, the autoencoder is the first attempt. In [4], the author first proposed to apply deep learning to the communication system, and an autoencoder structure was used to construct a communication system model. The model greatly improved the reliability of communication through end-to-end training. An application example of autoencoder in the wiretap channel can be found in [5] which considers the encoding and decoding scheme of the Gaussian wiretap channel with a single antenna. The main idea is to construct an end-to-end communication system model, and optimize the encoder and decoder jointly to achieve a balance between reliability and security. In our previous work [6], we also used end-to-end learning to solve two communication security problems: confidential transmission and user authentication. However, learning the end-to-end communication system is a challenging task. It is necessary to establish a differentiable channel model to meet the reverse transfer of parameters in the process of training the neural networks, which is impossible in real-world channel conditions. One way to dissolve the above issue is to utilize mutual information estimation [7]. Using this method, it was shown in [8], that a secure encoding can be learned based on a mixed loss function that consists of the approximate mutual information of the legitimate channel, learned from signal samples, and the cross-entropy function from the clustering enforcing the fake distribution. As a result, the intermediate steps of estimating and generating channel distribution can be skipped. However, the decoding results of the eavesdropper in their mixing function are difficult to obtain in the actual environment, especially in passive eavesdropping scenario. Besides, they only use one eavesdropping structure to verify the security of the system, which led to an ideal result. In this paper, we present a dual MINE-based neural secure communications model, two MINE modules are used to estimate the channel capacity of the legitimate channel and eavesdropper channel respectively, the estimated mutual information are subsequently used to construct the security capacity constraint as the objective function for training the encoder. Our main contributions and novelties are summarized as follows: (1) the encoder and decoder are completely independently training, avoiding the problem of differentiable channel model needed. (2) We can train the encoder reliably just with the signal samples of the main hannel and eavesdropper channel, enabling deep learning for encoder without explicit knowledge of decoders. (3) We provide eavesdropper the two most possible decoder options to verify the system performance. One is that the eavesdropper knows the encoder used by the transmitter, and learns a corresponding decoder model by himself; Another is that the eavesdropper can obtain the decoder of the legitimate receiver and uses it to decode its received signals. The rest of this paper is organized as follows. We present the basis of the Gaussian wiretap channel used in our work in Section II. The implementation details of the proposed dual MINE-based neural secure communications model and the training methods of each block are detailed in Section III. We provide the numerical results and correlation analysis in Section IV. Finally, the paper is concluded in Section V.

Notations—

Random variables of all dimensions are denoted in boldface upper case letters, e.g., 𝑀 . The sets of codebooks are denoted with calligraphic letters, i.e. ℳ . The dimension n is marked in the upper right corner of the letter. e.g., 𝑋 𝑛 . The element 𝑖 is marked in the lower right corner of the letter (e.g., 𝑋 𝑖 ). The probability mass or density function of event 𝑥 is denoted as 𝑝(𝑥) . ℝ(ℂ) is the set of real numbers (complex numbers). The expectation is denoted by

𝐸{∙} . II. B ASIS OF G AUSSIAN W IRETAP C HANNEL

The simplest scenario which involves both tasks of reliable transmission and secrecy is the wiretap channel [9], which is a three-node network consists of a transmitter, a legitimate receiver, and an eavesdropper. The goal of the transmitter is to communicate reliably with the legitimate receiver while keeping the eavesdropper as ignorant of the confidential information as possible. In this paper, we consider the degraded Gaussian wiretap channel extended from Wyner’s wiretap channel [10] as depicted in Fig. 1. It is a degraded broadcast channel with memoryless additive Gaussian white noise [11]. It is believed that this will lay a foundation for us to understand and study more complex communication scenarios. The procedures to transmit confidential information mainly consist of three phases, which are summarized as follows: First, Alice transmits a message 𝑀 from the set of ℳ = {1,2, … ,2 𝑛𝑅 } at a rate 𝑅 to the legitimate user Bob. Alice encodes it into a codeword of block length 𝑛 using an encoding function 𝑓(𝑀) = 𝑥 𝑛 (𝑀) ∈ ℂ 𝑛 . Besides, we assume that the variance of the transmitted codewords is not allowed to exceed a certain value 𝑃 𝐸{𝑥(𝑀) } ≤ 𝑃, ∀ ∈ ℳ () Second, the main channel between Alice and Bob and the eavesdropper channel between Alice and Eve are defined by 𝑌 𝑖 = 𝑋 𝑖 + 𝑁 𝐵,𝑖 𝑓𝑜𝑟 𝑖 ∈ {1, … ,𝑛}, 𝑁

𝐵,𝑖 ∼ 𝐶𝒩(0, 𝜎 𝐵2 ) () 𝑍 𝑖 = 𝑌 𝑖 + 𝑁 𝐸,𝑖 𝑓𝑜𝑟 𝑖 ∈ {1, … ,𝑛}, 𝑁

𝐸,𝑖 ∼ 𝐶𝒩(0, 𝜎 𝐸2 ) () where 𝑋 𝑖 , 𝑌 𝑖 , 𝑍 𝑖 denote the channel input, Bob’s observation, and Eve’s observation, respectively. In the following, we consider the degraded case such that 𝜎 𝐵2 < 𝜎 𝐸2 . The third part is based on the premise that the eavesdropper knows the encoder of the transmitter, and then builds a decoder neural network and trains it by himself. The task of Bob and Eve is to estimate and recover the original message using the decoding functions 𝑔(𝑦 𝑛 ) = 𝑀̂ () ℎ(𝑧 𝑛 ) = 𝑀̃ () The symbol error rate (SER) of the two receivers is then defined as 𝑃 𝐵 = 𝐸{𝑀̂ ≠ 𝑀|𝑌} , 𝑃 𝐸 = 𝐸{𝑀̃ ≠ 𝑀|𝑍} , respectively. III. D UAL

MINE-

BASED N EURAL S ECURE C OMMUNICATIONS M ODEL

Aiming at provide a secure communication over the degraded Gaussian wiretap channel, we build a fully learning based communication system shown in Fig. 2, of which all the modules are implemented with DNNs, including an encoder block, two mutual information estimators, and two decoder blocks. Note that the encoder block and the decoder block are independent and not constructed as autoencoder model. As a result, the intermediate process of passing the gradient back from the decoder through the channel can be skipped, thus avoiding the need of differentiable channel model. Wiretap channel security capacity is the basis of physical layer security technology research, which plays an important role in guiding the physical layer security mechanism. Therefore, we consider to use the security capacity as a metric to learn the optimal encoding function of the Gaussian wiretap channel.

Fig. 2: The architecture of Dual MINE-based Neural Secure Communications model.

Fig. 1: Degraded Gaussian wiretap channel. The sender (Alice) sends confidential information 𝑀 to the legitimate receiver (Bob) while keeping the external eavesdropper (Eve) unaware of it. he secrecy capacity of a Gaussian wiretap channel was derived in [12] and was found to be the capacity difference between the main and the eavesdropper channels, i.e., 𝐶 𝑠 = max 𝑝(𝑥) {𝐶 𝑚 − 𝐶 𝑤 } () where 𝐶 𝑚 is the main channel capacity and 𝐶 𝑤 is the eavesdropper channel capacity. And the capacity 𝐶 for a Gaussian channel is known to be 𝐶 = max 𝑝(𝑥)

𝐼(𝑋; 𝑌) . Therefore, the secrecy capacity calculation further simplifies to just the difference in the mutual information of two channels. 𝐶 𝑠 = max 𝑝(𝑥) [𝐼(𝑋; 𝑌) − 𝐼(𝑋; 𝑍)] () The two MINE modules of our proposed model are used to achieve the estimation of the mutual information. Based on this, the encoder loss function can satisfy the reliability and security constraints. A. MINE Block

As described above, we use the mutual information between input-output measurements of the wireless channel to optimize the encoder weight. In this work, we choose to utilize MINE to achieve this goal, which using SGD and Donsker-Varadhan representation of the Kullback-Leibler (KL) divergence defined as

𝐼(𝐴; 𝐵) = 𝐷 𝐾𝐿 (𝑝(𝑎, 𝑏)‖𝑝(𝑎)𝑝(𝑏)) ≥ sup 𝜃∈Θ 𝐸 𝑝(𝑎,𝑏) {𝑇 𝜃 (𝐴, 𝐵)} − log 𝐸 𝑝(𝑎)𝑝(𝑏) {𝑒 𝑇 𝜃 (𝐴,𝐵) } () where 𝐴 and 𝐵 denote two random variables with arbitrary distributions 𝑝(𝑎) , 𝑝(𝑏) , respectively. In the MINE, MI0 and MI1 represent two equally constructed neural networks, each of which has a fully-connected hidden layer with a relu activation function, and a linear output layer. The specific operation of calculating mutual information is explained by taking MI0 as an example. We first sample the joint distribution 𝑝(𝑥 𝑛 , 𝑦 𝑛 ) and the marginal distributions 𝑝(𝑥 𝑛 ) and 𝑝(𝑦 𝑛 ) , and then send the results to MI0 to get the 𝑇 𝜃 . Finally, the mean value of 𝑇 𝜃 is calculated to approximate the expectations. Given that 𝑇 𝜃 is expressive enough, the above lower bound converges to the true mutual information. Hence the mutual information for 𝑋 𝑛 and 𝑌 𝑛 is calculated as 𝐼̃ 𝜃 (𝑋 𝑛 ; 𝑌 𝑛 ) ∶= ∑ [𝑇 𝜃 (𝑥 (𝑖)𝑛 , 𝑦 (𝑖)𝑛 )] 𝑘𝑖=1 − log ∑ [𝑒 𝑇 𝜃 (𝑥 (𝑖)𝑛 ,𝑦̅ (𝑖)𝑛 ) ] 𝑘𝑖=1 () where the 𝑘 = 64 represents the number of samples. The objective functions for optimization MI0 and MI1 are then set as max 𝜃 𝐼̃ 𝜃 (𝑋 𝑛 ; 𝑌 𝑛 ) and max 𝜗 𝐼̃ 𝜗 (𝑋 𝑛 ; 𝑍 𝑛 ) , respectively. B. Encoder Block

The encoder consists of an embedding followed by a dense hidden layer with an elu activation function, and a linear output layer. Its 2n-dimensional real-valued output is converted to an n-dimensional complex-valued vector and the power constraint is applied to the result to form 𝑋 𝑛 . The purpose of training encoder is to minimize SER and information leakage. This is a multi-objective programming problem (MOP) that can be solved by combining multiple objectives into a scalar objective. Here we use the weighted sum and scalar objective for the objective function, expressed as max 𝜑 α𝐼(𝑋 𝜑𝑛 (𝑀); 𝑌 𝑛 ) − (1 − 𝛼)𝐼(𝑋 𝜑𝑛 (𝑀); 𝑍 𝑛 ) () where the value of 𝐼(𝑋 𝜑𝑛 (𝑀); 𝑌 𝑛 ) will affect the reliability of the system, while the value of 𝐼(𝑋 𝜑𝑛 (𝑀); 𝑍 𝑛 ) will affect the security of the system. The α in (10) is a specific weight that can be used to balance two objectives. C. Decoder Block

The structure of the decoder is almost symmetrical to the encoder, except that a softmax activation function is used in the final dense layer. Its output is a probability vector that assigns a probability to each of the possible messages. Then we use an arg max layer to select the index of the maximum value in the probability vector, which represents the information value 𝑀̂ of the decoding result. The target of training Bob and Eve is to decode secure coded signals, that is, to restore the transmitted message as much as possible. In this case, the cross-entropy loss function with variants of stochastic gradient descent [13] is a satisfactory choice. The loss function of the two decoders can be written as 𝐿 𝐵 ∶= 𝐻(𝑀, 𝑀̂) = − ∑ log 𝑝 𝑀̂𝑘𝑖=1 () 𝐿 𝐸 ∶= 𝐻(𝑀, 𝑀̃ ) = − ∑ log 𝑝 𝑀̃𝑘𝑖=1 () where 𝑘 = 500 represents the number of samples, 𝑝 𝑀̂ and 𝑝 𝑀̃ denote the estimated probability of the message received by the two receivers, respectively. They describe the distance between the actual output and the expected output, that is, the smaller the cross-entropy, the smaller the SER. Our training procedure is divided into three phases.

Firstly, we train the MI estimation network 𝑇 𝜃 for 5 epochs with 500 iterations. At this point, the parameters of the encoder network are only randomly initialized and the estimated mutual information might not reflect the final estimated value. Secondly, the security of the system is learned by training the encoder and two estimators iteratively. The parameters of one model will be fixed while training the other so that the optimal point can be found. Finally, we train the legal decoder to achieve reliable communication with the transmitter and train the eavesdropper’s decoder to verify the security of the system. Note that, all neural networks in the system are optimized by Nadam optimizer [14] with a learning rate of 0.001. The signal-to-noise ratio (SNR) of both channels is fixed to 7dB. And the output of the encoder network has a unit average power normalization. IV. N UMERICAL R ESULTS AND A NALYSIS A. Reliability and Security

In this part, we test the system under different

𝑆𝑁𝑅𝑠 ∈(0𝑑𝐵, 21𝑑𝐵) of the legal channel. Firstly, we set the SNR of the eavesdropper channel to 7dB. The SER curves can be seen in Fig. 3. The 16QAM is taken as the benchmark for comparison. One can see that the reliability performance of our method is close to that of 16QAM at low SNR, and the reliability performance gap is within 1dB when SNR gets igher. This is due to the trade-off between reliability and security we considered, the system sacrifices part of reliability for security. Moreover, the SER of Eve is limited to a large value even at high SNR. Subsequently, we conducted a test to see if Eve could achieve the purpose of eavesdropping using Bob’s decoding network. The result is shown in Fig. 4. As expected, the decoding result of Eve has a high symbol error rate, which proves that the model achieves presentable security regardless of the eavesdropping means. From Fig. 5, we can find that the coefficient in (10) is the most important factor to determine the system performance. The SER of Bob and Eve get smaller with increasing weight α , which suggests that the NN can find better constellations. The drawback is that the security is reduced, due to the trade-off between security and reliability. Therefore, we have taken a conservative approach, which resulted in = . B. Comparison to Existing Methods

To compare our approach with others presented in [5] and [8], we reproduced a system model similar to the one they used. The most prominent feature of this model is the use of coset coding algorithm . The actual message is hidden in the cosets. Bob can distinguish the real codeword from the cosets, while Eve can only recognize the cosets of the codeword. The general process is that the input signal 𝑀 is coded into 𝑆 by one-hot encoding, and then the 𝑆 is encoded by coset to get equalized input symbol distribution 𝑆̅ . The decoding distribution of Bob and Eve is expressed as 𝑆̂ and 𝑆̃ , respectively. For comparison, we use the following three methods to implement the security of communication system: Autoencoder and Cross-Entropy (AE+CE):

This is the method used in [5]. They use an end-to-end approach to jointly train the encoder and decoder with a secure loss function based on cross-entropy which is defined as 𝐿 𝑠𝑒𝑐 ∶= 𝛼𝐻(𝑀; 𝑀̂) +(1 − 𝛼)𝐻(𝑀̅ ; 𝑀̃ ) = −𝛼 ∑ 𝑆 𝑖 log 𝑆 𝑖 ̂ − (1 − 𝛼) ∑ 𝑆 𝑖 log 𝑆 𝑖 ̃ |ℳ|𝑖=1|ℳ|𝑖=1 () Mutual Information and Cross-Entropy (MI+CE):

This is the method raised in [8]. The training of encoder and decoder is carried out separately, and the security loss function used in training encoder is the combination of mutual information and cross-entropy, i.i.d. 𝐿 𝑠𝑒𝑐 ∶= 𝛼𝐼(𝑋 𝑛 (𝑀); 𝑌 𝑛 ) −(1 − 𝛼)𝐻(𝑀; 𝑀̃ ) () Fig. 3: SER performance of the proposed method for a 16-dimensional codeword constellation. The weight  in (10) is 0.7. Eve uses its own trained neural network as the decoder. A theoretical estimation of 16QAM as a reference. Fig. 4: SER performance of the proposed method for a 16-dimensional codeword constellation. The weight  in (10) is 0.7. Eve uses Bob’s neural network as the decoder. A theoretical estimation of 16QAM as a reference. (a) =  (b) = (c) = (d) = Fig. 5: The resulting encoding constellations are shown for 16 symbols based on: (a) the encoder training weight = ; (b) the encoder training weight = . Their corresponding SER performance is shown in (c) and (d), respectively. here 𝐻(𝑀, 𝑀̃ ) is the same as the one in (13). Mutual Information and Mutual Information (MI+MI):

This is the Dual MINE-based Neural Secure Communications model proposed in this paper which also trains the encoder and decoder separately. Moreover, compared with method 2, our security constraints can be implemented without knowing the output of the decoder. Therefore, the safety loss is as follows: 𝐿 𝑠𝑒𝑐 ∶= 𝛼𝐼(𝑋 𝑛 (𝑀); 𝑌 𝑛 ) −(1 − 𝛼)𝐼(𝑋 𝑛 (𝑀); 𝑍 𝑛 ) () Fig. 6 shows the performance of the three methods in security encoding. We can analyze the reliability and security of each method through the green and red curves. One can see that, Fig. 6c has a better reliability in the case of similar security as Fig. 6b. This shows that our method can achieve good performance even without prior knowledge of the decoder. By comparing Fig. 6a and Fig. 6b, it can be found that the former has better security in the case of similar reliability. This is because the latter two methods train the encoder and decoder separately which cannot ascertain the global optimality of the system, even if they all achieve their respective optimal performance. But as described in the first section, this training method avoids the dependence on a differentiable channel model. V. C ONCLUSIONS

In this paper, we contribute to give a model based on two mutual information estimators for secure communication in the degraded Gaussian wiretap channel where each party is equipped with DNNs. The encoder is trained by security capacity in a manner to provide reliability for legal decoder while ensuring security against wiretapper. The presented results are compared with the other two security constraint methods based on deep learning which highlights the superiority of our proposed method in robustness to different eavesdropping environments without prior knowledge of the decoder. However, future research needs to investigate the performance under unknown eavesdropping channel conditions. Moreover, we can continue to explore the direction corresponds to the case where the broadcast channel is no longer degraded. A

CKNOWLEDGEMENT

This paper is supported by "2020 Industrial Internet Innovation and Development Project (Project No. TC200H01L)". R

EFERENCES [1]

T. Hong, M. Song and Y. Liu, "Dual-Beam Directional Modulation Technique for Physical-Layer Secure Communication," in IEEE Antennas and Wireless Propagation Letters, vol. 10, pp. 1417-1420, 2011, doi: 10.1109/LAWP.2011.2178384. [2]

M. P. Daly and J. T. Bernhard, "Directional Modulation Technique for Phased Arrays," in IEEE Transactions on Antennas and Propagation, vol. 57, no. 9, pp. 2633-2640, Sept. 2009, doi: 10.1109/TAP.2009.2027047. [3]

M. P. Daly, E. L. Daly and J. T. Bernhard, "Demonstration of Directional Modulation Using a Phased Array," in IEEE Transactions on Antennas and Propagation, vol. 58, no. 5, pp. 1545-1550, May 2010, doi: 10.1109/TAP.2010.2044357. [4]

Dong Wei, Lili Liang, Meng Zhang, Rong Qiao and Weiqing Huang, "A Polarization state Modulation based Physical Layer Security scheme for Wireless Communications," MILCOM 2016 - 2016 IEEE Military Communications Conference, Baltimore, MD, 2016, pp. 1195-1201, doi: 10.1109/MILCOM.2016.7795493. [5]

L. Hu, B. Wu, J. Tang, F. Pan and H. Wen, "Outage constrained secrecy rate maximization using artificial-noise aided beamforming and cooperative jamming," 2016 IEEE International Conference on Communications (ICC), Kuala Lumpur, 2016, pp. 1-5, doi: 10.1109/ICC.2016.7511535. [6]

Z. Chu, K. Cumanan, Z. Ding, M. Johnston and S. Y. Le Goff, "Secrecy Rate Optimizations for a MIMO Secrecy Channel With a Cooperative Jammer," in IEEE Transactions on Vehicular Technology, vol. 64, no. 5, pp. 1833-1847, May 2015, doi: 10.1109/TVT.2014.2336092. [7]

L. Hu, H. Wen, B. Wu, J. Tang, F. Pan and R. Liao, "Cooperative-Jamming-Aided Secrecy Enhancement in Wireless Networks With Passive Eavesdroppers," in IEEE Transactions on Vehicular Technology, vol. 67, no. 3, pp. 2108-2117, March 2018, doi: 10.1109/TVT.2017.2744660. [8]

K. He, X. Zhang, S. Ren and J. Sun, "Deep Residual Learning for Image Recognition," 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, 2016, pp. 770-778, doi: 10.1109/CVPR.2016.90. [9]

T. Young, D. Hazarika, S. Poria and E. Cambria, "Recent Trends in Deep Learning Based Natural Language Processing [Review Article]," in IEEE Computational Intelligence Magazine, vol. 13, no. 3, pp. 55-75, Aug. 2018, doi: 10.1109/MCI.2018.2840738. [10]

G. Hinton et al., "Deep Neural Networks for Acoustic Modeling in Speech Recognition: The Shared Views of Four Research Groups," in IEEE Signal Processing Magazine, vol. 29, no. 6, pp. 82-97, Nov. 2012, doi: 10.1109/MSP.2012.2205597. [11]

T. O’Shea and J. Hoydis, "An Introduction to Deep Learning for the Physical Layer," in IEEE Transactions on Cognitive Communications and Networking, vol. 3, no. 4, pp. 563-575, Dec. 2017, doi: 10.1109/TCCN.2017.2758370. [12]

R. Fritschek, R. F. Schaefer and G. Wunder, "Deep Learning for the Gaussian Wiretap Channel," ICC 2019 - 2019 IEEE International Conference on Communications (ICC), Shanghai, China, 2019, pp. 1-6, doi: 10.1109/ICC.2019.8761681. [13]

Z. Sun, H. Wu, C. Zhao and G. Yue, "End-to-End Learning of Secure Wireless Communications: Confidential Transmission and (a)

AE+CE (b)

MI+CE (c)

MI+MI

Fig. 6: Shows the SER for the three methods. Bob and Eve indicate the error rate of the codeword transmitted before the security encoding, and correspondingly, Secure Bob and Secure Eve indicate the error rate after the security encoding, reflecting the reliability and security of the system respectively. uthentication," in IEEE Wireless Communications, vol. 27, no. 5, pp. 88-95, October 2020, doi: 10.1109/MWC.001.2000005. [14]

I. Belghazi, S. Rajeswar, A. Baratin, R. D. Hjelm, and A. Courville, "MINE: Mutual information neural estimation, " arXiv preprint arXiv:1801.04062, 2018. [15]

R. Fritschek, R. F. Schaefer and G. Wunder, "Deep learning based wiretap coding via mutual information estimation." Proceedings of the 2nd ACM Workshop on Wireless Security and Machine Learning. 2020. [16]

A. D. Wyner, "The Wire-Tap Channel", Bell System Technical Journal, vol. 54, no. 8, pp. 1355-1387, 1975. [17]

A. Wyner, "The wiretap channel", Bell Syst Tech. J., vol. 54, pp. 1355-1387, Oct. 1975. [18]

P. Bergmans, "Random coding theorem for broadcast channels with degraded components," in IEEE Transactions on Information Theory, vol. 19, no. 2, pp. 197-207, March 1973, doi: 10.1109/TIT.1973.1054980. [19]

S. Leung-Yan-Cheong and M. Hellman, "The Gaussian wire-tap channel," in IEEE Transactions on Information Theory, vol. 24, no. 4, pp. 451-456, July 1978, doi: 10.1109/TIT.1978.1055917. [20]