Adversarial Machine Learning for Flooding Attacks on 5G Radio Access Network Slicing
Yi Shi and Yalin E. Sagduyu
Virginia Tech, Blacksburg, VA 24061, USA
Intelligent Automation, Inc., Rockville, MD 20855, USA
Abstract—Network slicing manages network resources as virtual resource blocks (RBs) for the 5G Radio Access Network (RAN). Each communication request comes with quality of experience (QoE) requirements such as throughput and latency/deadline, which can be met by assigning RBs, communication power, and processing power to the request. For a completed request, the achieved reward is measured by the weight (priority) of this request. The reward is then maximized over time by allocating resources, e.g., with reinforcement learning (RL). In this paper, we introduce a novel flooding attack on 5G network slicing, where an adversary generates fake network slicing requests to consume the 5G RAN resources that would otherwise be available to real requests. The adversary observes the spectrum and builds a surrogate model of the network slicing algorithm through RL that decides how to craft fake requests to minimize the reward of real requests over time. We show that the portion of the reward achieved by real requests may be much less than the reward that would be achieved when there was no attack. We also show that this flooding attack is more effective than other benchmark attacks such as random fake requests and fake requests with the minimum resource requirement (lowest QoE requirement). Fake requests may be detected due to their fixed weight. As an attack enhancement, we present schemes to randomize the weights of fake requests and show that it is still possible to reduce the reward of real requests while maintaining the balance of weight distributions.
I. INTRODUCTION
5G promises unprecedented performance improvements in terms of rate, delay, and energy efficiency compared to 4G systems. An important design aspect towards this goal is resource allocation for network slicing, which aims to multiplex and serve multiple virtualized and independent logical networks on the same physical network infrastructure of 5G [1]–[3]. The 5G radio access network (RAN) employs network slicing to manage its resources as virtual resource blocks (RBs), e.g., an RB may correspond to a frequency band. Then, the network resource allocation problem can be simplified by considering how to allocate virtual RBs without the need to focus on the physical resources that support RBs. We consider a gNodeB that supports downlink communication requests from user equipments (UEs). A UE may generate requests with different Quality of Experience (QoE) requirements for different types of network slices such as Enhanced Mobile Broadband (eMBB), Massive Machine Type Communications (mMTC), and Ultra Reliable Low Latency Communications (URLLC). These QoE requirements are mapped to different requirements on RBs, as well as on communication and processing powers at the gNodeB. Each request has its own priority (measured by a weight). If the gNodeB allocates resources to a request to meet its requirements, the reward is the weight of this request. If a request is not served, it is kept in a list of active requests until its deadline. This resource allocation approach aims to maximize the reward over time and can be used in the near-real-time RAN Intelligent Controller (Near-RT RIC) to support microservice-based applications called xApps.

Since future requests are unknown, an online algorithm is needed to make decisions based on the current network status and requests. Machine learning can be effectively applied to solve complex wireless optimization problems by learning from spectrum data [4]. In particular, deep learning was studied for network slicing in [5] for application- and device-specific identification and traffic classification, and in [6] for management of network load efficiency and network availability. As the training data may not be available, reinforcement learning (RL) was used for network slicing without requiring a prior model [7]–[12].

(This effort is supported by the U.S. Army Research Office under contract W911NF-20-C-0055. The content of the information does not necessarily reflect the position or the policy of the U.S. Government, and no official endorsement should be inferred.)
In our setting, the RL approach learns a model to predict the future reward (the weight of selected requests) for each state (available resources) and each action (admission of requests), and determines over time the optimal action to maximize the expected future reward.

With the success of applying machine learning (ML) to network slicing problems, there are also security concerns, specifically due to attacks built upon adversarial ML. Adversarial ML studies the learning process in the presence of adversaries and expands the attack surface with new wireless attacks, e.g., exploratory (inference) attacks [13], evasion (adversarial) attacks [14]–[18], causative (poisoning) attacks [19], membership inference attacks [20], Trojan attacks [21], and signal spoofing attacks [22], [23]. Recently, adversarial ML has been used to launch attacks on 5G, such as the attack on 5G spectrum sharing with incumbents, the attack on 5G UE authentication [24], covert 5G communications [25], and the attack on 5G network slicing (where the adversary aims to manipulate the underlying RL algorithm) [26]. In this paper, we consider a flooding type of resource starvation attack that has found applications in the networking domain, such as the TCP SYN flood attack. For network slicing, we formulate the flooding attack as the case where an adversary generates fake requests to consume network resources. This attack cannot be detected by simply monitoring the reward, since the gNodeB can still achieve a large reward as in the case of no attack. However, a great portion of this reward is associated with fake requests, and the reward from real requests is much less than that without the attack. Compared to conventional jamming (e.g., [27]), the flooding attack is stealthier and more energy-efficient (especially for downlink traffic), since it only requires the adversary to send requests and ACKs without the need for long transmissions.

The design challenge of the flooding attack is how to generate fake requests for resources.
If there is no limitation, the adversary can generate many requests to maximize its impact, but it can then be easily detected and blocked. Thus, we assume an upper bound on the number of fake requests. The adversary needs to carefully design its requests for network resources and rewards such that the impact of these requests is maximized. For that purpose, we design an RL solution for the adversary that uses network resources as the state, the generated fake requests as the action, and the reward achieved by fake requests as the reward for RL. This solution aims to maximize the total reward for fake requests over time, which in turn minimizes the remaining reward for real requests. We show that this flooding attack reduces the reward for real requests significantly more than two benchmark attacks, namely random fake requests and fake requests with the lowest QoE requirement (or minimum resource requirement). Although we mainly launch the flooding attack on an RL-based network slicing scheme, we show that it can also target other network slicing schemes such as myopic, first come first served (FCFS), or random selection.

The best strategy (in terms of minimizing the reward of real requests) is to generate fake requests with large weights, but they may be easily detected by checking their weights over time. To overcome this defense, we design attack schemes with variable weights and show that we can set a close-to-uniform weight distribution on fake requests such that they cannot be easily detected based on their weights, while the reward of real requests can still be reduced significantly.

The rest of the paper is organized as follows. Section II describes the resource allocation schemes for network slicing. Section III presents the flooding attack that aims to minimize the gNodeB's performance for real communication requests. Section IV evaluates the attack performance under different settings and designs. Section V concludes this paper.

II. RESOURCE ALLOCATION FOR NETWORK SLICING
Resource allocation for network slicing can be optimized by RL. We follow the RL approach in [11] as an example. Fig. 1 shows multiple 5G UEs sending requests over time with different rate, processing power, latency (deadline), and lifetime requirements and priority weights. The 5G gNodeB selectively serves some of these requests competing for resources. If a request is selected, resources are allocated to meet its requirements. Otherwise, it is kept in a waiting list until its deadline expires. The 5G gNodeB selects requests to maximize the reward, i.e., the total weight of served requests over a time period. An adversary launches the flooding attack to generate fake requests such that the total weight of served real requests over a time period is minimized.

Fig. 1: System model for the attack on 5G network slicing.

Denote the set of active requests (newly arrived requests and previously arrived requests that stay in the waiting list) at time t as A(t). RBs, communication and processing powers are allocated to meet the requirements of admitted requests in A(t). The rate and processing power requirements of UE i for its request j are given by

  D_ij ≥ d_ij x_ij(t),  P_ij ≥ p_ij x_ij(t),  (i, j) ∈ A(t),   (1)

where D_ij is the achieved data rate, d_ij is the minimum required rate, P_ij is the assigned processing power, p_ij is the minimum required processing power (measured as the percentage of CPU usage), and x_ij(t) is the binary indicator of whether UE i's request j is satisfied at time t. At any time, the total P_ij of the selected requests is no more than the available processing power. D_ij, measured in bps, depends on the assigned bandwidth F_ij and the modulation and coding scheme used for communications from the gNodeB to UE i, and is approximated as D_ij = c · K_ij · (1 − BER_ij) [28], where K_ij is the number of allocated RBs, BER_ij is the bit error rate of UE i for its request j, and c is a constant determined by the modulation (QPSK for a single-antenna UE), subcarrier spacing, and bandwidth. The constraints on RB assignments are

  Σ_(i,j) F_ij x_ij(t) ≤ F(t),  (i, j) ∈ A(t),   (2)

where F(t) represents the available RBs of the gNodeB at time t (resources that were assigned previously to some requests and not yet released are temporarily unavailable). By considering the optimization problem over a time horizon, the resources are updated from time t − 1 to time t as

  F(t) = F(t − 1) + F_r(t − 1) − F_a(t − 1),   (3)

where F_r(t − 1) and F_a(t − 1) are the released and allocated resources on frequency at time t − 1. Each request has a lifetime l_ij, and if it is selected at time t_s (namely, the service starts at time t_s), this request will end at time t_s + l_ij. The released and allocated resources at time t are given by

  F_r(t) = Σ_((i,j) ∈ R(t)) F_ij,  F_a(t) = Σ_(i,j) F_ij x_ij(t),   (4)

respectively, where R(t) denotes the set of requests ending at time t. Then, the optimization problem is given by

  max_{x_ij(t)} Σ_t Σ_(i,j) w_ij x_ij(t),  (i, j) ∈ A(t),   (5)

subject to (1)–(4), where w_ij is the weight of UE i's request j, reflecting its priority. Without knowing future requests, the model-free RL algorithm solves (5) by making decisions using an online learned policy that determines an action for the current state of the gNodeB. In this paper, we consider Q-learning from [11]. The reward at time t is w_ij if UE i's request j is satisfied, i.e., x_ij(t) = 1. Actions assign resources to each request at time t. Multiple actions can be taken at the same time instance if there are sufficient resources. The state at time t consists of the remaining RBs and the remaining communication and processing powers. By (3)-(4), the state transition at time t is driven by allocating resources to requests granted at time t and releasing resources once the lifetimes of active services expire at time t. Note that the flooding attack can also be launched on other network slicing schemes. We consider the following schemes for comparison purposes. The myopic scheme aims to maximize the reward for the current time (without considering future rewards). The FCFS scheme allocates resources based on the arrival time of requests. The random scheme allocates resources to randomly selected requests.

III. REINFORCEMENT LEARNING BASED FLOODING ATTACK FOR NETWORK SLICING
An adversary attacks the 5G RAN network slicing by generating fake requests for network slices. If these fake requests are selected and network resources are allocated to them, fewer resources will be left for real requests from legitimate users. As a consequence, although a gNodeB may still achieve a high reward for many granted requests, the actual reward corresponding to real requests may be a smaller portion of it. We consider a practical constraint that the adversary has a limited rate of generating fake requests to avoid being detected. In particular, we can set the same rate of request generation for the adversary and legitimate users. Thus, it is important for the adversary to generate fake requests with two objectives:

• Objective 1: fake requests are satisfied (namely, resources are allocated to them) with high probability.
• Objective 2: fake requests occupy resources so that real requests cannot be satisfied due to the lack of resources.

For the first objective, a fake request should ask for a smaller portion of resources (to avoid being rejected due to insufficient resources) and have the maximum reward (to have high priority). For the second objective, a fake request should consume the majority of the resources such that the remaining resources are not sufficient for real requests. Thus, the ideal setting for fake requests is the largest weight, while the required resource should not be too large (to ensure that a fake request can be satisfied) or too small (to ensure that a significant portion of the total resources can be occupied by fake requests). Therefore, the key step in the flooding attack is determining the resources to be specified by fake requests. The resources include the number of available RBs, the remaining processing power (memory), and the remaining communication (transmit) power. The adversary may sense the spectrum and aim to detect available RBs.
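The adversary's RL solution described earlier (state: available network resources; action: the crafted fake request; reward: the reward achieved by fake requests) can be sketched as tabular Q-learning over the sensed number of available RBs. This is only an illustrative sketch: the number of RBs and all hyperparameters (ALPHA, GAMMA, EPS) are assumed values, not taken from the paper.

```python
import random

# Illustrative tabular Q-learning sketch of the adversary's policy:
# state = sensed number of available RBs, action = how many RBs the fake
# request claims (0 = send no request), reward = weight of fake requests
# that the gNodeB ends up serving. N_RB and hyperparameters are assumptions.

N_RB = 11                      # assumed total number of resource blocks
ALPHA, GAMMA, EPS = 0.1, 0.9, 0.1

# Q[state][action]: estimated future reward of requesting `action` RBs
Q = [[0.0] * (N_RB + 1) for _ in range(N_RB + 1)]

def choose_action(state):
    """Epsilon-greedy choice of how many RBs the fake request asks for."""
    feasible = list(range(state + 1))  # cannot ask for more RBs than are free
    if random.random() < EPS:
        return random.choice(feasible)
    return max(feasible, key=lambda a: Q[state][a])

def update(state, action, reward, next_state):
    """One-step Q-learning update after observing the gNodeB's decision."""
    best_next = max(Q[next_state])
    Q[state][action] += ALPHA * (reward + GAMMA * best_next - Q[state][action])
```

Each slot, the adversary would sense the available RBs (the state), pick an action with `choose_action`, observe whether the gNodeB serves the fake request, and call `update` with the obtained weight as the reward.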
However, it cannot know the remaining processing and communication powers. There is no need to consume all of these resources to prevent serving real requests, since a request cannot be satisfied if any of its resource requirements is not met. Thus, the adversary can request minimum processing and communication powers such that its requests will not be rejected due to the lack of these resources. On the other hand, the adversary needs to determine the RB requirements in its requests. To make decisions, we design a Q-learning algorithm for the adversary as follows.

• The state is the number of available RBs.
• The action is to select how many RBs to be assigned in a fake request (0 means no request is made). The number of potential actions is n_a + 1, where n_a is the number of available RBs.
• The reward is the number of served fake requests (or the total reward of served fake requests).

The Q-table maps (state, action) to reward. To initialize this table, entries are set depending on whether the corresponding action generates a fake request or not. The adversary applies Q-learning to update this table and takes actions based on it. The adversary generates a fake request only if the rate of fake requests so far is below the expected rate, which can be set the same as the rate of real requests from other (legitimate) users. Under the flooding attack, we measure both the total reward (including the reward for both real and fake requests) and the real reward (including the reward for real requests only).

IV. PERFORMANCE EVALUATION
A. Flooding Attack Results
Suppose that the gNodeB receives requests from three UEs. For each UE, requests arrive at a rate of 0.5 per slot, and the adversary also generates fake requests at this rate. Here, a slot corresponds to one time block whose duration is set by the subcarrier spacing. For each request, the weight, lifetime, and deadline are randomly assigned from fixed ranges, and the signal-to-noise ratio (SNR) is selected randomly from a fixed range. The total frequency is split into 11 bands, i.e., there are 11 RBs. In addition to the Q-learning-based attack, we also consider the case of no attack and two benchmark attacks:

• Random attack: The adversary generates fake requests with random requirements on RBs.
• Minimum resource (MinRes) attack: The adversary generates fake requests with the lowest QoE requirement, which in turn requires the minimum number of RBs, i.e., always one RB.

TABLE I: Performance comparison of the flooding attack using Q-learning and other attack schemes.

Algorithm    Total reward   Real reward
Q-learning   2593           523
MinRes       2769           614
Random       1905           1630
No attack    1783           1783

TABLE II: Performance comparison of different network slicing schemes under the flooding attack.

Network slicing scheme   No attack   Flooding attack: Total reward   Real reward
Q-learning               1786        2593                            523
Myopic                   1422        2408                            653
FCFS                     1416        2369                            429
Random                   1318        2363                            483

We assume that the adversary launches its attack (Q-learning, MinRes, or Random) over a long run of slots. The no-attack benchmark is run over the same horizon, and the achieved reward is measured over the final slots of the run. For network slicing, we first consider the RL-based scheme. Table I shows the performance of different attacks and the case of no attack. If there is no attack, i.e., all requests are real, the total reward and the real portion of it are the same. All attacks increase the total reward, since there are some additional fake requests with high reward, but the real portion of the reward is less than the total (all-real) reward achieved under no attack. The random attack does not work well and only slightly reduces the real reward (from 1783 to 1630). The MinRes attack always generates fake requests with the minimum required resource. Although this increases the probability of being selected by the gNodeB for service, the occupied resource is also minimized. Hence, it significantly reduces the real reward to 614, but it is not as effective as the Q-learning attack, which reduces the real reward to 523.

We also measure the total reward of all fake requests generated under the Q-learning attack. The total reward of served fake requests is 2593 − 523 = 2070, i.e., most of the generated fake requests are served and occupy resources. Thus, the Q-learning attack is efficient in terms of the ratio between served and generated requests. The total reward requested by real requests is much larger, while the total reward of served real requests is only 523 under the Q-learning attack, i.e., only a small portion of real requests are served under the flooding attack, showing that the flooding attack is highly successful.

This flooding attack can be launched against other network slicing schemes, including the myopic, FCFS, and random schemes described in Section II. Table II shows that the (reward) performance of all these schemes drops significantly under the flooding attack. In particular, the FCFS and random schemes have worse performance than the Q-learning based scheme regardless of whether there is a flooding attack or not. One interesting result is that the myopic scheme has worse performance compared to the Q-learning based scheme when there is no attack, but under the flooding attack, the myopic scheme achieves a better real reward, namely 653, compared to the 523 achieved by the Q-learning based scheme. The reason is that the Q-learning based scheme aims to maximize the expected total reward, including both current and future reward. When there are fake requests with high rewards, the Q-learning algorithm tends not to allocate resources to real requests if their reward is not high. On the other hand, there is no such issue in the myopic scheme, since it only considers the current reward.

TABLE III: The effect of the generation rate of fake requests, r_f.

r_f    Total reward   Real reward
0      1783           1783
0.1    2077           1587
0.2    2327           1352
0.3    2512           1032
0.4    2605           700
0.5    2593           523
0.6    2620           440

B. Impact of System Parameters
Now we check the effect of system parameters on the flooding attack performance. Table III shows the results when we vary the rate of fake requests, r_f. The total reward first increases with r_f due to the high reward of fake requests, while the real reward decreases due to the increasing portion of fake requests. If r_f ≥ 0.6 requests per slot, the total and real rewards do not change further, i.e., r_f = 0.6 is sufficient for the best attack.

Table IV shows the results when we vary the number of RBs, n_r. The reward when there is no attack first increases with n_r due to more RBs, and then changes within a certain range due to randomness in the limited number of user requests. The total reward under attack shows the same trend. The real reward under attack first decreases with n_r and then stays within a certain range. To understand the trend better, we assess the ratio between the real reward under attack and the reward when there is no attack. As n_r increases, this ratio first decreases, since a fake request can occupy more RBs, and then stays within a certain range since (i) the number of fake requests is limited and (ii) there is some randomness in the generated fake requests.

Table V shows the results when we vary the number of users, n_u. As n_u increases, the reward when there is no attack increases due to more requests to be selected for service. The total reward and the real reward under attack show the same trend. When n_u is large, the total reward under attack is less than the reward when there is no attack. The reason is that the adversary generates fake requests with non-minimum RBs, while with many users it is likely that there are real requests with the same reward and minimum RBs. Then, the total reward from selecting some fake requests may be less than in the case of not selecting fake requests. The ratio of real reward over the reward under no attack increases with n_u due to more real requests competing with fake requests.

TABLE IV: The effect of the number of RBs, n_r.
n_r   No attack   Flooding attack: Total reward   Real reward   Ratio (%)
5     1526        2133                            858           56.23
6     1605        2285                            805           50.16
7     1786        2477                            737           41.27
8     1730        2596                            646           37.34
9     1870        2619                            609           32.57
10    1902        2644                            604           31.76
11    1783        2593                            523           29.33
12    2018        2723                            528           26.16
13    1964        2677                            577           29.38
14    1987        2661                            506           25.47
15    1899        2708                            573           30.17
TABLE V: The effect of the number of users, n_u.

n_u   No attack   Flooding attack: Total reward   Real reward   Ratio (%)
3     1783        2593                            523           29.33
4     2239        2717                            652           29.12
5     2306        2749                            709           30.75
6     2681        2856                            955           35.62
7     2592        2850                            890           34.34
8     2865        2854                            954           33.30
9     2896        2859                            1029          35.53
10    3108        2918                            1108          35.65
20    3540        3065                            1480          41.81
50    3859        3360                            2100          54.42
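The Ratio (%) column in Tables IV and V is simply the real reward under the flooding attack divided by the reward achieved with no attack, expressed as a percentage. A one-line check against the baseline row (n_u = 3):

```python
# Ratio (%) = real reward under attack / reward with no attack, in percent.
def attack_ratio(real_reward_under_attack: float, no_attack_reward: float) -> float:
    """Share of the no-attack reward that real requests still achieve."""
    return 100.0 * real_reward_under_attack / no_attack_reward

print(round(attack_ratio(523, 1783), 2))  # 29.33, matching the n_u = 3 row
```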
Table VI shows the results when we vary the SNR for users. As the SNR increases, all rewards (real or total) increase with and without the flooding attack. In particular, the ratio of real reward over the reward under no attack increases due to better channels being available to users. In this case, it is easier to serve a real request by meeting the rate requirements, i.e., it is more challenging for the attack to deny service to a real request by consuming resources.

Finally, we evaluate the effect of the adversary's observation errors on available RBs, in terms of false alarms (available RBs detected as unavailable) and misdetections (unavailable RBs detected as available). We find that this effect is not significant; even for large observation errors, the change in real reward remains small. Hence, the flooding attack is not very sensitive to errors in spectrum sensing.

C. Attack Extensions with Enhanced Weight Distribution
TABLE VI: The effect of the SNR for users.

SNR      No attack   Flooding attack: Total reward   Real reward   Ratio (%)
low      1557        2436                            286           18.37
medium   1783        2593                            523           29.33
high     1900        2695                            625           32.89

The flooding attack that we consider so far generates fake requests with the largest weight (LW) to maximize the probability that they are selected by the gNodeB. A defense scheme may detect the repeated largest weight in requests from an adversary and then discard these requests. Against such a defense, we extend the flooding attack with the following schemes to increase the randomness of weights.

TABLE VII: Distribution of weights in network slicing requests.

TABLE VIII: Distribution for high weight.

• Uniform weight (UW): Weights in fake requests are assigned uniformly at random.
• Uniform large weight (ULW): Weights in fake requests are assigned uniformly at random among the large values.
• Resource dependent weight (RDW): Fake requests should be accepted when the remaining resources are large, since real requests are then more likely to be accepted if there is no attack. Thus, the adversary generates more fake requests with large weights if the remaining resources are large; otherwise, it generates more fake requests with small weights. The resource dependent weight distributions are selected such that the overall weight distribution remains uniform, i.e., (1/F) Σ_i p_ij = 1/W for each weight j, where F is the number of total available RBs, p_ij is the probability that the weight is j when the number of remaining RBs is i, and W is the number of different weights. One such distribution is shown in Table VII.
• Resource dependent large weight (RDLW): We limit the weights in fake requests to large values, and choose weight distributions such that the overall probability of each of the large weights is balanced, i.e., (1/F) Σ_i p_ij = 0.5 for each of the two large weight values j. One such distribution is shown in Table VIII.
• Adjusted weight (AW): The adversary adjusts weights dynamically based on whether a fake request is selected or not. If a fake request is selected, the adversary wants to have another fake request selected to occupy resources; otherwise, there is no such need. Thus, the weight in requests should be increased (bounded by the largest weight) if a fake request is selected; otherwise, the weight should be decreased (bounded by the smallest weight). We consider three adjustment approaches.
– AW 1: increase the weight in fake requests by a fixed step if a fake request is selected, or decrease it by the same step otherwise;
– AW 2: increase the weight in fake requests to the largest value if a fake request is selected, or decrease it by a fixed step otherwise; or
– AW 3: increase the weight in fake requests to the largest value if a fake request is selected, or decrease it by a fixed step with some probability otherwise.

TABLE IX: Attack extensions with different weights in fake requests.
Algorithm   Total reward   Real reward
LW          2593           523
UW          2328           1161
ULW         2621           828
RDW         2597           1008
RDLW        2689           677
AW 1        1798           1417
AW 2        2389           1008
AW 3        2245           1146
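The three adjusted-weight (AW) update rules compared in Table IX can be sketched as follows. The weight range [1, 5], the step size of 1, and the default AW 3 probability are illustrative assumptions, not the paper's exact values.

```python
import random

# Hypothetical sketch of the AW 1-3 weight-adjustment rules: raise the weight
# of the next fake request after a success (it was served), lower it after a
# failure. MAX_W, MIN_W, the step of 1, and p are assumed values.
MAX_W, MIN_W = 5, 1

def aw1(w: int, selected: bool) -> int:
    """AW 1: step the weight up on success, down on failure (clamped)."""
    return min(w + 1, MAX_W) if selected else max(w - 1, MIN_W)

def aw2(w: int, selected: bool) -> int:
    """AW 2: jump to the largest weight on success, step down on failure."""
    return MAX_W if selected else max(w - 1, MIN_W)

def aw3(w: int, selected: bool, p: float = 0.5) -> int:
    """AW 3: jump to the largest weight on success; on failure, step down
    only with probability p to keep the weight distribution near uniform."""
    if selected:
        return MAX_W
    return max(w - 1, MIN_W) if random.random() < p else w
```

Clamping keeps the weight inside the allowed range, and the probabilistic decrease in AW 3 is what lets the adversary tune the resulting weight distribution toward uniform.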
Table IX shows the results for these attack extensions. First of all, since the adversary aims to sustain a uniform distribution among different weights in its requests, the probability of selecting a fake request is reduced, and thus the real reward under all these attacks is larger than in the LW attack case, where we set the weight in fake requests to the largest value. Under the flooding attack with UW, the real reward is 1161, while the real reward under the flooding attack with ULW is 828. The real reward achieved under the flooding attack with RDW is 1008, which is still much higher than the real reward 523 when we fix the weight at its largest value. The real reward under the flooding attack with RDLW is 677, which is close to 523. For the flooding attack with AW, AW 1 yields a real reward of 1417, which is high, as it turns out that most weights for fake requests end up at the smallest value. When AW 2 is used, the real reward becomes 1008, and we find that the distribution of weights is almost uniform. When we use AW 3 with a probability that also yields an approximately uniform weight distribution in fake requests, the achieved real reward is 1146. Overall, the flooding attack can reduce the real reward significantly while keeping a close-to-uniform weight distribution, such that it is difficult to detect fake requests by checking their weights.

V. CONCLUSION
We presented a flooding attack on 5G RAN slicing, where an adversary generates fake network slicing requests to consume available resources and minimize the reward for real requests. We developed an RL-based attack to generate fake requests and showed that it is more effective than generating fake requests randomly or with the minimum resource requirement. Although we focused on attacking an RL-based network slicing scheme, we showed that the flooding attack is also effective against other network slicing schemes. We designed weight distribution schemes for fake requests such that they cannot be detected by their weights, and showed that the flooding attack using a close-to-uniform weight distribution is still effective. Our results indicate that 5G RAN slicing is highly vulnerable to flooding attacks that can significantly reduce the reward of real requests by starving resources with fake requests.

REFERENCES

[1] X. Foukas, G. Patounas, A. Elmokashfi, and M. K. Marina, "Network slicing in 5G: Survey and challenges," IEEE Communications Magazine, 2017.
[2] A. Kaloxylos, "A survey and an analysis of network slicing in 5G networks," IEEE Communications Standards Magazine, 2018.
[3] S. D'Oro, F. Restuccia, A. Talamonti, and T. Melodia, "The slice is served: Enforcing radio access network slicing in virtualized 5G systems," IEEE INFOCOM, 2019.
[4] T. Erpek, T. O'Shea, Y. E. Sagduyu, Y. Shi, and T. C. Clancy, "Deep learning for wireless communications," Development and Analysis of Deep Learning Architectures, Springer, 2019.
[5] A. Nakao and P. Du, "Toward in-network deep machine learning for identifying mobile applications and enabling application specific network slicing," IEICE Transactions on Communications, 2018.
[6] A. Thantharate, R. Paropkari, V. Walunj, and C. Beard, "DeepSlice: A deep learning approach towards an efficient and reliable network slicing in 5G networks," IEEE UEMCON, 2019.
[7] R. Li, Z. Zhao, Q. Sun, C.-L. I, C. Yang, X. Chen, M. Zhao, and H. Zhang, "Deep reinforcement learning for resource management in network slicing," IEEE Access, 2018.
[8] J. Koo, M. R. Rahman, V. B. Mendiratta, and A. Walid, "Deep reinforcement learning for network slicing with heterogeneous resource requirements and time varying traffic dynamics," arXiv preprint arXiv:1908.03242, 2019.
[9] H. Wang, Y. Wu, G. Min, J. Xu, and P. Tang, "Data-driven dynamic resource scheduling for network slicing: A deep reinforcement learning approach," Information Sciences, 2019.
[10] Z. Xu, Y. Wang, J. Tang, J. Wang, and M. C. Gursoy, "A deep reinforcement learning based framework for power-efficient resource allocation in cloud RANs," IEEE ICC, 2017.
[11] Y. Shi, Y. E. Sagduyu, and T. Erpek, "Reinforcement learning for dynamic resource optimization in 5G radio access network slicing," IEEE CAMAD, 2020.
[12] A. Nassar and Y. Yilmaz, "Deep reinforcement learning for adaptive network slicing in 5G for intelligent vehicular systems and smart cities," arXiv preprint arXiv:2010.09916, 2020.
[13] T. Erpek, Y. E. Sagduyu, and Y. Shi, "Deep learning for launching and mitigating wireless jamming attacks," IEEE Transactions on Cognitive Communications and Networking, 2019.
[14] M. Sadeghi and E. Larsson, "Adversarial attacks on deep-learning based radio signal classification," IEEE Communications Letters, Feb. 2019.
[15] M. Z. Hameed, A. Gyorgy, and D. Gunduz, "The best defense is a good offense: Adversarial attacks to avoid modulation detection," IEEE Transactions on Information Forensics and Security, Sept. 2020.
[16] B. Kim, Y. E. Sagduyu, K. Davaslioglu, T. Erpek, and S. Ulukus, "Over-the-air adversarial attacks on deep learning based modulation classifier over wireless channels," IEEE CISS, 2020.
[17] B. Kim, Y. E. Sagduyu, K. Davaslioglu, T. Erpek, and S. Ulukus, "Channel-aware adversarial attacks against deep learning-based wireless signal classifiers," arXiv preprint arXiv:2005.05321, 2020.
[18] B. Kim, Y. E. Sagduyu, K. Davaslioglu, T. Erpek, and S. Ulukus, "Adversarial attacks with multiple antennas against deep learning-based modulation classifiers," IEEE GLOBECOM, 2020.
[19] Y. E. Sagduyu, T. Erpek, and Y. Shi, "Adversarial deep learning for over-the-air spectrum poisoning attacks," IEEE Transactions on Mobile Computing, Feb. 2021.
[20] Y. Shi, K. Davaslioglu, and Y. E. Sagduyu, "Over-the-air membership inference attacks as privacy threats for deep learning-based wireless signal classifiers," ACM WiseML, 2020.
[21] K. Davaslioglu and Y. E. Sagduyu, "Trojan attacks on wireless signal classification with adversarial machine learning," IEEE DySPAN, 2019.
[22] Y. Shi, K. Davaslioglu, and Y. E. Sagduyu, "Generative adversarial network in the air: Deep adversarial learning for wireless signal spoofing," IEEE Transactions on Cognitive Communications and Networking, 2020.
[23] Y. Shi, K. Davaslioglu, and Y. E. Sagduyu, "Generative adversarial network for wireless signal spoofing," ACM WiseML, 2019.
[24] Y. E. Sagduyu, T. Erpek, and Y. Shi, "Adversarial machine learning for 5G communications security," arXiv preprint arXiv:2101.02656, 2021.
[25] B. Kim, Y. E. Sagduyu, K. Davaslioglu, T. Erpek, and S. Ulukus, "How to make 5G communications 'invisible': Adversarial machine learning for wireless privacy," Asilomar Conference on Signals, Systems, and Computers, 2020.
[26] Y. Shi, Y. E. Sagduyu, T. Erpek, and M. C. Gursoy, "How to attack and defend 5G radio access network slicing with reinforcement learning," arXiv preprint arXiv:2101.05768, 2021.
[27] Y. E. Sagduyu, R. Berry, and A. Ephremides, "Wireless jamming attacks under dynamic traffic uncertainty,"
Asilomar Conf. Sig., Sys. and Comp. , 2020.[26] Y. Shi, Y. E. Sagduyu, T. Erpek, and M. C. Gursoy, “How to attack anddefend 5G radio access network slicing with reinforcement learning,” arXiv preprint arXiv:2101.05768, 2021.[27] Y. E. Sagduyu, R. Berry and A. Ephremides, “Wireless jamming attacksunder dynamic traffic uncertainty,”