Adversarial Machine Learning for Flooding Attacks on 5G Radio Access Network Slicing
Yi Shi and Yalin E. Sagduyu
Virginia Tech, Blacksburg, VA 24061, USA
Intelligent Automation, Inc., Rockville, MD 20855, USA
Abstract—Network slicing manages network resources as virtual resource blocks (RBs) for the 5G Radio Access Network (RAN). Each communication request comes with quality of experience (QoE) requirements such as throughput and latency/deadline, which can be met by assigning RBs, communication power, and processing power to the request. For a completed request, the achieved reward is measured by the weight (priority) of this request. The reward is then maximized over time by allocating resources, e.g., with reinforcement learning (RL). In this paper, we introduce a novel flooding attack on 5G network slicing, where an adversary generates fake network slicing requests to consume the 5G RAN resources that would otherwise be available to real requests. The adversary observes the spectrum and builds a surrogate model of the network slicing algorithm through RL that decides how to craft fake requests to minimize the reward of real requests over time. We show that the portion of the reward achieved by real requests may be much less than the reward that would be achieved when there was no attack. We also show that this flooding attack is more effective than other benchmark attacks such as random fake requests and fake requests with the minimum resource requirement (lowest QoE requirement). Fake requests may be detected due to their fixed weight. As an attack enhancement, we present schemes to randomize the weights of fake requests and show that it is still possible to reduce the reward of real requests while maintaining the balance of weight distributions.
I. INTRODUCTION
5G promises unprecedented performance improvements in terms of rate, delay, and energy efficiency compared to 4G systems. An important design aspect towards this goal is resource allocation for network slicing, which aims to multiplex and serve multiple virtualized and independent logical networks on the same physical network infrastructure of 5G [1]–[3]. The 5G radio access network (RAN) employs network slicing to manage its resources as virtual resource blocks (RBs), e.g., an RB may correspond to a frequency band. Then, the network resource allocation problem can be simplified by considering how to allocate virtual RBs without the need to focus on the physical resources that support RBs. We consider a gNodeB that supports downlink communication requests from user equipments (UEs). A UE may generate requests with different Quality of Experience (QoE) requirements for different types of network slices such as Enhanced Mobile Broadband (eMBB), Massive Machine Type Communications (mMTC), and Ultra Reliable Low Latency Communications (URLLC). These QoE requirements are mapped to different requirements on RBs, as well as on communication and processing powers at the gNodeB. Each request has its own priority (measured by a weight). If the gNodeB allocates resources to a request to meet its requirements, the reward is the weight of this request. If a request is not served, it is kept in a list of active requests until its deadline. This resource allocation approach aims to maximize the reward over time and can be used in the near-real-time RAN Intelligent Controller (Near-RT RIC) to support microservice-based applications called xApps.

Since future requests are unknown, an online algorithm is needed to make decisions based on the current network status and requests. Machine learning can be effectively applied to solve complex wireless optimization problems by learning from spectrum data [4]. In particular, deep learning was studied for network slicing in [5] for application- and device-specific identification and traffic classification, and in [6] for management of network load efficiency and network availability. As the training data may not be available, reinforcement learning (RL) was used for network slicing without requiring a prior model [7]–[12].

(This effort is supported by the U.S. Army Research Office under contract W911NF-20-C-0055. The content of the information does not necessarily reflect the position or the policy of the U.S. Government, and no official endorsement should be inferred.)
In our setting, the RL approach learns a model to predict the future reward (the weight of selected requests) for each state (available resources) and each action (admission of requests), and determines over time the optimal action to maximize the expected future reward.

With the success of applying machine learning (ML) to network slicing problems, there are also security concerns, specifically due to attacks built upon adversarial ML. Adversarial ML studies the learning process in the presence of adversaries and expands the attack surface with new wireless attacks, e.g., exploratory (inference) attacks [13], evasion (adversarial) attacks [14]–[18], causative (poisoning) attacks [19], membership inference attacks [20], Trojan attacks [21], and signal spoofing attacks [22], [23]. Recently, adversarial ML has been used to launch attacks on 5G, such as the attack on 5G spectrum sharing with incumbents, the attack on 5G UE authentication [24], covert 5G communications [25], and the attack on 5G network slicing (where the adversary aims to manipulate the underlying RL algorithm) [26]. In this paper, we consider a flooding type of resource starvation attack that has found applications in the networking domain, such as the TCP SYN flood attack. For network slicing, we formulate the flooding attack as the case where an adversary generates fake requests to consume network resources. This attack cannot be detected by simply monitoring the reward, since the gNodeB can still achieve a large reward as in the case of no attack. However, a great portion of this reward is associated with fake requests, and the reward from real requests is much less than that without the attack. Compared to conventional jamming (e.g., [27]), the flooding attack is stealthier and more energy-efficient (especially for downlink traffic), since it only requires the adversary to send requests and ACKs without the need for long transmissions.

The design challenge of the flooding attack is how to generate fake requests for resources.
If there is no limitation, the adversary can generate many requests to maximize its impact, but it can then be easily detected and blocked. Thus, we assume an upper bound on the number of fake requests. The adversary needs to carefully design its requests for network resources and rewards such that the impact of these requests is maximized. For that purpose, we design an RL solution for the adversary that uses network resources as the state, the generated fake requests as the action, and the reward achieved by fake requests as the reward for RL. This solution aims to maximize the total reward for fake requests over time, which in turn minimizes the remaining reward for real requests. We show that this flooding attack reduces the reward for real requests significantly more than two benchmark attacks, namely random fake requests and fake requests with the lowest QoE requirement (or minimum resource requirement). Although we mainly launch the flooding attack on an RL-based network slicing scheme, we show that it can also target other network slicing schemes such as myopic, first come first served (FCFS), or random selection.

The best strategy (in terms of minimizing the reward of real requests) is to generate fake requests with large weights, but they may be easily detected by checking their weights over time. To overcome this defense, we design attack schemes with variable weights and show that we can set a close-to-uniform weight distribution on fake requests such that they cannot be easily detected based on their weights, while the reward of real requests can still be reduced significantly.

The rest of the paper is organized as follows. Section II describes the resource allocation schemes for network slicing. Section III presents the flooding attack that aims to minimize the gNodeB's performance for real communication requests. Section IV evaluates the attack performance under different settings and designs. Section V concludes this paper.

II. RESOURCE ALLOCATION FOR NETWORK SLICING
Resource allocation for network slicing can be optimized by RL. We follow the RL approach in [11] as an example. Fig. 1 shows multiple 5G UEs sending requests over time with different rate, processing power, latency (deadline), and lifetime requirements and priority weights. The 5G gNodeB selectively serves some of these requests competing for resources. If a request is selected, resources are allocated to meet its requirements. Otherwise, it is kept in a waiting list until its deadline expires. The 5G gNodeB selects requests to maximize the reward, i.e., the total weight of served requests over a time period. An adversary launches the flooding attack to generate fake requests such that the total weight of served real requests over a time period is minimized.

Fig. 1: System model for the attack on 5G network slicing.

Denote the set of active requests (newly arrived requests and previously arrived requests that stay in the waiting list) at time t as A(t). RBs, communication and processing powers are allocated to meet the requirements of admitted requests in A(t). The rate and processing power requirements of UE i for its request j are given by

  D_ij ≥ d_ij x_ij(t),  P_ij ≥ p_ij x_ij(t),  (i, j) ∈ A(t),   (1)

where D_ij is the achieved data rate, d_ij is the minimum required rate, P_ij is the assigned processing power, p_ij is the minimum required processing power (measured as the percentage of CPU usage), and x_ij(t) is the binary indicator of whether UE i's request j is satisfied at time t. At any time, the total P_ij of the selected requests is no more than the available processing power. D_ij, measured in bps, depends on the assigned bandwidth F_ij and the modulation and coding scheme used for communications from the gNodeB to UE i, and is approximated as D_ij = c · K_ij · (1 − BER_ij) [28], where K_ij is the number of allocated RBs, BER_ij is the bit error rate of UE i for its request j, and c is a constant determined by the modulation (QPSK for a single-antenna UE), subcarrier spacing, and bandwidth. The constraints on RB assignments are

  Σ_(i,j) F_ij x_ij(t) ≤ F(t),  (i, j) ∈ A(t),   (2)

where F(t) represents the available RBs of the gNodeB at time t (resources that were assigned previously to some requests and not yet released are temporarily unavailable). By considering the optimization problem over a time horizon, the resources are updated from time t − 1 to time t as

  F(t) = F(t − 1) + F_r(t − 1) − F_a(t − 1),   (3)

where F_r(t − 1) and F_a(t − 1) are the released and allocated resources on frequency at time t − 1. Each request has a lifetime l_ij, and if it is selected at time t_s (namely, the service starts at time t_s), this request will end at time t_s + l_ij. The released and allocated resources at time t are given by

  F_r(t) = Σ_((i,j) ∈ R(t)) F_ij,  F_a(t) = Σ_(i,j) F_ij x_ij(t),   (4)

respectively, where R(t) denotes the set of requests ending at time t. Then, the optimization problem is given by

  max_{x_ij(t)} Σ_t Σ_(i,j) w_ij x_ij(t),  (i, j) ∈ A(t),   (5)

subject to (1)–(4), where w_ij is the weight of UE i's request j, reflecting its priority. Without knowing future requests, the model-free RL algorithm solves (5) by making decisions using an online learned policy that determines an action for the current state of the gNodeB. In this paper, we consider Q-learning from [11]. The reward at time t is w_ij if UE i's request j is satisfied, i.e., x_ij(t) = 1. Actions assign resources to each request at time t. Multiple actions can be taken at the same time instance if there are sufficient resources. The state at time t consists of the remaining RBs and the remaining communication and processing powers. By (3)-(4), the state transition at time t is driven by allocating resources to requests granted at time t and releasing resources once the lifetimes of active services expire at time t. Note that the flooding attack can also be launched on other network slicing schemes. We consider the following schemes for comparison purposes. The myopic scheme aims to maximize the reward for the current time (without considering future rewards). The FCFS scheme allocates resources based on the arrival time of requests. The random scheme allocates resources to randomly selected requests.

III. REINFORCEMENT LEARNING BASED FLOODING ATTACK FOR NETWORK SLICING
An adversary attacks the 5G RAN network slicing by generating fake requests for network slices. If these fake requests are selected and network resources are allocated to them, fewer resources will be left for real requests from legitimate users. As a consequence, although a gNodeB may still achieve a high reward for many granted requests, the actual reward corresponding to real requests may be a smaller portion of it. We consider a practical constraint that the adversary has a limited rate of generating fake requests to avoid being detected. In particular, we can set the same rate of request generation for the adversary and legitimate users. Thus, it is important for the adversary to generate fake requests with two objectives:

• Objective 1: fake requests are satisfied (namely, resources are allocated to them) with high probability.
• Objective 2: fake requests occupy resources so that real requests cannot be satisfied due to the lack of resources.

For the first objective, a fake request should ask for a smaller portion of resources (to avoid being rejected due to insufficient resources) and have the maximum reward (to have high priority). For the second objective, a fake request should consume the majority of the resources such that the remaining resources are not sufficient for real requests. Thus, the ideal setting for fake requests is the largest weight, while the required resource should not be too large (to ensure that a fake request can be satisfied) or too small (to ensure that a significant portion of the total resources can be occupied by fake requests). Therefore, the key step in the flooding attack is determining the resources to be specified by fake requests. The resources include the number of available RBs, the remaining processing power (memory), and the remaining communication (transmit) power. The adversary may sense the spectrum and aim to detect available RBs.
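The adversary's RL solution described earlier (state: available network resources; action: the crafted fake request; reward: the reward achieved by fake requests) can be sketched as tabular Q-learning over the sensed number of available RBs. This is only an illustrative sketch: the number of RBs and all hyperparameters (ALPHA, GAMMA, EPS) are assumed values, not taken from the paper.

```python
import random

# Illustrative tabular Q-learning sketch of the adversary's policy:
# state = sensed number of available RBs, action = how many RBs the fake
# request claims (0 = send no request), reward = weight of fake requests
# that the gNodeB ends up serving. N_RB and hyperparameters are assumptions.

N_RB = 11                      # assumed total number of resource blocks
ALPHA, GAMMA, EPS = 0.1, 0.9, 0.1

# Q[state][action]: estimated future reward of requesting `action` RBs
Q = [[0.0] * (N_RB + 1) for _ in range(N_RB + 1)]

def choose_action(state):
    """Epsilon-greedy choice of how many RBs the fake request asks for."""
    feasible = list(range(state + 1))  # cannot ask for more RBs than are free
    if random.random() < EPS:
        return random.choice(feasible)
    return max(feasible, key=lambda a: Q[state][a])

def update(state, action, reward, next_state):
    """One-step Q-learning update after observing the gNodeB's decision."""
    best_next = max(Q[next_state])
    Q[state][action] += ALPHA * (reward + GAMMA * best_next - Q[state][action])
```

Each slot, the adversary would sense the available RBs (the state), pick an action with `choose_action`, observe whether the gNodeB serves the fake request, and call `update` with the obtained weight as the reward.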
However, it cannot know the remaining processing and communication powers. There is no need to consume all of these resources to prevent serving real requests, since a request cannot be satisfied if any of its resource requirements is not met. Thus, the adversary can request minimum processing and communication powers such that its requests will not be rejected due to the lack of these resources. On the other hand, the adversary needs to determine the RB requirements in its requests. To make decisions, we design a Q-learning algorithm for the adversary as follows.

• The state is the number of available RBs.
• The action is to select how many RBs to be assigned in a fake request (0 means no request is made). The number of potential actions is n_a + 1, where n_a is the number of available RBs.
• The reward is the number of served fake requests (or the total reward of served fake requests).

The Q-table maps (state, action) to reward. To initialize this table, entries are set depending on whether the corresponding action generates a fake request or not. The adversary applies Q-learning to update this table and takes actions based on it. The adversary generates a fake request only if the rate of fake requests so far is below the expected rate, which can be set the same as the rate of real requests from other (legitimate) users. Under the flooding attack, we measure both the total reward (including the reward for both real and fake requests) and the real reward (including the reward for real requests only).

IV. PERFORMANCE EVALUATION
A. Flooding Attack Results
Suppose that the gNodeB receives requests from three UEs. For each UE, requests arrive at a rate of 0.5 per slot, and the adversary also generates fake requests at this rate. Here, a slot corresponds to one time block whose duration is set by the subcarrier spacing. For each request, the weight, lifetime, and deadline are randomly assigned from fixed ranges, and the signal-to-noise ratio (SNR) is selected randomly from a fixed range. The total frequency is split into 11 bands, i.e., there are 11 RBs. In addition to the Q-learning-based attack, we also consider the case of no attack and two benchmark attacks:

• Random attack: The adversary generates fake requests with random requirements on RBs.
• Minimum resource (MinRes) attack: The adversary generates fake requests with the lowest QoE requirement, which in turn requires the minimum number of RBs, i.e., always one RB.

TABLE I: Performance comparison of the flooding attack using Q-learning and other attack schemes.

Algorithm    Total reward   Real reward
Q-learning   2593           523
MinRes       2769           614
Random       1905           1630
No attack    1783           1783

TABLE II: Performance comparison of different network slicing schemes under the flooding attack.

Network slicing scheme   No attack   Flooding attack: Total reward   Real reward
Q-learning               1786        2593                            523
Myopic                   1422        2408                            653
FCFS                     1416        2369                            429
Random                   1318        2363                            483

We assume that the adversary launches its attack (Q-learning, MinRes, or Random) over a long run of slots. The no-attack benchmark is run over the same horizon, and the achieved reward is measured over the final slots of the run. For network slicing, we first consider the RL-based scheme. Table I shows the performance of different attacks and the case of no attack. If there is no attack, i.e., all requests are real, the total reward and the real portion of it are the same. All attacks increase the total reward, since there are some additional fake requests with high reward, but the real portion of the reward is less than the total (all-real) reward achieved under no attack. The random attack does not work well and only slightly reduces the real reward (from 1783 to 1630). The MinRes attack always generates fake requests with the minimum required resource. Although this increases the probability of being selected by the gNodeB for service, the occupied resource is also minimized. Hence, it significantly reduces the real reward to 614, but it is not as effective as the Q-learning attack, which reduces the real reward to 523.

We also measure the total reward of all fake requests generated under the Q-learning attack. The total reward of served fake requests is 2593 − 523 = 2070, i.e., most of the generated fake requests are served and occupy resources. Thus, the Q-learning attack is efficient in terms of the ratio between served and generated requests. The total reward requested by real requests is much larger, while the total reward of served real requests is only 523 under the Q-learning attack, i.e., only a small portion of real requests are served under the flooding attack, showing that the flooding attack is highly successful.

This flooding attack can be launched against other network slicing schemes, including the myopic, FCFS, and random schemes described in Section II. Table II shows that the (reward) performance of all these schemes drops significantly under the flooding attack. In particular, the FCFS and random schemes have worse performance than the Q-learning based scheme regardless of whether there is a flooding attack or not. One interesting result is that the myopic scheme has worse performance compared to the Q-learning based scheme when there is no attack, but under the flooding attack, the myopic scheme achieves a better real reward, namely 653, compared to the 523 achieved by the Q-learning based scheme. The reason is that the Q-learning based scheme aims to maximize the expected total reward, including both current and future reward. When there are fake requests with high rewards, the Q-learning algorithm tends not to allocate resources to real requests if their reward is not high. On the other hand, there is no such issue in the myopic scheme, since it only considers the current reward.

TABLE III: The effect of the generation rate of fake requests, r_f.

r_f    Total reward   Real reward
0      1783           1783
0.1    2077           1587
0.2    2327           1352
0.3    2512           1032
0.4    2605           700
0.5    2593           523
0.6    2620           440

B. Impact of System Parameters
Now we check the effect of system parameters on the flooding attack performance. Table III shows the results when we vary the rate of fake requests, r_f. The total reward first increases with r_f due to the high reward of fake requests, while the real reward decreases due to the increasing portion of fake requests. If r_f ≥ 0.6 requests per slot, the total and real rewards do not change further, i.e., r_f = 0.6 is sufficient for the best attack.

Table IV shows the results when we vary the number of RBs, n_r. The reward when there is no attack first increases with n_r due to more RBs, and then changes within a certain range due to randomness in the limited number of user requests. The total reward under attack shows the same trend. The real reward under attack first decreases with n_r and then stays within a certain range. To understand the trend better, we assess the ratio between the real reward under attack and the reward when there is no attack. As n_r increases, this ratio first decreases, since a fake request can occupy more RBs, and then stays within a certain range since (i) the number of fake requests is limited and (ii) there is some randomness in the generated fake requests.

Table V shows the results when we vary the number of users, n_u. As n_u increases, the reward when there is no attack increases due to more requests to be selected for service. The total reward and the real reward under attack show the same trend. When n_u is large, the total reward under attack is less than the reward when there is no attack. The reason is that the adversary generates fake requests with non-minimum RBs, while with many users it is likely that there are real requests with the same reward and minimum RBs. Then, the total reward from selecting some fake requests may be less than in the case of not selecting fake requests. The ratio of real reward over the reward under no attack increases with n_u due to more real requests competing with fake requests.

TABLE IV: The effect of the number of RBs, n_r.
n_r   No attack   Flooding attack: Total reward   Real reward   Ratio (%)
5     1526        2133                            858           56.23
6     1605        2285                            805           50.16
7     1786        2477                            737           41.27
8     1730        2596                            646           37.34
9     1870        2619                            609           32.57
10    1902        2644                            604           31.76
11    1783        2593                            523           29.33
12    2018        2723                            528           26.16
13    1964        2677                            577           29.38
14    1987        2661                            506           25.47
15    1899        2708                            573           30.17
TABLE V: The effect of the number of users, n_u.

n_u   No attack   Flooding attack: Total reward   Real reward   Ratio (%)
3     1783        2593                            523           29.33
4     2239        2717                            652           29.12
5     2306        2749                            709           30.75
6     2681        2856                            955           35.62
7     2592        2850                            890           34.34
8     2865        2854                            954           33.30
9     2896        2859                            1029          35.53
10    3108        2918                            1108          35.65
20    3540        3065                            1480          41.81
50    3859        3360                            2100          54.42
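The Ratio (%) column in Tables IV and V is simply the real reward under the flooding attack divided by the reward achieved with no attack, expressed as a percentage. A one-line check against the baseline row (n_u = 3):

```python
# Ratio (%) = real reward under attack / reward with no attack, in percent.
def attack_ratio(real_reward_under_attack: float, no_attack_reward: float) -> float:
    """Share of the no-attack reward that real requests still achieve."""
    return 100.0 * real_reward_under_attack / no_attack_reward

print(round(attack_ratio(523, 1783), 2))  # 29.33, matching the n_u = 3 row
```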
Table VI shows the results when we vary the SNR for users. As the SNR increases, all rewards (real or total) increase with and without the flooding attack. In particular, the ratio of real reward over the reward under no attack increases due to better channels being available to users. In this case, it is easier to serve a real request by meeting the rate requirements, i.e., it is more challenging for the attack to deny service to a real request by consuming resources.

Finally, we evaluate the effect of the adversary's observation errors on available RBs, in terms of false alarms (available RBs detected as unavailable) and misdetections (unavailable RBs detected as available). We find that this effect is not significant; even for large observation errors, the change in real reward remains small. Hence, the flooding attack is not very sensitive to errors in spectrum sensing.

C. Attack Extensions with Enhanced Weight Distribution
TABLE VI: The effect of the SNR for users.

SNR      No attack   Flooding attack: Total reward   Real reward   Ratio (%)
low      1557        2436                            286           18.37
medium   1783        2593                            523           29.33
high     1900        2695                            625           32.89

The flooding attack that we consider so far generates fake requests with the largest weight (LW) to maximize the probability that they are selected by the gNodeB. A defense scheme may detect the repeated largest weight in requests from an adversary and then discard these requests. Against such a defense, we extend the flooding attack with the following schemes to increase the randomness of weights.

TABLE VII: Distribution of weights in network slicing requests.

TABLE VIII: Distribution for high weight.

• Uniform weight (UW): Weights in fake requests are assigned uniformly at random.
• Uniform large weight (ULW): Weights in fake requests are assigned uniformly at random among the large values.
• Resource dependent weight (RDW): Fake requests should be accepted when the remaining resources are large, since real requests are then more likely to be accepted if there is no attack. Thus, the adversary generates more fake requests with large weights if the remaining resources are large; otherwise, it generates more fake requests with small weights. The resource dependent weight distributions are selected such that the overall weight distribution remains uniform, i.e., (1/F) Σ_i p_ij = 1/W for each weight j, where F is the number of total available RBs, p_ij is the probability that the weight is j when the number of remaining RBs is i, and W is the number of different weights. One such distribution is shown in Table VII.
• Resource dependent large weight (RDLW): We limit the weights in fake requests to large values, and choose weight distributions such that the overall probability of each of the large weights is balanced, i.e., (1/F) Σ_i p_ij = 0.5 for each of the two large weight values j. One such distribution is shown in Table VIII.
• Adjusted weight (AW): The adversary adjusts weights dynamically based on whether a fake request is selected or not. If a fake request is selected, the adversary wants to have another fake request selected to occupy resources; otherwise, there is no such need. Thus, the weight in requests should be increased (bounded by the largest weight) if a fake request is selected; otherwise, the weight should be decreased (bounded by the smallest weight). We consider three adjustment approaches.
– AW 1: increase the weight in fake requests by a fixed step if a fake request is selected, or decrease it by the same step otherwise;
– AW 2: increase the weight in fake requests to the largest value if a fake request is selected, or decrease it by a fixed step otherwise; or
– AW 3: increase the weight in fake requests to the largest value if a fake request is selected, or decrease it by a fixed step with some probability otherwise.

TABLE IX: Attack extensions with different weights in fake requests.
Algorithm   Total reward   Real reward
LW          2593           523
UW          2328           1161
ULW         2621           828
RDW         2597           1008
RDLW        2689           677
AW 1        1798           1417
AW 2        2389           1008
AW 3        2245           1146
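The three adjusted-weight (AW) update rules compared in Table IX can be sketched as follows. The weight range [1, 5], the step size of 1, and the default AW 3 probability are illustrative assumptions, not the paper's exact values.

```python
import random

# Hypothetical sketch of the AW 1-3 weight-adjustment rules: raise the weight
# of the next fake request after a success (it was served), lower it after a
# failure. MAX_W, MIN_W, the step of 1, and p are assumed values.
MAX_W, MIN_W = 5, 1

def aw1(w: int, selected: bool) -> int:
    """AW 1: step the weight up on success, down on failure (clamped)."""
    return min(w + 1, MAX_W) if selected else max(w - 1, MIN_W)

def aw2(w: int, selected: bool) -> int:
    """AW 2: jump to the largest weight on success, step down on failure."""
    return MAX_W if selected else max(w - 1, MIN_W)

def aw3(w: int, selected: bool, p: float = 0.5) -> int:
    """AW 3: jump to the largest weight on success; on failure, step down
    only with probability p to keep the weight distribution near uniform."""
    if selected:
        return MAX_W
    return max(w - 1, MIN_W) if random.random() < p else w
```

Clamping keeps the weight inside the allowed range, and the probabilistic decrease in AW 3 is what lets the adversary tune the resulting weight distribution toward uniform.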
Table IX shows the results for these attack extensions. First of all, since the adversary aims to sustain a uniform distribution among different weights in its requests, the probability of selecting a fake request is reduced, and thus the real reward under all these attacks is larger than in the LW attack case, where we set the weight in fake requests to the largest value. Under the flooding attack with UW, the real reward is 1161, while the real reward under the flooding attack with ULW is 828. The real reward achieved under the flooding attack with RDW is 1008, which is still much higher than the real reward 523 when we fix the weight at its largest value. The real reward under the flooding attack with RDLW is 677, which is close to 523. For the flooding attack with AW, AW 1 yields a real reward of 1417, which is high, as it turns out that most weights for fake requests end up at the smallest value. When AW 2 is used, the real reward becomes 1008, and we find that the distribution of weights is almost uniform. When we use AW 3 with a probability that also yields an approximately uniform weight distribution in fake requests, the achieved real reward is 1146. Overall, the flooding attack can reduce the real reward significantly while keeping a close-to-uniform weight distribution, such that it is difficult to detect fake requests by checking their weights.

V. CONCLUSION
We presented a flooding attack on 5G RAN slicing, where an adversary generates fake network slicing requests to consume available resources and minimize the reward for real requests. We developed an RL-based attack to generate fake requests and showed that it is more effective than generating fake requests randomly or with the minimum resource requirement. Although we focused on attacking an RL-based network slicing scheme, we showed that the flooding attack is also effective against other network slicing schemes. We designed weight distribution schemes for fake requests such that they cannot be detected by their weights, and showed that the flooding attack using a close-to-uniform weight distribution is still effective. Our results indicate that 5G RAN slicing is highly vulnerable to flooding attacks that can significantly reduce the reward of real requests by starving resources with fake requests.

REFERENCES

[1] X. Foukas, G. Patounas, A. Elmokashfi, and M. K. Marina, "Network slicing in 5G: Survey and challenges," IEEE Communications Magazine, 2017.
[2] A. Kaloxylos, "A survey and an analysis of network slicing in 5G networks," IEEE Communications Standards Magazine, 2018.
[3] S. D'Oro, F. Restuccia, A. Talamonti, and T. Melodia, "The slice is served: Enforcing radio access network slicing in virtualized 5G systems," IEEE INFOCOM, 2019.
[4] T. Erpek, T. O'Shea, Y. E. Sagduyu, Y. Shi, and T. C. Clancy, "Deep learning for wireless communications," Development and Analysis of Deep Learning Architectures, Springer, 2019.
[5] A. Nakao and P. Du, "Toward in-network deep machine learning for identifying mobile applications and enabling application specific network slicing," IEICE Transactions on Communications, 2018.
[6] A. Thantharate, R. Paropkari, V. Walunj, and C. Beard, "DeepSlice: A deep learning approach towards an efficient and reliable network slicing in 5G networks," IEEE UEMCON, 2019.
[7] R. Li, Z. Zhao, Q. Sun, C.-L. I, C. Yang, X. Chen, M. Zhao, and H. Zhang, "Deep reinforcement learning for resource management in network slicing," IEEE Access, 2018.
[8] J. Koo, M. R. Rahman, V. B. Mendiratta, and A. Walid, "Deep reinforcement learning for network slicing with heterogeneous resource requirements and time varying traffic dynamics," arXiv preprint arXiv:1908.03242, 2019.
[9] H. Wang, Y. Wu, G. Min, J. Xu, and P. Tang, "Data-driven dynamic resource scheduling for network slicing: A deep reinforcement learning approach," Information Sciences, 2019.
[10] Z. Xu, Y. Wang, J. Tang, J. Wang, and M. C. Gursoy, "A deep reinforcement learning based framework for power-efficient resource allocation in cloud RANs," IEEE ICC, 2017.
[11] Y. Shi, Y. E. Sagduyu, and T. Erpek, "Reinforcement learning for dynamic resource optimization in 5G radio access network slicing," IEEE CAMAD, 2020.
[12] A. Nassar and Y. Yilmaz, "Deep reinforcement learning for adaptive network slicing in 5G for intelligent vehicular systems and smart cities," arXiv preprint arXiv:2010.09916, 2020.
[13] T. Erpek, Y. E. Sagduyu, and Y. Shi, "Deep learning for launching and mitigating wireless jamming attacks," IEEE Transactions on Cognitive Communications and Networking, 2019.
[14] M. Sadeghi and E. Larsson, "Adversarial attacks on deep-learning based radio signal classification," IEEE Communications Letters, Feb. 2019.
[15] M. Z. Hameed, A. Gyorgy, and D. Gunduz, "The best defense is a good offense: Adversarial attacks to avoid modulation detection," IEEE Transactions on Information Forensics and Security, Sept. 2020.
[16] B. Kim, Y. E. Sagduyu, K. Davaslioglu, T. Erpek, and S. Ulukus, "Over-the-air adversarial attacks on deep learning based modulation classifier over wireless channels," IEEE CISS, 2020.
[17] B. Kim, Y. E. Sagduyu, K. Davaslioglu, T. Erpek, and S. Ulukus, "Channel-aware adversarial attacks against deep learning-based wireless signal classifiers," arXiv preprint arXiv:2005.05321, 2020.
[18] B. Kim, Y. E. Sagduyu, K. Davaslioglu, T. Erpek, and S. Ulukus, "Adversarial attacks with multiple antennas against deep learning-based modulation classifiers," IEEE GLOBECOM, 2020.
[19] Y. E. Sagduyu, T. Erpek, and Y. Shi, "Adversarial deep learning for over-the-air spectrum poisoning attacks," IEEE Transactions on Mobile Computing, Feb. 2021.
[20] Y. Shi, K. Davaslioglu, and Y. E. Sagduyu, "Over-the-air membership inference attacks as privacy threats for deep learning-based wireless signal classifiers," ACM WiseML, 2020.
[21] K. Davaslioglu and Y. E. Sagduyu, "Trojan attacks on wireless signal classification with adversarial machine learning," IEEE DySPAN, 2019.
[22] Y. Shi, K. Davaslioglu, and Y. E. Sagduyu, "Generative adversarial network in the air: Deep adversarial learning for wireless signal spoofing," IEEE Transactions on Cognitive Communications and Networking, 2020.
[23] Y. Shi, K. Davaslioglu, and Y. E. Sagduyu, "Generative adversarial network for wireless signal spoofing," ACM WiseML, 2019.
[24] Y. E. Sagduyu, T. Erpek, and Y. Shi, "Adversarial machine learning for 5G communications security," arXiv preprint arXiv:2101.02656, 2021.
[25] B. Kim, Y. E. Sagduyu, K. Davaslioglu, T. Erpek, and S. Ulukus, "How to make 5G communications 'invisible': Adversarial machine learning for wireless privacy," Asilomar Conference on Signals, Systems, and Computers, 2020.
[26] Y. Shi, Y. E. Sagduyu, T. Erpek, and M. C. Gursoy, "How to attack and defend 5G radio access network slicing with reinforcement learning," arXiv preprint arXiv:2101.05768, 2021.
[27] Y. E. Sagduyu, R. Berry, and A. Ephremides, "Wireless jamming attacks under dynamic traffic uncertainty,"
Asilomar Conf. Sig., Sys. and Comp. , 2020.[26] Y. Shi, Y. E. Sagduyu, T. Erpek, and M. C. Gursoy, “How to attack anddefend 5G radio access network slicing with reinforcement learning,” arXiv preprint arXiv:2101.05768, 2021.[27] Y. E. Sagduyu, R. Berry and A. Ephremides, “Wireless jamming attacksunder dynamic traffic uncertainty,”