A Machine Learning Framework for Resource Allocation Assisted by Cloud Computing

Abstract

Conventionally, resource allocation is formulated as an optimization problem and solved online with instantaneous scenario information. Since most resource allocation problems are not convex, optimal solutions are very difficult to obtain in real time. Lagrangian relaxation or greedy methods are then often employed, which results in performance loss. Therefore, conventional methods of resource allocation face great challenges in meeting the ever-increasing QoS requirements of users with scarce radio resources. Assisted by cloud computing, a huge amount of historical data on scenarios can be collected for extracting similarities among scenarios using machine learning. Moreover, optimal or near-optimal solutions of historical scenarios can be searched offline and stored in advance. When the measured data of the current scenario arrives, the current scenario is compared with historical scenarios to find the most similar one. Then, the optimal or near-optimal solution of the most similar historical scenario is adopted to allocate the radio resources for the current scenario. To facilitate the application of this new design philosophy, a machine learning framework is proposed for resource allocation assisted by cloud computing. An example of beam allocation in multi-user massive multiple-input-multiple-output (MIMO) systems shows that the proposed machine-learning based resource allocation outperforms conventional methods.


Jun-Bo Wang, Junyuan Wang, Yongpeng Wu, Jin-Yuan Wang, Huiling Zhu, Min Lin, Jiangzhou Wang


Index Terms

Resource allocation, machine learning, cloud computing, k-nearest neighbour (k-NN), beam allocation algorithm, massive MIMO.

Jun-Bo Wang is with National Mobile Communications Research Laboratory, Southeast University, Nanjing 210096, China (email: [email protected]).
Junyuan Wang, Huiling Zhu and Jiangzhou Wang are with the School of Engineering and Digital Arts, University of Kent, Canterbury, Kent, CT2 7NT, United Kingdom (email: {jw712,h.zhu,j.z.wang}@kent.ac.uk).
Yongpeng Wu is with the Department of Electronic Engineering, Shanghai Jiao Tong University, China (email: [email protected]).
Min Lin and Jin-Yuan Wang are with College of Telecommunications and Information Engineering, Nanjing University of Posts and Telecommunications, Nanjing 210003, China (email: {linmin,jywang}@njupt.edu.cn).
Corresponding author: Jun-Bo Wang. Corresponding email: [email protected]

I. INTRODUCTION

With the rapid development of electronic devices and mobile computing techniques, worldwide societal trends have demonstrated unprecedented changes in the way wireless communications are used. It is predicted that the monthly traffic of smartphones around the world will be about 50 exabytes in 2021 [1], which is about 12 times that in 2016. Wireless communications have become indispensable to our society and are involved in many aspects of our lives. Many familiar scenarios such as ultra-dense residential areas and office towers, subways, highways, and high-speed railways challenge future mobile networks in terms of ultra-high traffic volume density, ultra-high connection density, or ultra-high mobility. Because efficient radio resource allocation can guarantee the users’ Quality of Service (QoS) and optimize the usage of facilities to maximize operators’ revenue, it remains a hot topic for future wireless communications [2].

In practical networks, the overall performance depends on how to exploit the fluctuation of wireless channels and traffic loads to efficiently and dynamically manage the hyper-dimensional radio resources (such as frequency bands, time slots, orthogonal codes, transmit power, and transmit-receive beams) and to fairly support users’ QoS requirements. On one hand, radio resources are inherently scarce, since all users competitively share the common electromagnetic spectrum and wireless infrastructures. On the other hand, wireless services are becoming increasingly sophisticated and varied, each with a wide range of QoS requirements. Efficient and robust resource allocation algorithms are essential for the success of future mobile networks. Conventionally, resource allocation problems are formulated mathematically as optimization problems. After collecting instantaneous channel state information (CSI) and QoS requirements of users, the formulated optimization problems are solved online. That is, the solutions must be obtained quickly since wireless channels and traffic loads vary quickly. However, most of the optimization problems are not convex [3], which means that the optimal solutions are often very difficult to obtain, especially in scenarios with many users and diverse radio resources. Therefore, conventional Lagrangian relaxation or greedy methods are often employed to find solutions online. Inevitably, the online solutions of resource allocation result in performance loss. With the increase of users’ QoS requirements, conventional methods face great challenges in designing more sophisticated resource allocation schemes to further improve system performance with scarce radio resources, which motivates the exploration of a novel design philosophy for resource allocation.

In March 2016, a five-game Go match was held between 18-time world champion Lee Sedol and AlphaGo, a computer Go program developed by Google DeepMind [4]. From the view of conventional computing theory, Go had previously been regarded as an extremely difficult problem that was expected to be out of reach for state-of-the-art technologies. Surprisingly, AlphaGo found its moves based on knowledge previously “learned” from historical match records and won all but the fourth game. Inspired by the victory of AlphaGo, how to apply machine learning techniques to address the challenges in future communications has attracted great attention and has been discussed widely [5], [6].

In practical wireless communications, the radio resources are dynamically allocated according to instantaneous information including CSI and QoS requirements of users. Inexpensive cloud storage makes it very easy to save this information as data on historical scenarios that previously would have been ignored and discarded. Recent investigations have found that these data convey many similarities between current and historical scenarios in user requirements and wireless propagation environments [7]. Using the similarities among scenarios, the solutions of resource allocation in historical scenarios can be exploited to improve the resource allocation of the current scenario. More specifically, the solutions of resource allocation in historical scenarios can be searched offline and stored in advance.
When the measured data of the current scenario arrives, it is not necessary to use conventional Lagrangian relaxation or greedy methods to solve the resource allocation problem online. Instead, we only need to compare the current scenario with historical scenarios and find the most similar one. Then, we use the solution of the most similar historical scenario to allocate the radio resources for the current scenario. Interestingly, the offline characteristic makes it possible to use advanced cloud computing techniques to find optimal or near-optimal solutions of resource allocation for historical scenarios, which can improve the performance of resource allocation accordingly.

Fig. 1. Wireless Communications Assisted by Cloud Computing (a cloud connected to several base stations).

II. MATHEMATICAL MODELING OF RESOURCE ALLOCATION

As illustrated in Fig. 1, the architecture of wireless communications assisted by cloud computing consists of three main components: (i) configurable computing resources clustered as a cloud with high computational and storage capabilities, (ii) base stations (BSs) with wireless access functions, and (iii) backhaul links which deliver the measured data of real scenarios from the BS to the cloud and deploy the machine learning based resource allocation schemes at the BS. More details will be discussed in the next section. In general, the resource allocation which is performed at the BS can be formulated as a mathematical optimization problem [3], given by

$$
\begin{aligned}
\underset{x \in S}{\text{minimize}} \quad & f(x, a) \\
\text{subject to} \quad & g_i(x, a) \le 0, \quad i = 1, \cdots, m, \\
& h_i(x, a) = 0, \quad i = 1, \cdots, p,
\end{aligned}
\tag{1}
$$

where $x$ is the variable vector of the problem, $f(\cdot, \cdot)$ is the objective function to be minimized over the vector $x$, $a$ is the parameter vector that specifies the problem instance, $\{g_i\}_{i=1}^{m}$ and $\{h_i\}_{i=1}^{p}$ are called inequality and equality constraint functions, respectively, and $S$ is called a constraint set. By convention, the standard form defines a minimization problem. A maximization problem can be treated by negating the objective function.

If a resource allocation problem is formulated in the form (1), all elements of the vector $x$ are variables which describe the allocated amount or configuration of radio resources, such as the transmit power level and the assigned subcarrier index. All elements of the vector $a$ are system parameters or wireless propagation parameters, such as the bandwidth, the subcarrier number, and the background noise level. $\{g_i\}_{i=1}^{m}$ and $\{h_i\}_{i=1}^{p}$ are used to define the specific scenario and the limitations on the resource allocation, such as the available amount of radio resources, users’ QoS requirements, and the impact of all kinds of interference and noise.
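As a concrete illustration of the form (1), the following Python sketch builds a toy instance and solves it by enumerating the constraint set. Everything specific here is an assumption made for illustration only: the two-user setting, the three discrete power levels, the log-rate objective, the single total-power constraint, and the names `objective`, `inequality_constraints` and `solve_by_enumeration`.

```python
import itertools
import math

# Hypothetical toy instance of problem (1): two users, each assigned one of a
# few discrete transmit power levels. The tuple x is the variable vector.
POWER_LEVELS = (0.0, 0.5, 1.0)  # values each element of x may take (defines S)


def objective(x, a):
    """f(x, a): negated sum rate, so minimizing f maximizes the sum rate."""
    return -sum(
        math.log2(1.0 + gain * power / a["noise"])
        for gain, power in zip(a["gains"], x)
    )


def inequality_constraints(x, a):
    """Values of g_i(x, a); feasibility requires every value to be <= 0."""
    return [sum(x) - a["power_budget"]]  # single total-power constraint


def solve_by_enumeration(a, n_users=2):
    """Search every x in S and keep the feasible one with the smallest f."""
    best_x, best_f = None, math.inf
    for x in itertools.product(POWER_LEVELS, repeat=n_users):
        if any(g > 0 for g in inequality_constraints(x, a)):
            continue  # x violates some g_i(x, a) <= 0
        f = objective(x, a)
        if f < best_f:
            best_x, best_f = x, f
    return best_x, best_f


# The parameter vector a specifies the problem instance (assumed values).
a = {"gains": (4.0, 1.0), "noise": 1.0, "power_budget": 1.0}
x_star, f_star = solve_by_enumeration(a)
```

In this toy instance the whole power budget goes to the user with the larger channel gain. Enumeration is only practical because the constraint set has nine elements.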
The objective function describes the characteristics of the best possible solution and reveals the design objective, i.e., the key performance metrics for resource allocation. For a specified scenario described by $a$, the optimal solution of resource allocation $x^*$ is the vector that attains the best value of the objective function among all vectors that satisfy all constraints.

III. A MACHINE LEARNING FRAMEWORK
For existing wireless systems assisted by cloud computing, a huge amount of data on historical scenarios may have been collected and stored at the cloud. The strong computing capability of the cloud is exploited to search for optimal or near-optimal solutions for these historical scenarios. By classifying these solutions, the similarities hidden in these historical scenarios are extracted as a machine learning based resource allocation scheme, which is then forwarded to the BS to guide it in allocating radio resources more efficiently. When a BS is deployed in a new area, there is usually no available data about historical scenarios. In this case, the initial historical data can be generated from an abstract mathematical model with realistic BS locations, accurate building footprints, presumptive user distribution and requirements, and wireless propagation models. When the new BS enters service, the measured data of real-time scenarios will be collected from practical systems and later used as historical data for learning.

Fig. 2. A Machine Learning Framework of Resource Allocation.

The proposed machine learning framework is shown in Fig. 2. At the cloud, a huge amount of historical data on scenarios is stored using cloud storage. The historical data has many attributes, including the user number, the CSI of users, the international mobile subscriber identities (IMSIs) of users, and so on. Some attributes, such as the IMSIs of users, may be irrelevant for the specific resource allocation, i.e., they are not included in the parameter vector $a$ of the optimization problem (1). Learning from a large amount of raw data with many attributes generally requires a large amount of memory and computation power, and it may degrade the learning accuracy [8]. Therefore, the irrelevant attributes can be removed without incurring much loss of data quality. In order to reduce the dimensionality of the data and enable the learning process to operate faster and more effectively, feature selection is carried out to identify and remove as many irrelevant attributes as possible, which will be discussed in Section IV-A.

Through feature selection, some key attributes are selected from the historical data and presented as a feature vector. However, there may be operation faults in data measurement, transmission, and storage, which result in abnormal, incomplete, or duplicate values in feature vectors. Therefore, preprocessing is required to delete erroneous or duplicate feature vectors. Then, all remaining feature vectors are collected to form a very large dataset. Further, all feature vectors in the dataset are split randomly into a training set and a test set.
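The preprocessing and random split described above can be sketched in Python as follows. The function names `preprocess` and `split_dataset`, the rule that a `None` or non-finite entry marks a feature vector as erroneous, and the training fraction of 0.8 are illustrative assumptions rather than part of the framework.

```python
import math
import random


def preprocess(feature_vectors):
    """Drop incomplete or abnormal vectors and exact duplicates.

    Assumption: a vector is erroneous if any entry is None or not finite.
    """
    seen, clean = set(), []
    for fv in feature_vectors:
        if any(v is None or not math.isfinite(v) for v in fv):
            continue  # incomplete or abnormal measurement
        key = tuple(fv)
        if key in seen:
            continue  # duplicate of an earlier feature vector
        seen.add(key)
        clean.append(key)
    return clean


def split_dataset(dataset, train_fraction=0.8, seed=0):
    """Randomly split the dataset into a training set and a test set."""
    shuffled = list(dataset)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    n_train = round(train_fraction * len(shuffled))
    return shuffled[:n_train], shuffled[n_train:]


raw = [
    (0.1, 0.9), (0.1, 0.9),        # duplicate pair
    (0.4, float("nan")),           # abnormal value
    (0.7, None),                   # incomplete value
    (0.3, 0.2), (0.8, 0.5), (0.6, 0.6), (0.2, 0.1),
]
dataset = preprocess(raw)
train_set, test_set = split_dataset(dataset)
```

With the assumed data, five feature vectors survive preprocessing, of which four form the training set and one forms the test set.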
Normally, 70-90% of the feature vectors are assigned to the training set.

With the training set, a supervised learning algorithm is adopted to find the similarities hidden in historical data. By doing so, a predictive model can be built which will be used to make resource allocation decisions for future, previously unseen scenarios. More specifically, with the aid of cloud computing, advanced computing techniques can be used to search for solutions of the optimization problem (1) with more computational time. Compared with conventional Lagrangian relaxation or greedy methods, the performance of the searched solutions can be improved significantly. Therefore, a high-performance solution of resource allocation can be searched offline and associated with each training feature vector, which will be discussed in Section IV-B. All training feature vectors with the same solution are classified into one class, and each class is associated with its own solution. The resource allocation problem is now transformed into a multiclass classification problem, which will be discussed in Section IV-C. In order to solve the multiclass classification problem, a predictive model will be built with two functions. The first is to predict the class of a future scenario, which can be mathematically described as a classifier $l = \mathrm{Classifier}(F_T)$, where $F_T$ is the input feature vector extracted from the scenario and $l$ is the output class index showing that the scenario belongs to the $l$th class. The second is to select the associated solution of the $l$th class to allocate radio resources for the scenario depicted by $F_T$. Before deployment, the newly built predictive model is evaluated on the test set and further optimized until the evaluation results are satisfactory.

Using the backhaul links, the built predictive model and the associated solutions of all classes will be transmitted to the BS. At the BS, the measured data of a real-time scenario is first used to form its new feature vector.
Then the new feature vector is input into the built predictive model to allocate radio resources. Meanwhile, the new feature vector is collected and stored temporarily at the BS and forwarded to the cloud later for updating the dataset, which is very important for tracking the evolution of real scenarios, including user behaviors and wireless propagation environments.

Although a lot of computing resources are consumed to build a predictive model, the computing work can be carried out offline during off-peak time. Moreover, the dataset updating and the model deployment can also be accomplished during off-peak time. Therefore, the cloud can be shared by multiple BSs and the computing tasks can be flexibly scheduled to make full use of the available computing resources.

IV. APPLICATION OF SUPERVISED LEARNING TO RESOURCE ALLOCATION

In the proposed machine learning framework, a machine learning algorithm is adopted to build a predictive model. Generally speaking, machine learning algorithms are categorized as either supervised or unsupervised [7]. In supervised learning, the goal is to learn from training data which are labeled with nonnegative integers or classes, in order to later predict the correct response when dealing with new data. The supervised approach is similar to human learning under the supervision of a teacher. The teacher provides good training examples for the student, and the student then derives general rules from these specific examples. In contrast, the data for unsupervised learning have no labels, and the goal instead is to organize the data and find hidden structures in unlabeled data. Most machine learning algorithms in practical use are supervised. In the following, we discuss how to apply supervised learning to solve the resource allocation problem.

A. Feature Selection

In machine learning, feature selection, also known as attribute selection, is the process of selecting a subset of relevant attributes in historical data to form the feature vector for building predictive models. The selection of an appropriate feature vector is critical due to the phenomenon known as “the curse of dimensionality” [9]. That is, each dimension added to the feature vector requires exponentially more data in the training set, which in practice usually results in significant performance degradation. Therefore, it is necessary to find low-dimensional feature vectors that capture the essence of resource allocation in practical scenarios.

In order to reduce the dimensionality of feature vectors, only information valuable for the resource allocation should be selected as features. After modeling the resource allocation as the optimization problem (1), all valuable information is included in the parameter vector $a$. The elements of $a$ can be further divided into two categories: time-variant (dynamic) and time-invariant (static). Some elements are constants and thus labeled as time-invariant parameters, such as the subcarrier number, maximum transmit power, and antenna number. Other elements change quickly and must be continuously measured and fed back for making resource allocation decisions; these are labeled as time-variant parameters, such as the user number, CSI of all users, and interference levels. As the time-invariant parameters remain unchanged, only the time-variant parameters need to be considered as features in order to minimize the dimension of the feature vectors. Moreover, some time-variant parameters need not be selected as features, since a parameter may be redundant in the presence of another relevant feature with which it is strongly correlated. In short, an individual feature vector specifies a unique scenario for resource allocation.
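The two selection rules above (discard time-invariant parameters, then discard parameters strongly correlated with one already kept) can be sketched as follows. This is a minimal illustration under stated assumptions: the attribute names, the sample values, the use of the Pearson correlation coefficient, and the threshold of 0.95 are all hypothetical choices, not prescribed by the framework.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    norm_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    norm_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (norm_x * norm_y)


def select_features(records, corr_threshold=0.95):
    """Return the attribute names kept as features.

    records: list of dicts mapping attribute name -> numeric value.
    """
    names = list(records[0])
    columns = {name: [r[name] for r in records] for name in names}
    # Rule 1: drop time-invariant attributes (the same value in every record).
    dynamic = [name for name in names if len(set(columns[name])) > 1]
    # Rule 2: drop an attribute strongly correlated with one already kept.
    kept = []
    for name in dynamic:
        if all(abs(pearson(columns[name], columns[k])) < corr_threshold
               for k in kept):
            kept.append(name)
    return kept


# Hypothetical measurements; interference_dbw is interference_dbm minus 30.
records = [
    {"subcarrier_number": 64, "user_number": 2,
     "interference_dbm": -90.0, "interference_dbw": -120.0},
    {"subcarrier_number": 64, "user_number": 4,
     "interference_dbm": -85.0, "interference_dbw": -115.0},
    {"subcarrier_number": 64, "user_number": 3,
     "interference_dbm": -95.0, "interference_dbw": -125.0},
    {"subcarrier_number": 64, "user_number": 5,
     "interference_dbm": -80.0, "interference_dbw": -110.0},
]
features = select_features(records)
```

Here the constant subcarrier number and the redundant second interference attribute are removed, leaving the user number and one interference level.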
However, it should be noted that feature selection is a process of trial and error, which can be time consuming and costly, especially with very large datasets.

B. Solutions of Optimization Problems

To facilitate the application of supervised learning, the solution of the resource allocation problem specified by each training feature vector should be obtained in advance. Then, each training feature vector is associated with its solution. According to the associated solutions, all feature vectors are labeled into multiple classes. More specifically, all training feature vectors with the same solution are given the same class label, indexed by a nonnegative integer. In other words, each class is associated with its unique solution. The class labels of all training feature vectors are used to build a predictive model. In practice, a new feature vector is formed from the measured data of a real-time scenario. Then the predictive model predicts the class of the new feature vector and outputs the associated solution of the predicted class, i.e., how to allocate the radio resources for the real-time scenario. Obviously, if too many training feature vectors are associated with low-performance solutions, the built predictive model cannot supply high-performance solutions for practical resource allocation. Therefore, finding optimal or near-optimal solutions for all training feature vectors is crucial for building a high-performance predictive model.

Fig. 3. Multiclass classification (feature vectors of Class 1, Class 2, and Class 3 plotted in the feature space with axes $f_1$ and $f_2$).

In the resource allocation problem (1), all elements of the vector $x$ describe how to allocate the radio resources. Mathematically, the allocation of many radio resources, such as subcarriers, time slots, and modulation and coding schemes, can be described by integer variables. Intuitively, the transmit power level can be adjusted arbitrarily between zero and the maximum transmit power, so it seems that only a continuous variable can describe the transmit power allocation. However, in order to reduce system complexity, transmitters in practical systems are usually allowed to transmit signals with only a few predefined power levels. Therefore, most practical resource allocation issues can be modeled as integer optimization problems. When the number of integer variables in an integer optimization problem is very small, the optimal solution can be found by exhaustive search. However, if there are many integer variables, finding an optimal solution is extremely computationally complex because such problems are known to be non-deterministic polynomial-time hard (NP-hard) [10]. In this case, it is more feasible to search for near-optimal solutions for all training feature vectors. Moreover, the offline characteristic of model building and the strong cloud computing and storage capabilities make it possible to spend more computation time using metaheuristics to search for near-optimal solutions. Some well-known metaheuristic algorithms [11], such as particle swarm optimization (PSO) and ant colony optimization (ACO), have been applied to many classic combinatorial optimization problems. For most of these applications, the results show that these metaheuristic algorithms outperform other algorithms, including conventional Lagrangian relaxation or greedy algorithms.
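The labeling procedure of this subsection can be sketched as below: each training feature vector is solved offline (here by exhaustive search over a tiny integer problem), and feature vectors with identical solutions receive the same nonnegative class index. The toy objective (select exactly one of two channels, preferring the larger gain) and all function names are assumptions for illustration; for larger problems the exhaustive search would be replaced by a metaheuristic.

```python
import itertools


def exhaustive_search(feature, objective, levels, n_vars):
    """Return the integer vector x that minimizes objective(x, feature)."""
    return min(itertools.product(levels, repeat=n_vars),
               key=lambda x: objective(x, feature))


def label_by_solution(features, solve):
    """Give feature vectors with the same solution the same class index."""
    solution_to_class, labels = {}, []
    for fv in features:
        x = solve(fv)
        # A new solution gets the next unused nonnegative integer.
        labels.append(solution_to_class.setdefault(x, len(solution_to_class)))
    # class_solutions[l] is the solution associated with class l.
    class_solutions = sorted(solution_to_class, key=solution_to_class.get)
    return labels, class_solutions


def toy_objective(x, feature):
    """Hypothetical objective: pick exactly one channel, maximize its gain."""
    if sum(x) != 1:
        return float("inf")  # infeasible selection
    return -sum(xi * gain for xi, gain in zip(x, feature))


# Each feature vector holds two assumed channel gains.
training_features = [(0.9, 0.2), (0.1, 0.7), (0.8, 0.3), (0.4, 0.6)]
labels, class_solutions = label_by_solution(
    training_features,
    lambda fv: exhaustive_search(fv, toy_objective, levels=(0, 1), n_vars=2),
)
```

The resulting `labels` are what a supervised learning algorithm would train on, while `class_solutions` is the lookup table that turns a predicted class back into an allocation.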

C. Multiclass Classification Problem and Classifier

When the class labels are ready for all training feature vectors, the next step is to look for the similarities hidden in the labeled feature vectors. Mathematically, the set of all possible feature vectors constitutes a feature space. As a special case, if the dimension of the feature vectors is 2, the feature space is a two-dimensional space, as shown in Fig. 3. Note that $f_1$ and $f_2$ are the first and second elements of the feature vectors, respectively. When all labeled feature vectors are shown in the feature space, it can be observed that the feature vectors with the same class label are often distributed very closely. Accordingly, the feature space can be divided into several subspaces, and most feature vectors with the same class label are located within the same subspace. Then, the hidden similarities can be exploited by building a classifier, which predicts the class of a new feature vector by determining which subspace it is located in. In supervised learning, such a learning process is often called a multiclass classification problem [12]. So far, many machine learning algorithms have been used for designing multiclass classifiers. Selecting a machine learning algorithm is also a process of trial and error. It is a trade-off between specific characteristics of the algorithms, such as computational complexity, speed of learning, memory usage, predictive accuracy on new feature vectors, and interpretability. Therefore, the design of the multiclass classifier is an essential task.

Fig. 4. k-NN algorithm (a new feature vector, training data of Class 1 and Class 2, and the neighborhoods for K = 1 and K = 3).

Here, we briefly introduce the k-NN algorithm, which is very simple to understand but works very well in practice. As shown in Fig. 4, whenever a new feature vector arrives, the k-NN algorithm picks the k nearest neighbors of the new feature vector from the training set. Then, the new feature vector is judged to belong to the most common class among its k nearest neighbors. If k = 1, the new feature vector is simply assigned to the class of its nearest neighbor.

V. EXAMPLE: BEAM ALLOCATION IN MULTIUSER MASSIVE MIMO

In this section, the beam allocation problem in a single-cell multiuser massive MIMO system considered in [13] is taken as an example to demonstrate the efficiency of our proposed machine learning framework of resource allocation.

In the single-cell system, it is assumed that the BS is located at the center of a circular cell with unit radius, $K$ users are uniformly distributed within the cell, and each user is equipped with a single antenna. A massive number $N \gg K$ of fixed beams are formed by deploying the Butler network [14] with a linear array of $N$ identical isotropic antenna elements at the BS. In such a fixed-beam system, a user is served by a beam allocated to it, as shown in Fig. 5, where each user is served by the beam in the same color and $(\rho_k, \theta_k)$ denotes the polar coordinates of user $k$. To serve multiple users simultaneously, the key problem is: how to efficiently allocate beams to users to maximize the sum rate?

As the number of beams $N$ is much larger than the number of users $K$ and each user is served by one beam, only some of the beams will be active for serving users. Therefore, we first need to decide which beams are active. This can be solved by applying our machine learning framework. Specifically, the active beam solution serves as the output of the predictive model. Assuming a line-of-sight (LoS) channel, the beam gains of the $K$ users from the $N$ beams are determined by the $K$ users’ locations, so the user layout $u = [(\rho_1, \theta_1), (\rho_2, \theta_2), \cdots, (\rho_K, \theta_K)]$ should serve as the input data, which contains both the radial distance and phase information. Since the beam gains from various beams for a user vary significantly with its phase, as shown in Fig. 5, the achievable sum rate with a beam allocation solution is mainly determined by the phase information of the $K$ users.
Therefore, the feature vector $F_u$ of a user layout $u$ is selected as

$$
F_u = [\cos\theta_{(1)}, \cos\theta_{(2)}, \cdots, \cos\theta_{(K)}], \tag{2}
$$

where $\theta_{(1)} \le \theta_{(2)} \le \cdots \le \theta_{(K)}$ are the order statistics obtained by arranging $\theta_1, \theta_2, \cdots, \theta_K$ in ascending order.

Fig. 5. Illustration of beam allocation in multiuser massive MIMO systems. “x” represents a user. A beam is allocated to the user in the same color. $(\rho_k, \theta_k)$ denotes the polar coordinates of user $k$.

Before performing resource allocation, we first need to train the predictive model by learning from a large amount of training user layout data, which can be generated by computer according to the user distribution. For each training user layout, its feature vector formed according to (2) is associated with its active beam solution, which can be obtained by employing offline beam allocation algorithms. Thanks to the strong cloud computing capability, optimal exhaustive search or near-optimal metaheuristic algorithms can be adopted, as mentioned in Section IV-B. In this section, exhaustive search is applied for demonstration by assuming a small number of users and beams. After associating each feature vector in the training set with its active beam solution, all the training feature vectors are naturally classified into a variety of classes according to their active beam solutions. Specifically, the feature vectors sharing the same active beam solution are in the same class. A predictive model of the active beam solution can then be built by applying a simple k-NN algorithm, which can then be evaluated and optimized to guarantee its performance, as Fig. 2 shows. For instance, it can be improved by adding more training data. The effect of the size of the training set is discussed with Fig. 6(b).

The built predictive model is then deployed at the BS for beam allocation.
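The feature construction in (2) can be sketched in a few lines of Python. The function name `feature_vector`, the representation of a layout as a list of `(rho, theta)` tuples, and the use of radians are assumptions made for this illustration.

```python
import math


def feature_vector(user_layout):
    """Map a user layout [(rho_1, theta_1), ...] to the feature vector of (2).

    The radial distances are discarded; only the sorted phases are used.
    Angles are assumed to be given in radians.
    """
    thetas = sorted(theta for _rho, theta in user_layout)  # order statistics
    return [math.cos(theta) for theta in thetas]


# Hypothetical layout with K = 3 users.
layout = [(0.5, math.pi / 2), (0.9, 0.0), (0.2, math.pi)]
fv = feature_vector(layout)

# Relabeling the users does not change the feature vector.
fv_permuted = feature_vector(list(reversed(layout)))
```

Sorting the phases makes the feature vector independent of how the users are indexed, so two layouts that differ only in user labeling map to the same point in the feature space.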
For a new user layout $u_i$, by forming its feature vector $F_{u_i}$ and defining the distance $d_{u_i,T_j}$ from its feature vector $F_{u_i}$ to a stored training feature vector $F_{T_j}$ as

$$
d_{u_i,T_j} = \| F_{u_i} - F_{T_j} \|, \tag{3}
$$

the $k$ nearest neighbor feature vectors with the $k$ smallest distances are picked. According to the k-NN algorithm, the most common class among these $k$ neighbors is chosen as the predicted class of the input user layout $u_i$, and the predictive model outputs the associated active beam solution of the predicted class. Based on the active beam information, each active beam is allocated to its best user, i.e., the one with the highest received signal-to-interference-plus-noise ratio (SINR), assuming equal power allocation among users. In addition, the new feature vectors $F_{u_i}$ are collected to further update the dataset and track the evolution of the user layout.

Footnote: The k-NN algorithm is employed in this section for illustration thanks to its simplicity.

Fig. 6. Average sum rate with our proposed machine learning framework, optimal exhaustive search and LBA algorithm proposed in [13]. For the employed k-NN algorithm, k = 1. (a) Average sum rate (bit/s/Hz) versus transmit SNR (dB); N = 16, K = 8, training data. (b) Average sum rate (bit/s/Hz) versus size of training set; N = 8, K = 4, transmit SNR = 20 dB. Curves in both panels: Exhaustive Search, Proposed, LBA.

Fig. 6 presents the average sum rate with our proposed machine learning framework of beam allocation versus the transmit signal-to-noise ratio (SNR) and the size of the training set. For comparison, the average sum rates with both optimal exhaustive search and the low-complexity beam allocation (LBA) algorithm proposed in [13] are also plotted. It can be seen from Fig. 6(b) that as the amount of training data increases, the average sum rate achieved by our proposed machine learning framework increases and gradually approaches that of the optimal exhaustive search. It can also be observed from Fig. 6 that with a larger training set, our algorithm outperforms the LBA algorithm proposed in [13], indicating that our proposed machine learning framework of resource allocation outperforms conventional techniques.

Note that for the aforementioned k-NN algorithm, the distances between new data and existing training data are calculated in real time. As a result, with a large amount of training data, the computational complexity would become very high in practical systems.
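To make the per-query cost concrete, the k-NN prediction step can be sketched as follows. The norm in (3) is assumed here to be the Euclidean norm, and the function name `knn_predict`, the training vectors, and the active beam masks in `class_solutions` are hypothetical values chosen only for illustration.

```python
import math
from collections import Counter


def knn_predict(new_fv, train_fvs, train_labels, k=1):
    """Predict the class of new_fv by majority vote among its k nearest
    training feature vectors, using the (assumed Euclidean) distance of (3)."""
    # One distance per stored training vector: the cost grows linearly with
    # the size of the training set for every new scenario.
    distances = sorted(
        (math.dist(new_fv, fv), label)
        for fv, label in zip(train_fvs, train_labels)
    )
    votes = Counter(label for _distance, label in distances[:k])
    return votes.most_common(1)[0][0]


# Hypothetical training set with two classes and their active beam masks.
train_fvs = [(1.0, 0.9), (0.9, 0.8), (-0.5, -0.9), (-0.4, -1.0), (0.2, -0.1)]
train_labels = [0, 0, 1, 1, 1]
class_solutions = {0: (1, 0, 1, 0), 1: (0, 1, 0, 1)}

predicted = knn_predict((0.8, 0.7), train_fvs, train_labels, k=1)
active_beams = class_solutions[predicted]
```

Every call scans the entire training set, which is the real-time burden noted above and the motivation for a lower-complexity classifier.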
It is therefore important to design a low-complexity multiclass classifier, which will be discussed in Section VI-A.

VI. RESEARCH CHALLENGES AND OPEN ISSUES

Machine learning offers a plethora of opportunities for research on resource allocation in future wireless communications. Many open issues have not yet been studied and need to be explored further. This section outlines some of the most important ones from our viewpoint.

A. Low-Complexity Classifier

More advanced techniques are required to design low-complexity multiclass classifiers. One promising technique is to transform the multiclass classification problem into a set of binary classification problems that can be solved efficiently using binary classifiers. So far, the support vector machine (SVM) has been regarded as one of the most robust and successful algorithms for designing low-complexity binary classifiers. It determines the class of a new feature vector by using linear boundaries (hyperplanes) in high-dimensional spaces. More specifically, the two classes are divided by only a few hyperplanes, and the class is determined by which side of the hyperplanes the new feature vector falls on. Compared with the k-NN algorithm, the complexity of SVM-based binary classifiers is very low. For the aforementioned beam allocation example, the total number of active beam solutions is N. In other words, there exist at most N classes, which indicates that the complexity of determining the class of a scenario is O(2N). Meanwhile, the complexity of exhaustive search is O(N^K). Our proposed machine learning framework of resource allocation can therefore approach the optimal performance of exhaustive search with low complexity.

It is worth mentioning that several typical scenarios with different QoS requirements have been defined for future fifth-generation (5G) communications [15]. For example, the “Great service in a crowd” scenario focuses on providing reasonable experiences even in crowded areas, including stadiums, concerts, and shopping malls. For each typical scenario, the hidden common features of user behaviors and wireless propagation environments may reduce the number of classes, which can be exploited to further reduce the complexity of classifiers. Recently, deep learning has shown significant advantages in exploiting hidden common features [16], [17].

B. Multi-BS Cooperation

In practical networks, some users may be located at the edge of the BS coverage. If an edge user is served by only one BS, the signal quality may be very poor due to the long transmission distance. Cooperative transmission among multiple nearby BSs has been shown to improve the edge user’s performance significantly [18]. In this case, the radio resources at multiple BSs should be allocated jointly. Compared with the single-BS scenario, the resource allocation problem of multi-BS cooperative transmission requires more information to be shared among the BSs. Accordingly, more attributes in the historical data will be selected into the feature vectors, which makes the learning process more complicated. How to use historical data to improve resource allocation with multi-BS cooperative transmission is very challenging and needs to be studied.

C. Fast Evolution of Scenarios

In many real scenarios, user behaviors and wireless environments inherently evolve over time [19]. That is, the characteristics hidden in historical scenarios are also dynamic. In most cases, such evolutions are too slow and gentle to be noticed, and they can be traced easily by constantly collecting data and periodically updating the dataset used for learning. However, in some special situations, the evolutions may be very sudden and significant. For example, emergency maintenance on a very busy road greatly changes the distribution of user locations and mobility characteristics, and the demolition of a tall building by blasting significantly changes the propagation environment. Since the predictive model is built with outdated historical data, such fast evolutions may result in significant performance loss in resource allocation. In machine learning, this issue can be addressed by updating the predictive model whenever new data is available. However, since the cloud computing is shared by many applications, the new data can only be stored temporarily at BSs and forwarded to update the dataset later. How to deal with the fast evolution of scenarios in resource allocation is a challenging topic for future research.

VII. CONCLUSION

In future wireless communications, the conventional methods of resource allocation face great challenges in meeting the ever-increasing QoS requirements of users with scarce radio resources. Inspired by the victory of AlphaGo, this paper proposed a machine learning framework for resource allocation and discussed how to apply supervised learning to extract the similarities hidden in a large amount of historical data on scenarios. By exploiting the extracted similarities, the optimal or near-optimal solution of the most similar historical scenario is adopted to allocate the radio resources for the current scenario. An example of beam allocation in multi-user massive MIMO systems was then presented to verify that our proposed machine-learning based resource allocation performs better than conventional methods. In a nutshell, machine-learning based resource allocation is an exciting area for future wireless communications assisted by cloud computing.

REFERENCES

IEEE Communications Surveys & Tutorials, vol. 19, no. 1, pp. 239–284, Firstquarter 2017.
[3] Z.-Q. Luo and W. Yu, “An introduction to convex optimization for communications and signal processing,” IEEE Journal on Selected Areas in Communications, vol. 24, no. 8, pp. 1426–1438, Aug 2006.
[4] https://deepmind.com/.
[5] Z. M. Fadlullah, F. Tang, B. Mao, N. Kato, O. Akashi, T. Inoue, and K. Mizutani, “State-of-the-Art Deep Learning: Evolving Machine Intelligence Toward Tomorrow’s Intelligent Network Traffic Control Systems,” IEEE Communications Surveys & Tutorials, vol. 19, no. 4, pp. 2432–2455, Fourthquarter 2017.
[6] B. Mao, Z. M. Fadlullah, F. Tang, N. Kato, O. Akashi, T. Inoue, and K. Mizutani, “Routing or Computing? The Paradigm Shift Towards Intelligent Computer Network Packet Transmission Based on Deep Learning,” IEEE Transactions on Computers, vol. 66, no. 11, pp. 1946–1960, Nov 2017.
[7] S. Bi, R. Zhang, Z. Ding, and S. Cui, “Wireless communications in the era of big data,” IEEE Communications Magazine
[10] C. H. Papadimitriou and K. Steiglitz, Combinatorial Optimization: Algorithms and Complexity. Courier Corporation, 1982.
[11] X.-S. Yang, Nature-Inspired Metaheuristic Algorithms. Luniver Press, 2008.
[12] E. Alpaydin, Introduction to Machine Learning. MIT Press, 2014. [Online]. Available: http://ieeexplore.ieee.org/xpl/articleDetails.jsp?arnumber=6917138
[13] J. Wang, H. Zhu, L. Dai, N. J. Gomes, and J. Wang, “Low-complexity beam allocation for switched-beam based multiuser massive MIMO systems,” IEEE Transactions on Wireless Communications, vol. 15, no. 12, pp. 8236–8248, Dec 2016.
[14] J. Butler and R. Lowe, “Beam-forming matrix simplifies design of electrically scanned antennas,” Electronic Design, April 1962.
[15] A. Osseiran, F. Boccardi, V. Braun, K. Kusume, P. Marsch, M. Maternia, O. Queseth, M. Schellmann, H. Schotten, H. Taoka, H. Tullberg, M. A. Uusitalo, B. Timus, and M. Fallgren, “Scenarios for 5G mobile and wireless communications: The vision of the METIS project,” IEEE Communications Magazine, vol. 52, no. 5, pp. 26–35, May 2014.
[16] N. Kato, Z. M. Fadlullah, B. Mao, F. Tang, O. Akashi, T. Inoue, and K. Mizutani, “The Deep Learning Vision for Heterogeneous Network Traffic Control: Proposal, Challenges, and Future Perspective,” IEEE Wireless Communications, vol. 24, 2017.
[17] F. Tang, B. Mao, Z. M. Fadlullah, N. Kato, O. Akashi, T. Inoue, and K. Mizutani, “On Removing Routing Protocol from Future Wireless Networks: A Real-time Deep Learning Approach for Intelligent Traffic Control,” IEEE Wireless Communications, vol. PP, no. 99, pp. 1–7, 2017.
[18] X. Chen, H. H. Chen, and W. Meng, “Cooperative communications for cognitive radio networks: From theory to applications,” IEEE Communications Surveys & Tutorials