Context-Aware Mobility Management in HetNets: A Reinforcement Learning Approach
Meryem Simsek∗, Mehdi Bennis†, and İsmail Güvenç‡
∗Vodafone Chair Mobile Communications Systems, Technische Universität Dresden, Germany
†Centre for Wireless Communications, University of Oulu, Finland
‡Department of Electrical & Computer Engineering, Florida International University, USA
Email: [email protected], [email protected].fi, and iguvenc@fiu.edu

This research was supported in part by the SHARING project under the Finland grant 128010 and by the U.S. National Science Foundation under grants CNS-1406968 and AST-1443999.
Abstract—The use of small cell deployments in heterogeneous network (HetNet) environments is expected to be a key feature of 4G networks and beyond, and essential for providing higher user throughput and cell-edge coverage. However, due to the different coverage sizes of macro and pico base stations (BSs), such a paradigm shift introduces additional requirements and challenges in dense networks. Among these challenges is the handover performance of user equipment (UEs), which will be impacted especially when high-velocity UEs traverse picocells. In this paper, we propose a coordination-based and context-aware mobility management (MM) procedure for small cell networks using tools from reinforcement learning. Here, macro and pico BSs jointly learn their long-term traffic loads and optimal cell range expansion, and schedule their UEs based on their velocities and historical rates (exchanged among tiers). The proposed approach is shown to not only outperform classical MM in terms of UE throughput, but also to enable better fairness. On average, a gain of up to 80% is achieved for UE throughput, while the handover failure probability is reduced by up to a factor of three by the proposed learning based MM approaches.
Index Terms—Cell range expansion, HetNets, load balancing, mobility management, reinforcement learning, context-aware scheduling.
I. INTRODUCTION
The deployment of Long Term Evolution (LTE) heterogeneous networks (HetNets) is a promising approach to meet the ever-increasing wireless broadband capacity challenge [1], [2]. However, deploying HetNets entails a number of challenges in terms of capacity, coverage, mobility management (MM), and mobility load balancing (MLB) across multiple network tiers [3]. Mobility management is essential to ensure continuous connectivity to mobile user equipment (UEs) while maintaining quality of service (QoS).

The mobility framework for LTE was originally developed and analyzed by the 3rd generation partnership project (3GPP) for macro-only networks, and was therefore not explicitly optimized for HetNets. In LTE Rel. 11, mobility enhancements in HetNets have been investigated through a dedicated study item [3]. 3GPP has defined key performance indicators (KPIs) for mobility measurements, i.e., the handover failure (HOF) due to a degraded signal-to-interference-plus-noise ratio (SINR), the radio link failure (RLF), as well as the probability of unnecessary handovers, typically referred to as ping-pong (PP) events.

Poor MM approaches may increase HOFs, RLFs, and PPs, and result in an unbalanced load among cells. This entails a low resource utilization efficiency and hence a deterioration of the user experience. In order to solve this problem, while minimizing PPs, mobility parameters in each cell need to be carefully and dynamically optimized according to cell traffic loads. It is essential to optimize handover parameters such as time-to-trigger (TTT), range expansion bias (REB), and hysteresis margin in order to answer the question: "when to handover which UE to which cell?"

Mobility management techniques for HetNets have recently been investigated in the literature, e.g., in [4]–[7]. In [4], the authors evaluate the effect of different combinations of MM parameter settings for HetNets. The main result is that mobility performance strongly depends on the cell size and UE speed. The simulations in [4] assume that all UEs have the same velocity in each simulation setup. In [5], the authors evaluate the mobility performance of HetNets considering almost blank subframes in the presence of cell range expansion and propose a mobility based intercell interference coordination (ICIC) scheme. Hereby, picocells configure coordinated resources by muting certain subframes so that macrocells can schedule their high-velocity UEs in these resources without co-channel interference from picocells. However, the proposed approach only considers three broad classes of UE velocities: low, medium, and high. Moreover, no adaptation of the REB has been taken into account. In [6], a handover-aware ICIC approach based on reinforcement learning is proposed. Hereby, the authors model the ICIC approach as a sub-band selection problem for mobility robustness optimization in a small cell only network. In [7], the cell selection problem in HetNets is formulated as a network-wide proportional fairness optimization problem by jointly considering the long-term channel condition and load balance in a HetNet. While the proposed method enhances the cell-edge UE performance, no results related to mobility parameters are presented.

To the best of our knowledge, there is no previous work related to learning based mobility management in HetNets that jointly considers load balancing and UE scheduling. In this paper, we propose a joint MM and context-aware UE scheduling approach by using tools from reinforcement learning. Hereby, each base station (BS) individually optimizes
[Fig. 1: a) Classical MM framework w/o picocell coverage optimization; b) Proposed learning based MM framework considering velocity and history (average rate) based scheduling. Macro/pico coordination: the average rate of a UE is exchanged in case of handover.]

its own strategy (REB, UE scheduling) based on limited coordination among tiers. Both macro- and picocells learn how to optimize their traffic load in the long term and the UE association process in the short term by performing history and velocity based scheduling. We propose multi-armed bandit (MAB) and satisfaction based MM learning approaches aiming at improving the overall system performance and reducing the HOF and PP probabilities.

To illustrate the differences between the classical MM and our proposed approach, we depict in Fig. 1 a) and Fig. 1 b) the basic idea of the classical MM and proposed MM approaches, respectively. In the classical MM approach, there is no information exchange among tiers in case of UE handover, and traffic offloading might be achieved by picocell range expansion. In the proposed MM approaches, instead, each cell individually optimizes its own MM strategy based on limited coordination among tiers. The major difference between MAB and satisfaction based learning is that MAB aims at maximizing the overall capacity, while satisfaction based learning aims at satisfying the network in terms of capacity. In both cases, macro and pico BSs learn in the long term how to optimize their REB, which results in load balancing. In the short term, based on these optimized REB values, each cell carries out user scheduling by considering each UE's velocity and average rate, through a coordinated effort among the tiers. Our contributions are as follows:

• In the proposed MM approaches, we focus on both short-term and long-term solutions. In the long term, a traffic load balancing procedure in a HetNet scenario is proposed, while in the short term the UE association process is solved.
• To implement the long-term load balancing method, we propose two learning based MM approaches using reinforcement learning techniques: a MAB based and a satisfaction based MM approach.
• The short-term UE association process is based on a proposed context-aware scheduler considering a UE's throughput history and velocity to enable fair scheduling and enhanced cell association.

The rest of the paper is organized as follows. Section II describes the system model, the problem formulation for MM, and the context-aware scheduler. In Section III, we introduce the learning based MM approaches. Section IV presents system level simulation results, and finally, Section V concludes the paper.

II. SYSTEM MODEL
We focus on the downlink transmission of a two-layer HetNet, where layer 1 is modeled as macrocells and layer 2 as picocells. The HetNet consists of a set of BSs $\mathcal{K} = \{1, \ldots, K\}$ with a set $\mathcal{M} = \{1, \ldots, M\}$ of macrocells underlaid by a set $\mathcal{P} = \{1, \ldots, P\}$ of picocells, where $\mathcal{K} = \mathcal{M} \cup \mathcal{P}$. Macro BSs are dropped following a hexagonal layout including three sectors. Within each macro sector $m$, $p \in \mathcal{P}$ picocells are randomly positioned, and a set $\mathcal{U} = \{1, \ldots, U\}$ of UEs is randomly dropped within a circle around each picocell $p$ (hotspot). The UEs associated to macrocells are referred to as macro UEs $\mathcal{U}^{(m)} = \{1^{(m)}, \ldots, U^{(m)}\} \subseteq \mathcal{U}$, and the UEs served by picocells are referred to as pico UEs $\mathcal{U}^{(p)} = \{1^{(p)}, \ldots, U^{(p)}\} \subseteq \mathcal{U}$, where $\mathcal{U}^{(p)} \cup \mathcal{U}^{(m)} = \mathcal{U}$. Each UE $i^{(k)}$ with $k \in \{m, p\}$ has a randomly selected velocity $v_{i^{(k)}} \in V$ km/h and a random direction of movement within an angle of $[0, 2\pi]$. A co-channel deployment is considered, in which picocells and macrocells operate in a system with a bandwidth $B$ consisting of $r = \{1, \ldots, R\}$ resource blocks (RBs).

At every time instant $t_n = n T_s$ with $n = [1, \ldots, N]$ and $T_s = 1$ ms, each BS $k$ decides how to expand its coverage area by learning its REB $\beta_k = \{\beta_m, \beta_p\}$ with $\beta_m \in \{0, 3, 6\}$ dB and $\beta_p \in \{0, 3, 6, 9, 12, 15, 18\}$ dB. Both macro and pico BSs select their REB to decide which UE $i^{(k)}$ to schedule on which RB based on the UE's context parameters. These context parameters are defined as the UE's velocity $v_{i^{(k)}}$, its instantaneous rate $\phi_{i^{(k)}}(t_n)$ when associated to BS $k$, and its average rate $\bar{\phi}_{i^{(k)}}(t_n)$ defined as $\bar{\phi}_{i^{(k)}}(t_n) = \frac{1}{N} \sum_{n=1}^{N} \phi_{i^{(k)}}(t_n)$, whereby $T = N T_s$ is a time window. The instantaneous rate $\phi_{i^{(k)}}(t_n)$ is given by:

$$\phi_{i^{(k)}}(t_n) = B_{i^{(k)}} \cdot \log_2\left(1 + \gamma_{i^{(k)}}(t_n)\right), \qquad (1)$$

with $\gamma_{i^{(k)}}(t_n)$ being the SINR of UE $i^{(k)}$ at time $t_n$, which is defined as:

$$\gamma_{i^{(k)}}(t_n) = \frac{p_k \cdot g_{i^{(k)},k}(t_n)}{\sum_{j \in \mathcal{K}, j \neq k} p_j \cdot g_{i^{(k)},j}(t_n) + \sigma^2}, \qquad (2)$$

with $p_k$ being the transmit power of BS $k$, and $g_{i^{(k)},k}(t_n)$ being the channel gain from cell $k$ to UE $i^{(k)}$ associated to BS $k$. The bandwidth $B_{i^{(k)}}$ in equation (1) is the bandwidth allocated to UE $i^{(k)}$ by BS $k$ at time $t_n$. We consider lower REB values for macro BSs to avoid overloaded macrocells due to their large transmission power.
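To make the rate model concrete, the following minimal sketch computes the SINR in (2) and the instantaneous rate in (1) for a single UE. It is an illustration only, not part of the paper's simulator; the linear-scale powers, channel gains, and noise value are hypothetical placeholders.

```python
import numpy as np

def sinr(p_serving, g_serving, p_interf, g_interf, noise_power):
    """SINR of a UE per equation (2); all quantities in linear scale."""
    interference = np.sum(np.asarray(p_interf) * np.asarray(g_interf))
    return (p_serving * g_serving) / (interference + noise_power)

def instantaneous_rate(bandwidth_hz, gamma):
    """Shannon rate of a UE per equation (1), in bit/s."""
    return bandwidth_hz * np.log2(1.0 + gamma)

# Hypothetical example: one serving macro BS and two interfering picos.
gamma_ue = sinr(p_serving=40.0, g_serving=1e-9,
                p_interf=[1.0, 1.0], g_interf=[2e-10, 5e-11],
                noise_power=1e-13)
print(instantaneous_rate(180e3, gamma_ue))  # one 180 kHz RB
```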
A. Handover Procedure
According to the 3GPP standard, the handover mechanism is based on RSRP measurements, the filtering of measured RSRP samples, the handover hysteresis margin, and TTT mechanisms [3]. A handover is executed if the target cell's (biased) RSRP (plus hysteresis margin) is larger than the source cell's (biased) RSRP. In summary, the handover condition for a UE $i^{(k)}$ to BS $k$ is defined as:

$$P_l(i^{(l)}) + \beta_l < P_k(i^{(k)}) + \beta_k + m_{\mathrm{hist}}, \qquad (3)$$

with $\{l, k\} \in \mathcal{K}$, where $m_{\mathrm{hist}}$ is the UE- or cell-specific hysteresis margin, $\beta_k$ ($\beta_l$) is the REB of BS $k$ ($l$), and $P_k(i^{(k)})$ (or $P_l(i^{(l)})$) [dBm] is the $i^{(k)}$-th (or $i^{(l)}$-th) UE's RSRP from BS $k$ ($l$) after TTT.
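As a quick illustration of the trigger in (3), the sketch below checks the biased-RSRP condition for a source/target pair after TTT filtering. The dB values used here are hypothetical, and the function is a simplified stand-in for the full 3GPP measurement procedure.

```python
def handover_triggered(rsrp_source_dbm: float, reb_source_db: float,
                       rsrp_target_dbm: float, reb_target_db: float,
                       hysteresis_db: float = 2.0) -> bool:
    """Handover condition (3): biased target RSRP plus hysteresis
    must exceed biased source RSRP (all values after TTT filtering)."""
    return (rsrp_source_dbm + reb_source_db
            < rsrp_target_dbm + reb_target_db + hysteresis_db)

# Hypothetical: a pico target with 9 dB REB attracts an
# expanded-region UE even though its unbiased RSRP is weaker.
print(handover_triggered(rsrp_source_dbm=-90.0, reb_source_db=0.0,
                         rsrp_target_dbm=-98.0, reb_target_db=9.0))  # True
```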
B. Problem Formulation

Our optimization approach aims at maximizing the total rate of the network. Hereby, we consider long-term and short-term processes. The long-term load balancing optimization is solved by the proposed learning based MM approaches presented in Section III-A and Section III-B, which result in the optimization of the REB values $\beta_k$ and in load balancing $\phi_{k,\mathrm{tot}}(t_n)$. Based on the estimated instantaneous load, the context-aware scheduler selects, in the short term, a UE for each RB by considering its history and velocity, as described in Section II-C. This results in each UE's instantaneous rate $\phi_{i^{(k)}}(t_n)$ and the RB allocation vector $\boldsymbol{\alpha}_{i^{(k)}}(t_n) = [\alpha_{i^{(k)},1}, \ldots, \alpha_{i^{(k)},R}]$ containing binary variables $\alpha_{i^{(k)},r}$ indicating whether UE $i^{(k)}$ of BS $k$ is allocated RB $r$ or not. At each time instant $t_n$, each BS $k$ performs the following optimization:

$$\max_{\boldsymbol{\alpha}_{i^{(k)}}(t_n),\, \beta_k} \; \sum_{n=1}^{N} \sum_{i^{(k)} \in \mathcal{U}_k} \sum_{r=1}^{R} \alpha_{i^{(k)},r}(t_n) \cdot \phi_{i^{(k)},r}(t_n) \qquad (4)$$

subject to:

$$\alpha_{i^{(k)},r}(t_n) \in \{0, 1\} \qquad (5)$$
$$\sum_{i^{(k)} \in \mathcal{U}_k} \alpha_{i^{(k)},r} = 1 \quad \forall r, \forall k, \qquad (6)$$
$$p_k \leq p_k^{\max} \qquad (7)$$
$$\bar{\phi}_{i^{(k)}}(t_n) \geq \phi_{k,\min}, \qquad (8)$$

where $\phi_{i^{(k)},r}(t_n)$ is the instantaneous rate of UE $i^{(k)}$ at RB $r$. The condition in (7) implies that the total transmitted power over all RBs does not exceed the maximum transmission power $p_k^{\max}$ of BS $k$.
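Since constraint (6) assigns each RB to exactly one UE per BS, the per-RB part of (4) decomposes into independent argmax problems. The sketch below shows this decomposition in a simplified form; the rate matrix is a hypothetical input, constraints (7)–(8) are assumed to be enforced elsewhere (by power control and by the learned REB, respectively), and Section II-C replaces the plain rate metric used here with the context-aware criterion (9).

```python
import numpy as np

def allocate_rbs(rates):
    """Per-RB allocation: rates[i, r] is the instantaneous rate of UE i
    on RB r. Returns a binary matrix alpha satisfying (5) and (6)."""
    num_ues, num_rbs = rates.shape
    alpha = np.zeros((num_ues, num_rbs), dtype=int)
    best_ue = np.argmax(rates, axis=0)   # one winner per RB, as in (6)
    alpha[best_ue, np.arange(num_rbs)] = 1
    return alpha

# Hypothetical 3-UE, 4-RB example (rates in Mbps).
rates = np.array([[1.2, 0.4, 2.0, 0.9],
                  [0.8, 1.1, 0.5, 1.5],
                  [0.3, 0.9, 1.8, 1.4]])
print(allocate_rbs(rates))
```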
C. Context-Aware Scheduler

The proposed MM approach does not only optimize the load according to Section II-B, but also considers context-aware and fairness based UE scheduling. At each RB $r$, a UE $i^{(k)}$ is selected to be served by BS $k$ on RB $r$ according to the following scheduling criterion:

$$i^{(k)*}_r = \operatorname{sort}_{\min(v_{i^{(k)}})}\left(\arg\max_{i^{(k)} \in \mathcal{U}_k} \frac{\phi_{i^{(k)},r}(t_n)}{\bar{\phi}_i(t_n)}\right), \qquad (9)$$

where $\operatorname{sort}_{\min(v_{i^{(k)}})}$ sorts the candidate UEs according to their velocity starting with the slowest UE, i.e., if more than one UE can be selected for RB $r$, the UE with minimum velocity is selected. The rationale behind introducing this sorting/ranking function is that high-velocity UEs will not be favored over slow moving UEs.

A scheduler according to (9) would allocate many (or even all) resources to a newly handed-over UE, since its average rate $\bar{\phi}_i(t_n)$ in the target cell is zero; i.e., in the classical proportional fair scheduler, $\bar{\phi}_i(t_n) = \bar{\phi}_{i^{(k)}}(t_n) = 0$ when a UE is handed over to cell $k$, whereas we redefine it according to (10). To avoid this and enable a fair resource allocation among all UEs in a cell, we propose a history based scheduling approach. We define the average rate $\bar{\phi}_i(t_n)$ according to (10), incorporating the following idea: via the X2 interface, macro- and picocells coordinate, so that once a macro UE $i^{(m)}$ is handed over to picocell $p$, its rate history at time instant $t_n$ is provided to picocell $p$ in terms of the average rate $\bar{\phi}_{i^{(m)}}(t_n)$, such that the UE's (named $i^{(p)}$ after the handover) average rate at picocell $p$ becomes:

$$\bar{\phi}_{i^{(p)}}(t_n + T_s) = \frac{T \cdot \bar{\phi}_{i^{(m)}}(t_n) + \phi_{i^{(p)}}(t_n + T_s)}{T + T_s}. \qquad (10)$$

In (10), a moving average rate is carried over from the macrocell to the picocell, whereas in the classical MM approaches a UE's history is not considered and is equal to zero. In other words, the proposed MM approach considers the historical rate when UE $i^{(m)}$ was associated to macrocell $m$ in the past.
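The following sketch combines the velocity tie-breaking of (9) with the history hand-off of (10). It is a simplified single-RB illustration under stated assumptions: ties in the proportional fair metric are resolved by minimum velocity, and the window length $T$ and the example numbers are hypothetical.

```python
import numpy as np

def select_ue(inst_rates, avg_rates, velocities, eps=1e-9):
    """Criterion (9): among UEs maximizing the PF metric phi/avg_phi,
    pick the one with minimum velocity."""
    metric = np.asarray(inst_rates) / (np.asarray(avg_rates) + eps)
    candidates = np.flatnonzero(np.isclose(metric, metric.max()))
    return candidates[np.argmin(np.asarray(velocities)[candidates])]

def handover_avg_rate(avg_rate_macro, inst_rate_pico, T, Ts=1e-3):
    """Equation (10), as given in the text: seed the pico-side average
    with the macro-side history exchanged over X2, instead of zero."""
    return (T * avg_rate_macro + inst_rate_pico) / (T + Ts)

# Hypothetical: UE 2 just arrived from the macrocell; its pico-side
# average is seeded from its macro history rather than starting at zero.
avg = np.array([2.0e6, 1.5e6, handover_avg_rate(1.8e6, 2.2e6, T=0.2)])
print(select_ue(inst_rates=[2.1e6, 2.0e6, 2.2e6],
                avg_rates=avg, velocities=[30.0, 3.0, 120.0]))
```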
III. LEARNING BASED MOBILITY MANAGEMENT ALGORITHM

To solve the optimization problem defined in Section II-B, we rely on the self-organizing capabilities of HetNets and propose an autonomous solution for load balancing using tools from reinforcement learning [8]. Hereby, each cell develops its own MM strategy to perform optimal load balancing based on the proposed learning based approaches presented in Section III-A and Section III-B. To realize this, we consider the game $G = \{\mathcal{K}, \{\mathcal{A}_k\}_{k \in \mathcal{K}}, \{u_k\}_{k \in \mathcal{K}}\}$. Hereby, the set $\mathcal{K} = \{\mathcal{M} \cup \mathcal{P}\}$ represents the set of players (i.e., BSs), and for all $k \in \mathcal{K}$, the set $\mathcal{A}_k = \{\beta_k\}$ represents the set of actions player $k$ can adopt. For all $k \in \mathcal{K}$, the function $u_k(t_n)$ is the utility function of player $k$. The players learn at each time instant $t_n$ to optimize the load in the long term and to perform context-aware scheduling in the short term based on the algorithms presented in Sections III-A and III-B, via the following steps:

1) Action $a_k \in \mathcal{A}_k$ is selected based on the obtained utility $u_k(t_n) = \phi_{k,\mathrm{tot}}(t_n)$, with $\phi_{k,\mathrm{tot}}(t_n)$ being the total rate of player $k$ at time $t_n$ as defined in equation (11).
2) The action selection strategy is updated based on the selected learning algorithm presented in Section III-A and Section III-B.
3) A UE of BS $k$ is allocated RB $r$ based on its velocity, its instantaneous rate, and its average rate according to (9).
A. Multi-Armed Bandit Based Learning Approach

The objective of the MAB approach is to maximize the overall system performance. MAB is a machine learning technique based on an analogy with the traditional slot machine (one-armed bandit) [9]. When pulled at time $t_n$, each machine/player provides a reward. The objective is to maximize the collected reward through iterative pulls, i.e., learning iterations. The player selects its actions based on a decision function reflecting the well-known exploration-exploitation trade-off in learning algorithms.

The set of players, actions, and the utility function for our MAB based MM approach are defined as follows:

• Players: Macro BSs $\mathcal{M} = \{1, \ldots, M\}$ and pico BSs $\mathcal{P} = \{1, \ldots, P\}$.
• Actions: $\mathcal{A}_k = \{\beta_k\}$ with $\beta_m \in \{0, 3, 6\}$ dB and $\beta_p \in \{0, 3, 6, 9, 12, 15, 18\}$ dB being the CRE bias. We consider higher bias values for picocells due to their low transmit power. The considered bias values rely partially on the assumptions in [10] and on extensive simulation results.
• Strategy:
1) Every BS learns its optimum CRE bias value on a long-term basis considering its load:

$$\phi_{k,\mathrm{tot}}(t_n) = \sum_{i^{(k)} \in \mathcal{U}_k} \sum_{r=1}^{R} \alpha_{i^{(k)},r}(t_n) \cdot \phi_{i^{(k)},r}(t_n). \qquad (11)$$

This is inter-related with the handover triggering by defining the cell border of each cell.
2) A UE is handed over to BS $k$ if it fulfills condition (3).
3) RB based scheduling is performed based on equation (9).

• Utility Function:
The utility function in MAB learning is a decision function composed of an exploitation term, represented by player $k$'s total rate, and an exploration part considering the number of times an action has been selected so far. Player $k$ selects its action $a_{j^{(k)}}(t_n) \in \mathcal{A}_k$ at time $t_n$ by maximizing a decision function $d_{k,a_{j^{(k)}}}(t_n)$, which is defined as:

$$d_{k,a_{j^{(k)}}}(t_n) = \bar{u}_{k,a_{j^{(k)}}}(t_n) + \sqrt{\frac{2 \ln\left(\sum_{i=1}^{|\mathcal{A}_k|} n_{k,a_{i^{(k)}}}(t_n)\right)}{n_{k,a_{j^{(k)}}}(t_n)}}, \qquad (12)$$

whereby $\bar{u}_{k,a_{j^{(k)}}}(t_n)$ is the mean reward of player $k$ at time $t_n$ for action $a_{j^{(k)}}$, $n_{k,a_{j^{(k)}}}(t_n)$ is the number of times action $a_{j^{(k)}}$ has been selected by player $k$ until time $t_n$, and $|\cdot|$ represents the cardinality.

During the first $t_n = |\mathcal{A}_k| \cdot T_s$, player $k$ selects each action once in a random order to initialize the learning process by receiving a reward for each action. For the following iterations $t_n > |\mathcal{A}_k| \cdot T_s$, action selection is performed according to Algorithm 1. In each learning iteration, the action $a^*_{j^{(k)}}$ that maximizes the decision function in (12) is selected. Then the parameters are updated, whereby the following notation is used: $s_{k,a_{j^{(k)}}}(t_n)$ is the cumulated reward of player $k$ after
playing action $a_{j^{(k)}}$, and $\mathbb{1}_{\{i=j\}}$ equals 1 if $i = j$ and zero otherwise.

Algorithm 1: MAB based mobility management algorithm.
for $t_n$ do
  for $i = 1 : |\mathcal{A}_k|$ do
    Select action $a^*_{j^{(k)}} = \arg\max_{a_{j^{(k)}} \in \mathcal{A}_k} \left(d_{k,a_{j^{(k)}}}(t_n)\right)$
    Update the cumulated reward when player $k$ selects action $a_{j^{(k)}}$:
      $s_{k,a_{j^{(k)}}}(t_n + T_s) = s_{k,a_{j^{(k)}}}(t_n) + \mathbb{1}_{\{i=j\}} \cdot \phi_{k,\mathrm{tot}}(t_n)$
      $n_{k,a_{j^{(k)}}}(t_n + T_s) = n_{k,a_{j^{(k)}}}(t_n) + \mathbb{1}_{\{i=j\}}$
      $\bar{u}_{k,a_{j^{(k)}}}(t_n + T_s) = s_{k,a_{j^{(k)}}}(t_n + T_s) / n_{k,a_{j^{(k)}}}(t_n + T_s)$
  end for
  $t_n = t_n + T_s$
end for
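A compact sketch of Algorithm 1 in the spirit of UCB1 [9] is given below. It is a simplified, single-player illustration: the reward function standing in for the load $\phi_{k,\mathrm{tot}}$ in (11) is a hypothetical placeholder, and the time index is abstracted to iteration counts.

```python
import math
import random

class MabRebLearner:
    """UCB-style REB selection following (12) and Algorithm 1."""
    def __init__(self, actions_db):
        self.actions = actions_db            # e.g. pico REB values in dB
        self.counts = [0] * len(actions_db)  # n_{k,a_j}
        self.sums = [0.0] * len(actions_db)  # s_{k,a_j}, cumulated reward

    def select(self):
        for j, c in enumerate(self.counts):
            if c == 0:                       # play each arm once first
                return j
        total = sum(self.counts)
        return max(range(len(self.actions)),
                   key=lambda j: self.sums[j] / self.counts[j]
                   + math.sqrt(2 * math.log(total) / self.counts[j]))

    def update(self, j, reward):
        self.counts[j] += 1
        self.sums[j] += reward

# Hypothetical load response: the reward peaks at a mid-range REB.
learner = MabRebLearner([0, 3, 6, 9, 12, 15, 18])
for _ in range(500):
    j = learner.select()
    reward = 1.0 - abs(learner.actions[j] - 9) / 18 + random.gauss(0, 0.1)
    learner.update(j, reward)
print("learned REB:", learner.actions[learner.select()], "dB")
```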
B. Satisfaction Based Learning Approach

Satisfaction based learning approaches guarantee to satisfy the players in a system [11]. Here, we consider a player to be satisfied if its cell reaches a certain minimum level of total rate and if at least 90% of the UEs in the cell obtain a certain average rate. The rationale behind these satisfaction conditions is to guarantee each single UE's minimum rate while at the same time improving the total rate of the cell.

To enable a fair comparison, the set of players and the corresponding set of actions in the proposed satisfaction based MM approach are the same as in the MAB based MM approach. The utility function of player $k$ at time $t_n$ is defined as the load according to equation (11). In the satisfaction based learning approach, the actions are selected according to a probability distribution $\pi_k(t_n) = [\pi_{k,1}(t_n), \ldots, \pi_{k,|\mathcal{A}_k|}(t_n)]$. Hereby, $\pi_{k,j}(t_n)$ is the probability with which BS $k$ chooses its action $a_{j^{(k)}}(t_n)$ at time $t_n$. The following learning steps are performed in each learning iteration:

1) In the first learning iteration $t_n = 1$, the probability of each action is equal and an action is selected randomly.
2) In the following learning iterations $t_n > 1$, the player changes its action selection strategy only if the received utility does not satisfy the cell, i.e., if the satisfaction condition is not fulfilled.
3) If the satisfaction condition is not fulfilled, player $k$ selects its action $a_{j^{(k)}}(t_n)$ according to the probability distribution $\pi_k(t_n)$.
4) Each player $k$ receives a reward $\phi_{k,\mathrm{tot}}(t_n)$ based on the selected actions.
5) The probability $\pi_{k,j}(t_n)$ of action $a_{j^{(k)}}(t_n)$ is updated according to the linear reward-inaction scheme:

$$\pi_{k,j}(t_n) = \pi_{k,j}(t_n - T_s) + \lambda \cdot b_k(t_n) \cdot \left(\mathbb{1}_{\{a_{j^{(k)}}(t_n) = a_{i^{(k)}}(t_n)\}} - \pi_{k,j}(t_n - T_s)\right), \qquad (13)$$

whereby $\mathbb{1}_{\{a_{j^{(k)}}(t_n) = a_{i^{(k)}}(t_n)\}} = 1$ for the selected action and zero for the non-selected actions, and $b_k(t_n)$ is defined as follows:

$$b_k(t_n) = \frac{u_{k,\max} + \phi_{k,\mathrm{tot}}(t_n) - u_{k,\min}}{2 \cdot u_{k,\max}}, \qquad (14)$$

with $u_{k,\max}$ being the maximum rate in the single-UE case and $u_{k,\min} = \frac{1}{2} \cdot u_{k,\max}$. Hereby, $\lambda = \frac{1}{t_n + T_s}$ is the learning rate.
IV. SIMULATION RESULTS

The scenario used in the system-level simulations consists of $P$ pico BSs per macro sector, randomly distributed within the macrocellular environment. In each macro sector, $U = 30$ mobile UEs are randomly dropped within a 60 m radius of each pico BS. The rationale behind dropping all UEs around pico BSs is to obtain a large number of handovers within a short time, in order to avoid large computation times due to the complexity of our system-level simulations. Each UE $i^{(k)}$ has a randomly selected velocity $v_{i^{(k)}} \in V = \{3, 30, 60, 120\}$ km/h and a random direction of movement within $[0, 2\pi]$, so that both macro-to-pico and pico-to-macro handovers may occur. We consider fast-fading and shadowing effects in our simulations, which are based on 3GPP assumptions [12]. To compare our results with other approaches, we consider a baseline MM approach as defined in [3]. The UE performs RSRP measurements over one subframe every 40 ms and reports this value. The Layer 1 filtering averages the reported RSRP values every 200 ms to filter out fast-fading effects. This value is further averaged through a first-order infinite impulse response (IIR) filter, which is known as a Layer 3 filter. A handover is then triggered if the Layer 3 filtered handover measurement meets the handover event entry condition in (3). A UE is handed over to its target cell after the TTT. For the baseline MM approach, we consider proportional fair based scheduling, with no information exchange between macro and pico BSs. This baseline approach is referred to as the classical HO approach.
[Fig. 2: CDF of the UE throughput for 30 UEs and 1 pico BS per macrocell and TTT = 480 ms.]

[Fig. 3: Sum-rate vs. number of pico BSs per macrocell and TTT = 480 ms.]

Fig. 2 depicts the cumulative distribution function (CDF) of the UE throughput for the classical, MAB, and satisfaction based MM approaches. Compared to the classical approach, the MAB and satisfaction based approaches lead to improvements of 43% and 75% on average (50th percentile), respectively. Hence, the satisfaction based approach outperforms the other MM approaches in terms of average UE throughput. In case of the cell-center UE throughput, which is defined as the 95th percentile throughput, the opposite behavior is obtained: an improvement of 124% and 80% is achieved for the MAB and satisfaction based approaches, respectively. The reason is that the satisfaction based MM approach only aims at satisfying the network in terms of rate and does not update its learning strategy once satisfaction is achieved. The MAB based approach, on the other hand, aims at maximizing the network performance, which is reflected in the improved cell-center UE throughput. The gains of the proposed MM approaches are also reflected in the cell-edge UE throughput, shown in the zoomed-in part of Fig. 2. Here, the MAB and satisfaction based approaches yield 39% and 80% improvement compared to the classical approach.

To compare the performance of the proposed approaches for different numbers of picocells per macrocell, Fig. 3 plots the sum-rate versus the number of pico BSs per macrocell. For different numbers of pico BSs, the proposed MM approaches yield gains of around 70%-80% for TTT = 480 ms. In Fig. 4, the sum-rate versus UE density per macrocell is depicted for TTT = 40 ms and TTT = 480 ms. In both cases, the classical approach yields very low rates, while the proposed approaches lead to significant improvements of up to 81% for TTT = 40 ms and 85% for TTT = 480 ms, and converge to a significantly larger sum-rate than the classical approach.

Besides the gains in terms of rate, our proposed learning based approaches also yield improvements in terms of HOF probability, as depicted in Fig. 5. For the HOF performance evaluation, we modify our simulation settings by setting the same velocity for each UE. Compared to the classical MM approach, the proposed methods yield the same HOF
probability for UEs at 3 km/h speed. For higher velocities, where more HOFs are expected, the HOF probability obtained by the proposed approaches is significantly lower than in case of the classical MM.

The PP probability is depicted in Fig. 6. For TTT = 40 ms, all MM methods yield very similar PP probabilities for lower velocities, while this probability decreases for higher velocities. This slope is aligned with the results presented in [3]. However, for high-velocity UEs, the PP probability of the proposed MM approaches is half of the PP probability obtained for the classical MM approach, which is a significant improvement. The rationale behind this is that both tiers perform CRE for load balancing, i.e., if one cell tries to extend its coverage and hand over a UE, the other cell may prevent this handover by extending its coverage, too. In case of TTT = 480 ms, almost no PPs are observed.

[Fig. 4: Sum-rate vs. number of UEs per macrocell with 1 pico BS and TTT = 40 ms and TTT = 480 ms.]

[Fig. 5: HOF and ping pong probability for 30 UEs and 1 pico BS per macrocell and TTT = 40 ms and TTT = 480 ms.]

V. CONCLUSION
We propose two learning based MM approaches and a history based context-aware scheduling method for HetNets. The first learning approach is based on MAB methods and aims at system performance maximization. The second learning method aims at satisfying each cell and each UE of a cell based on satisfaction based learning. System-level simulations demonstrate the performance enhancement of the proposed approaches compared to the classical MM method. While up to 80% average gains are achieved for UE throughput, the HOF probability is reduced by up to a factor of three by the proposed learning based MM approaches.

[Fig. 6: PP probability for 30 UEs and 1 pico BS per macrocell and TTT = 40 ms and TTT = 480 ms.]

REFERENCES

[1] "Cisco Visual Networking Index: Global Mobile Data Traffic Forecast Update, 2013-2018," Cisco Public Information, Feb. 2014.
[2] A. Damnjanovic et al., "A Survey on 3GPP Heterogeneous Networks," IEEE Wireless Communications, vol. 18, no. 3, June 2011.
[3] 3GPP TR 36.839, "Evolved Universal Terrestrial Radio Access (E-UTRA); Mobility Enhancements in Heterogeneous Networks," V11.1.0, 2012.
[4] S. Barbera, P. H. Michaelsen, M. Säily, and K. Pedersen, "Mobility Performance of LTE Co-Channel Deployment of Macro and Pico Cells," in Proc. IEEE Wireless Communications and Networking Conference (WCNC), France, Apr. 2012.
[5] D. Lopez-Perez, I. Guvenc, and X. Chu, "Mobility Management Challenges in 3GPP Heterogeneous Networks," IEEE Communications Magazine, vol. 50, no. 12, Dec. 2012.
[6] A. Feki, V. Capdevielle, L. Roullet, and A. G. Sanches, "Handover aware interference management in LTE small cells networks," in Proc. IEEE 11th International Symposium on Modeling & Optimization in Mobile, Ad Hoc & Wireless Networks (WiOpt), May 2013.
[7] J. Wang, J. Liu, D. Wang, J. Pang, and G. Shen, "Optimized Fairness Cell Selection for 3GPP LTE-A Macro-Pico HetNets," in Proc. IEEE Vehicular Technology Conference (VTC), CA, Sept. 2011.
[8] M. E. Harmon and S. S. Harmon, "Reinforcement Learning: A Tutorial," 2000. Available: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.33.2480&rep=rep1&type=pdf
[9] P. Auer, N. Cesa-Bianchi, and P. Fischer, "Finite-time Analysis of the Multiarmed Bandit Problem," Machine Learning, vol. 47, pp. 235-256, 2002.
[10] 3GPP R1-113806, "Performance Study on ABS with Reduced Macro Power," 3GPP TSG-RAN Meeting, 2011.
[12] 3GPP TR 36.814, "Evolved Universal Terrestrial Radio Access (E-UTRA); Further advancements for E-UTRA Physical Layer Aspects," V9.0.0, 2010.
[13] M. Simsek, T. Akbudak, B. Zhao, and A. Czylwik, "An LTE-Femtocell Dynamic System Level Simulator," in Proc. International ITG Workshop on Smart Antennas (WSA), 2010.