Joint Channel Probing and Proportional Fair Scheduling in Wireless Networks
Hui Zhou, Pingyi Fan, Dongning Guo
Abstract
The design of a scheduling scheme is crucial for the efficiency and user fairness of wireless networks. Assuming that the quality of all user channels is available to a central controller, a simple scheme that maximizes the utility function defined as the sum of the logarithms of all users’ throughputs has been shown to guarantee proportional fairness. However, acquiring the channel quality information may consume a substantial amount of resources. In this work, it is assumed that probing the quality of each user’s channel takes a fraction of the coherence time, so that the amount of time left for data transmission is reduced. Consequently, the multiuser diversity gain does not always increase as the number of users increases. In case the statistics of the channel quality are available to the controller, the problem of sequential channel probing for user scheduling is formulated as an optimal stopping time problem, and a joint channel probing and proportional fair scheduling scheme is developed. This scheme is extended to the case where the channel statistics are not available to the controller, in which case a joint learning, probing and scheduling scheme is designed by studying a generalized bandit problem. Numerical results demonstrate that the proposed scheduling schemes can provide significant gains over existing schemes.
I. INTRODUCTION
Efficient and fair scheduling is important for wireless systems with limited resources and heterogeneous user conditions. A large class of resource allocation schemes with fairness considerations is obtained by maximizing some utility function of the throughput [1]. In particular, proportional fairness is achieved when the utility is the sum of the logarithms of the users’ throughputs. In existing third-generation wireless systems, such as EV-DO and HSDPA, the proportional fair (PF) scheduling scheme is employed at the base station to schedule downlink traffic to mobile users. The PF scheme strikes a good balance between throughput efficiency and fairness by exploiting multiuser diversity [2] and the game-theoretic equilibrium [3]. Owing to its favorable performance and low implementation complexity, PF scheduling has been analyzed extensively from various aspects; for example, there have been studies of its convergence and optimality [4], stability [5], throughput [6] and capacity region [7].

Most previous work on PF scheduling assumes that the instantaneous channel quality information (CQI) of all users is known to the scheduler at no cost. In practice, however, acquiring the CQI often consumes a significant amount of resources in terms of time, bandwidth and power. Understanding the impact of this cost is important when the number of users is large, because the cost may scale linearly with the user population. The goal of this work is to answer the following two questions: 1) to what extent does CQI acquisition affect the scheduling? and 2) how should the users be probed and scheduled to achieve the best performance with proportional fairness?

There have been related works on the impact of channel uncertainty on communication systems. The loss of throughput caused by poor estimates of channel quality is quantified in [8]. Joint channel probing and user scheduling has also been addressed recently: several schemes with the objective of maximizing the system throughput have been designed in [9]–[12], and the authors of [13]–[15] propose schemes for stabilizing the queues and characterize the network throughput region. In contrast to the preceding works, the goal of this paper is to design a proportional fair scheduling scheme that takes into account the cost of channel probing.

H. Zhou and P. Fan are with the Department of Electronic Engineering, Tsinghua University, Beijing, 100084, China (e-mail: [email protected]; [email protected]). D. Guo is with the Department of Electrical Engineering and Computer Science, Northwestern University, Evanston, IL 60208, U.S.A. (e-mail: [email protected]).

September 24, 2018 DRAFT
Our previous work [16] presented the scheme and its performance only briefly. In this paper, we not only derive the scheme with rigorous arguments, but also establish its asymptotic behavior and its optimality. In addition, the scheme is extended to a more general scenario. The organization and main contributions of this work are as follows:
• Section II describes the network model.
• In Section III, we assume that the prior distribution of the CQI is known to the scheduler, and formulate the problem of sequentially probing user channels to make a scheduling decision as a stopping time problem. A simple scheme based on maximizing the sum of the logarithms of all users’ throughputs is shown to guarantee proportional fairness and convergence. The scheduling gain of the scheme is determined analytically, and a further reduction of the computational complexity is discussed.
• In Section IV, the statistics of the CQI are assumed to be unavailable to the scheduler. The problem is formulated as a generalized bandit problem, and a joint learning, probing and scheduling scheme is proposed.
• In Section V, significant advantages of the proposed schemes are demonstrated through numerical experiments. In typical scenarios where the statistics of the CQI are not available, the joint learning, probing and scheduling scheme achieves almost the same performance as in the case where the statistics are known.

II. THE NETWORK MODEL
Consider a wireless system with one controller and $K$ users with time-varying channel quality, such as the downlink of a cellular system. Time is divided into unit-length slots, and only one user can be served in each slot. As in most related work (e.g., [4] and [6]), the transmit power is assumed to be fixed, so dynamic power allocation is not considered and the achievable rate is determined only by the instantaneous channel quality. Moreover, we assume saturated traffic for all users.

We assume slow fading, where the duration of a slot is much shorter than the channel coherence time, so that the channel quality remains constant during each slot. We make the following homogeneous rate assumption: the rate of each user, normalized by its mean value, follows the same distribution.

(A1) Let $X_1, \dots, X_K$ be independent and identically distributed (i.i.d.) nonnegative random variables with unit mean. Let $r_1, \dots, r_K \ge 0$ be constants, and let $R_k = r_k X_k$ for $k = 1, \dots, K$.
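Assumption (A1) can be illustrated with a short sampling sketch. It assumes, purely for illustration, that each $X_k$ is exponential with unit mean (one admissible unit-mean law, and the one used in Section V) and relies on NumPy's `Generator.exponential`; the function `sample_rates` and the mean rates below are hypothetical. Each user's rate is a scaled copy of the same unit-mean variable, so the normalized rates $R_k/r_k$ share one distribution.

```python
import numpy as np

def sample_rates(r, n_slots, rng):
    """Draw R_k(n) = r_k * X_k(n) for all users k and slots n, as in (A1).

    Illustrative assumption: X ~ Exp(1), one unit-mean nonnegative law;
    (A1) allows any such distribution. Returns an array of shape (n_slots, K).
    """
    r = np.asarray(r, dtype=float)
    X = rng.exponential(scale=1.0, size=(n_slots, r.size))  # i.i.d. X_k(n)
    return X * r                                            # scale column k by r_k

rng = np.random.default_rng(0)
r = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # hypothetical mean rates r_k
R = sample_rates(r, 200_000, rng)
print(R.mean(axis=0))          # close to r_k for every user
print((R / r).mean(axis=0))    # normalized rates R_k / r_k: common unit-mean law
```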
The achievable rates $\{R_k(n) : k = 1, \dots, K;\ n = 1, 2, \dots\}$ are independent. For every user $k$, the rates over the time slots, $R_k(1), R_k(2), \dots$, are i.i.d. with the same distribution as $R_k$. Clearly, $\mathbb{E}[R_k(n)] = r_k$.

The instantaneous achievable rates of all users are not known a priori. During each slot $n$, obtaining the achievable rate $R_k(n)$ requires the scheduler to probe the channel of user $k$, which uses a fraction $\beta$ of the slot. Let $I_k(n)$ be the indicator of the event that user $k$ is scheduled for transmission in slot $n$, and let $J(n)$ denote the number of users probed in slot $n$. The amount of data transmitted to or by user $k$ during slot $n$ is $B_k(n) = (1 - J(n)\beta) R_k(n) I_k(n)$, which is nonzero for at most one user in each slot. The throughput of user $k$ averaged over $n$ slots is thus
$$T_k(n) = \frac{1}{n} \sum_{j=1}^{n} B_k(j). \tag{1}$$

III. JOINT PROBING AND SCHEDULING WITH KNOWN CHANNEL STATISTICS
In this section, we consider the case where the statistics of $\mathbf{R} = [R_1, \dots, R_K]$ are known to the scheduler, and design a proportional fair scheme.

A. The Algorithm
Consider first a scheme that maximizes the utility defined as the sum of the logarithms of the throughputs:
$$u(\mathbf{T}(n)) = \sum_{k=1}^{K} \ln T_k(n). \tag{2}$$
Note that by (1),
$$T_k(n) = \frac{n-1}{n} T_k(n-1) + \frac{1}{n} B_k(n). \tag{3}$$
Hence the increase of the utility function after the $n$-th slot is
$$\begin{aligned} u(\mathbf{T}(n)) - u(\mathbf{T}(n-1)) &= \sum_{k=1}^{K} \big(\ln T_k(n) - \ln T_k(n-1)\big) \\ &= \sum_{k=1}^{K} \ln\left(\frac{n-1}{n} + \frac{1}{n}\,\frac{B_k(n)}{T_k(n-1)}\right) \\ &= \sum_{k=1}^{K} \ln\left(\frac{n-1}{n} + \frac{1 - \beta J(n)}{n}\, s_k(n) I_k(n)\right), \end{aligned} \tag{4}$$
where the throughput-normalized rate is
$$s_k(n) = \frac{R_k(n)}{T_k(n-1)}. \tag{5}$$
Since the indicator $I_k(n)$ is zero for all but one user $k$ in each slot, greedily maximizing the utility increment in slot $n$ amounts to scheduling the user with the maximum $s_k(n)$, which is the classical PF scheduling algorithm.

However, because the instantaneous rates $R_k(n)$ are unknown a priori, the scheduler can only probe the users’ rates and obtain the $s_k(n)$ one by one in each slot. We therefore formulate the following optimal stopping problem [18]. Since the scheduling decision made in one slot has no impact on future realizations of the rates, it suffices to consider one arbitrary slot and omit the time index $n$. At the beginning of the slot, the joint probing and scheduling problem is defined by two objects:
(i) the independent throughput-normalized rates $s_1, \dots, s_K$;
(ii) a sequence of positive-valued reward functions $y_1, \dots, y_K$, where, if $j$ channels have been probed to reveal their throughput-normalized instantaneous rates $t_1, \dots, t_j$, the reward for terminating the probing phase and scheduling the best user found so far is
$$y_j(t_1, \dots, t_j) = (1 - j\beta) \max(t_1, \dots, t_j). \tag{6}$$
The theory of optimal stopping is concerned with determining the stopping time $J$ that maximizes the expected reward $\mathbb{E}[y_J]$. The maximum number of probes in each slot is $J_{\max} = \min(K, \lfloor 1/\beta \rfloor)$. Compared with the classical optimal stopping problem, the formulation above is more general in the sense that the probing order of the $s_k$ is not fixed in advance but must also be chosen.
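As a small numerical check of (4)–(6), the Python sketch below (hypothetical helper names, scalar inputs, an invented three-user slot) evaluates the utility increment for each choice of served user with the number of probes $J$ held fixed, and confirms that it is largest for the user with the largest $s_k(n)$. It also evaluates the stopping reward (6) and $J_{\max}$.

```python
import math

def utility_increment(T_prev, R, k_served, J, beta, n):
    """u(T(n)) - u(T(n-1)) from (4) when user k_served is served after J probes (n >= 2)."""
    inc = 0.0
    for k, (t, r) in enumerate(zip(T_prev, R)):
        s = r / t                                   # throughput-normalized rate (5)
        indicator = 1.0 if k == k_served else 0.0   # I_k(n)
        inc += math.log((n - 1) / n + (1 - beta * J) / n * s * indicator)
    return inc

def reward(t_probed, beta):
    """Stopping reward (6): (1 - j*beta) times the best of the j probed normalized rates."""
    return (1 - len(t_probed) * beta) * max(t_probed)

def j_max(K, beta):
    """Maximum number of probes per slot, J_max = min(K, floor(1/beta))."""
    return min(K, math.floor(1 / beta))

# Hypothetical slot: three users, J = 3 probes spent, slot index n = 10.
T_prev = [2.0, 1.0, 4.0]
R = [3.0, 1.2, 5.0]                        # s = [1.5, 1.2, 1.25]
gains = [utility_increment(T_prev, R, k, J=3, beta=0.1, n=10) for k in range(3)]
best = max(range(3), key=lambda k: gains[k])
print(best)                                # 0
print(reward([1.5, 1.2, 1.25], beta=0.1))
print(j_max(20, 0.1), j_max(5, 0.1))       # 10 5
```

Because the terms of unserved users contribute the same constant $\ln((n-1)/n)$ whichever user is served, only the served user's term varies, which is why the argmax follows $s_k(n)$ when $J$ is fixed.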
Hence the joint probing and scheduling scheme comprises two tasks in each slot: determining the order in which users are probed, and selecting one user as the destination at a proper (stopping) time. Recalling the objective of maximizing the expected $y_j$, the user with the largest $\mathbb{E}[s_k(n)]$ should be probed first, then the one with the second largest, and so on. From Assumption (A1), $\bar{s}_k(n) \triangleq \mathbb{E}[s_k(n)] = r_k / T_k(n-1)$. Hence the probing order is $\pi(n) = (k_1, \dots, k_K)$ such that $\bar{s}_{k_1}(n) \ge \dots \ge \bar{s}_{k_K}(n)$. With the probing order determined, the decision on when to stop can be addressed by investigating the structural properties of the problem.

Theorem 1:
Under the homogeneous rate assumption (A1), the joint probing and scheduling problem is a monotone stopping problem [18, Chapter 5]; that is, if $E_j$ denotes the event
$$E_j = \left\{ y_j(s_{k_1}, \dots, s_{k_j}) \ge \mathbb{E}\big[ y_{j+1}(s_{k_1}, \dots, s_{k_{j+1}}) \,\big|\, s_{k_1}, \dots, s_{k_j} \big] \right\}, \tag{7}$$
then $E_j \subseteq E_{j+1}$ for $1 \le j \le J_{\max} - 2$.

Proof:
See Appendix A.

Since the problem is monotone, the one-stage look-ahead rule is optimal by [18, Chapter 5, Theorem 1]. The one-stage look-ahead rule stops if the reward for stopping at the current stage is at least as large as the expected reward of continuing for one more stage and then stopping. Let $w_j$ denote the largest observed throughput-normalized rate after probing $j$ users, and let $a \vee b \triangleq \max(a, b)$. The optimal stopping time is then
$$J^* = \min\left\{ j \ge 1 : (1 - j\beta)\, w_j \ge (1 - (j+1)\beta)\, \mathbb{E}\left[ w_j \vee \frac{R_{k_{j+1}}}{T_{k_{j+1}}(n-1)} \,\middle|\, w_j \right] \right\}, \tag{8}$$
with probing terminated at $J_{\max}$ at the latest; this rule solves the stopping problem almost surely in each slot. The resulting optimal PF joint probing and scheduling (JPS-PF) scheme is given as Algorithm 1.

B. On the Optimality of Algorithm 1
To establish the optimality of Algorithm 1, we first show its convergence.
Algorithm 1: JPS-PF
Initialization: $T_k(0) \leftarrow$ … for $k = 1, \dots, K$;
for $n = 1, 2, \dots$ do
    $\bar{s}_k(n) \leftarrow r_k / T_k(n-1)$ for $k = 1, \dots, K$;
    Sort the throughput-normalized mean rates in descending order: $\bar{s}_{k_1}(n) \ge \dots \ge \bar{s}_{k_K}(n)$;
    $j \leftarrow 0$, $w \leftarrow 0$;
    do
        $j \leftarrow j + 1$;
        Probe user $k_j$, obtain the rate $R_{k_j}(n)$, and set $w \leftarrow w \vee R_{k_j}(n) / T_{k_j}(n-1)$;
    while $j < J_{\max}$ and $(1 - j\beta)\, w < (1 - (j+1)\beta)\, \mathbb{E}\left[ w \vee \frac{R_{k_{j+1}}}{T_{k_{j+1}}(n-1)} \right]$;
    Transmit to the probed user with the largest normalized rate $w$, and update $\mathbf{T}(n)$;
end

Theorem 2:
Assume (A1). Then, for any initial condition, the throughput sequence $\mathbf{T}(n)$ generated by Algorithm 1 converges almost surely to the limit point $\mathbf{T}^*$ of the ordinary differential equation $\dot{\mathbf{T}}(t) = h(\mathbf{T}(t))$, where $h(\mathbf{T}) = -\mathbf{T} + \mathbb{E}[\mathbf{B}(n) \mid \mathbf{T}(n-1) = \mathbf{T}]$. Moreover, all users’ steady-state throughputs are proportional to their mean rates with an identical ratio $\kappa$:
$$\frac{T_1^*}{r_1} = \frac{T_2^*}{r_2} = \cdots = \frac{T_K^*}{r_K} = \kappa. \tag{9}$$

Proof:
Let $\mathbf{M}(n) = \mathbf{B}(n) - \mathbb{E}[\mathbf{B}(n) \mid \mathbf{T}(n-1)]$. By (3), the update of the users’ throughputs can be written in the form of the stochastic approximation iteration [19, Eqn. 2.1.1]:
$$\mathbf{T}(n) = \mathbf{T}(n-1) + a(n)\big[h(\mathbf{T}(n-1)) + \mathbf{M}(n)\big],$$
where $a(n) = 1/n$. It is easy to verify that $h(\cdot)$ is Lipschitz, that the step sizes satisfy $\sum_n a(n) = \infty$ and $\sum_n a(n)^2 < \infty$, and that $\mathbf{T}(n)$ is bounded. Furthermore, $\mathbb{E}[\mathbf{M}(n) \mid \mathbf{M}(1), \dots, \mathbf{M}(n-1)] = 0$, so $\mathbf{M}(n)$ is a martingale difference sequence. Hence the throughput update under the proposed scheme satisfies assumptions (A1)–(A4) in [19, Section 2.1], and applying [19, Section 2.1, Theorem 2] directly yields the convergence.

Given the convergence of the throughput sequence, the remainder of the proof is by contradiction. Suppose (9) does not hold at steady state and, without loss of generality, $T_1^*/r_1 < T_2^*/r_2$. Consider the throughput path starting at a slot $n_0$ at which the system is in steady state. At this time, $\bar{s}_l = r_l / T_l^*$ ($l = 1, 2$) and $\bar{s}_1 > \bar{s}_2$, so user 1 is probed before user 2 in each slot. From Assumption (A1), $s_1$ and $s_2$ have the same distribution up to a scale factor, but $s_1$ has a larger mean. Thus user 1 is selected for transmission more often than user 2, which would further imply $T_1(n_0 + n_1)/r_1 > T_2(n_0 + n_1)/r_2$ after a sufficiently large number $n_1$ of slots, contradicting the steady-state assumption $T_1^*/r_1 < T_2^*/r_2$.

The constant proportionality factor $\kappa$ connects the steady-state throughput with the mean rate: once $\kappa$ is obtained, the throughput and the utility are straightforward to evaluate. Moreover, since $\kappa$ is common to all users, the proof of Theorem 2 yields the following corollary.

Corollary 1:
Under Algorithm 1, each user is selected as the destination with the same probability $1/K$.

Algorithm 1 is asymptotically optimal in the following sense:

Theorem 3:
Assume (A1). Then $\mathbf{T}^*$ maximizes the PF utility $u(\cdot)$ over the rate region generated by all joint probing and scheduling schemes.

Proof:
Let $\mathcal{S}$ denote the set of all feasible schemes $\Gamma$ under the constraint that only one user can be selected in each slot, and denote the scheme developed in this paper by $\Gamma^*$. We have shown in the derivation of Algorithm 1 that $\Gamma^*$ optimally solves the monotone stopping problem in each slot, that is, it maximizes $\sum_{k} B_k(n)/T_k(n-1)$ in slot $n$ almost surely. Owing to the constraint that only one user can be scheduled in each slot, the scheme $\Gamma^*$ satisfies
$$\Gamma^* \in \arg\max_{\Gamma \in \mathcal{S}} \sum_{k=1}^{K} \frac{B_k^{(\Gamma)}(n)}{T_k(n-1)}, \tag{10}$$
where $B_k^{(\Gamma)}(n)$ is the number of bits transmitted to user $k$ in slot $n$ under scheme $\Gamma$. Recalling the definition of the utility function in (2),
$$\sum_{k=1}^{K} \frac{B_k^{(\Gamma)}(n)}{T_k(n-1)} = \nabla u(\mathbf{T}(n-1)) \cdot \mathbf{B}^{(\Gamma)}(n), \tag{11}$$
which means that the scheme chooses the decision maximizing the scalar product of $\mathbf{B}^{(\Gamma)}(n)$ and the gradient $\nabla u(\mathbf{T}(n-1))$.

In the gradient scheduling algorithm developed by Stolyar [17], at time $n$ the controller chooses a decision $\Gamma(n) \in \arg\max_{\Gamma} \nabla u(\mathbf{T}(n-1)) \cdot \mathbf{B}^{(\Gamma)}(n)$. Let $\tilde{\mathbf{T}}$ denote the solution to the problem
$$\max\ u(\mathbf{T}) \quad \text{s.t.}\ \mathbf{T} \in \mathcal{V},$$
where $\mathcal{V}$ is the system rate region, i.e., the set of all feasible long-term service rate vectors. Then [17, Theorem 2] shows that the expected average service rates under the gradient scheduling algorithm converge in probability to $\tilde{\mathbf{T}}$.

By (10) and (11), the joint probing and scheduling algorithm in this paper belongs to the class of gradient scheduling algorithms. From the convergence of Algorithm 1, $\mathbf{T}^* = \tilde{\mathbf{T}}$, so the achieved throughput $\mathbf{T}^*$ maximizes the PF utility function asymptotically.

C. A Static Threshold Criterion
Note that in Algorithm 1, after each probe the scheduler needs to evaluate the expectation in (8), which depends on the channel realizations. The computational complexity can be reduced further by simply comparing the highest normalized rate against a sequence of deterministic thresholds, in lieu of computing (8). Consider the steady state, where the users’ throughputs are exactly $\mathbf{T}^*$. By Theorem 2,
$$\frac{R_{k_{j+1}}}{T_{k_{j+1}}(n-1)} = \frac{R_{k_{j+1}}}{T^*_{k_{j+1}}},$$
which is identically distributed as $X_1/\kappa$. For $1 \le j \le J_{\max} - 1$, the inequality on $w_j$ in (8) reduces to
$$(1 - j\beta)\, w_j \ge (1 - (j+1)\beta)\, \mathbb{E}\big[\max(w_j, \kappa^{-1} X_1) \,\big|\, w_j\big]. \tag{12}$$
It turns out that (12) reduces to comparing $\kappa w_j$ with a static threshold $v_j$, which can be determined as follows. Let $F_X(\cdot)$ denote the cumulative distribution function (CDF) of $X_k$. Then
$$\mathbb{E}\left[\max\left(w_j, \frac{X_1}{\kappa}\right) \,\middle|\, w_j\right] = w_j + \int_{\kappa w_j}^{\infty} \left(\frac{x}{\kappa} - w_j\right) dF_X(x). \tag{13}$$
Hence (12) can be rewritten as
$$(1 - j\beta)\, w_j \ge (1 - (j+1)\beta)\left[ w_j + \int_{\kappa w_j}^{\infty} \left(\frac{x}{\kappa} - w_j\right) dF_X(x) \right], \tag{14}$$
or, equivalently,
$$\kappa w_j \ge g_j(\kappa w_j), \tag{15}$$
where
$$g_j(v) = \left[\beta^{-1} - (j+1)\right] \int_{v}^{\infty} (x - v)\, dF_X(x). \tag{16}$$
It is not hard to check that (i) $g_j(v) > 0$ for $v \ge 0$; (ii) $g_j(v)$ is a strictly decreasing function of $v$; and (iii) $\lim_{v \to \infty} g_j(v) = 0$. Then inequality (15) is equivalent to $\kappa w_j \ge v_j$, where $v_j$ is the crossing point of the functions $f(v) = v$ and $g_j(v)$. Moreover, $g_j(v) > g_{j+1}(v)$, from which it is easy to verify that $v_{j+1} < v_j$. The solution to (15) is illustrated in Fig. 2.

The structure of (16) shows that the crossing point $v_j$ is determined only by $j$, $\beta$ and the CDF $F_X(\cdot)$ of the unit-mean random variable $X$; it is independent of the number of users $K$, the mean rates $r_k$ and the achieved throughput-to-mean-rate ratio $\kappa$. Hence, if the transmitter knows the distribution $F_X(\cdot)$, it can compute the $v_j$ in advance.

Inequality (12) can now be expressed as $w_j \ge v_j/\kappa$ for $1 \le j \le J_{\max} - 1$, which is equivalent to the inequality in (8) in the steady state. Thus the decision on whether to keep probing or to start transmitting reduces to a static threshold criterion. For completeness, let $v_{J_{\max}} = 0$ to ensure that probing always terminates in each slot. This yields the following static-threshold probing criterion, which can replace line 9 of Algorithm 1.

Criteria 1:
After probing $j$ users, if the current largest normalized rate satisfies $w_j \ge v_j/\kappa$, the transmitter transmits to the user with the largest normalized rate; otherwise it probes the $(j+1)$th user.

In practice, the scheduler can calculate the $v_j$ in advance, but $\kappa$ is unavailable at the beginning. One way to estimate $\kappa$ is to start joint probing and scheduling with the dynamic criterion in line 9 of Algorithm 1. After a period of time, the throughput approaches its steady-state value, from which the throughput-to-mean-rate ratio $\kappa$ is obtained, and the static threshold criterion can be used thereafter. Alternatively, $\kappa$ can be determined theoretically, as discussed in the next subsection.

D. The Scheduling Gain
In this subsection we analyze the performance of the proposed scheme theoretically. We define the scheduling gain as the ratio of the achieved throughput to that of round-robin scheduling without probing; it reflects how much of the multiuser diversity benefit can be exploited. The scheduling gain of the proposed joint probing and scheduling scheme is $T_k^* / (K^{-1} r_k) = \kappa K$. For a random variable $X$, denote the truncation of $X$ to $[a, b]$ by $[X]_a^b$, so that $\mathbb{E}[X \mid a \le X \le b] = \mathbb{E}[X]_a^b$.

Theorem 4:
Under the homogeneous rate assumption (A1), the scheduling gain of Algorithm 1 is
$$\kappa K = \sum_{j=1}^{J_{\max}} \left[ \big(F_X(v_{j-1})\big)^{j-1} - \big(F_X(v_j)\big)^{j} \right] (1 - j\beta)\, \mathbb{E}\left\{ \left[ \max\left( [X_1]_0^{v_{j-1}}, \dots, [X_{j-1}]_0^{v_{j-1}}, X_j \right) \right]_{v_j}^{\infty} \right\},$$
where $v_j$ is the solution of $v = g_j(v)$.

Recall that $J^*$ is the optimal stopping time, that is, the number of users probed before a user is scheduled. We prove Theorem 4 using the following supporting lemma.

Lemma 1:
Using Algorithm 1, the steady-state probability of the event that $j$ users are probed before transmission is given by
$$p_j = \big(F_X(v_{j-1})\big)^{j-1} - \big(F_X(v_j)\big)^{j}, \quad 1 \le j \le J_{\max}. \tag{17}$$
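Lemma 1 and Theorem 4 can be checked numerically under the illustrative assumption that $X$ is exponential with unit mean, for which $\int_v^\infty (x - v)\,dF_X(x) = e^{-v}$ and hence $g_j(v) = (\beta^{-1} - (j+1))\, e^{-v}$. The sketch below (hypothetical function names, NumPy only) computes the thresholds $v_j$ by bisection, evaluates $p_j$ from (17), and compares them with a steady-state Monte Carlo run of Criteria 1. At steady state $\kappa w_j = \max(X_1, \dots, X_j)$, so the same run also estimates $\kappa K = \mathbb{E}[(1 - J^*\beta)\max(X_1, \dots, X_{J^*})]$, which is the sum evaluated in Theorem 4.

```python
import numpy as np

def thresholds(beta, j_max, tol=1e-12):
    """Static thresholds v_1..v_{J_max}: v_j solves v = g_j(v) in (16); v_{J_max} = 0.

    Assumes X ~ Exp(1), for which int_v^inf (x - v) dF_X(x) = exp(-v),
    so g_j(v) = (1/beta - (j + 1)) * exp(-v). Requires j_max <= floor(1/beta).
    """
    v = []
    for j in range(1, j_max):
        c = 1.0 / beta - (j + 1)
        lo, hi = 0.0, max(c, 1.0)            # v - g_j(v) is <= 0 at lo and > 0 at hi
        while hi - lo > tol:                 # bisection on the increasing v - g_j(v)
            mid = 0.5 * (lo + hi)
            if mid - c * np.exp(-mid) < 0:
                lo = mid
            else:
                hi = mid
        v.append(0.5 * (lo + hi))
    v.append(0.0)                            # v_{J_max} = 0: probing always stops
    return np.array(v)

def stop_probs(v):
    """p_j from (17) with F_X(x) = 1 - exp(-x); entry j-1 holds p_j."""
    F = 1.0 - np.exp(-v)
    p = []
    for j in range(1, len(v) + 1):
        q_j = F[j - 2] ** (j - 1) if j > 1 else 1.0   # q_j = Pr{J* >= j}
        p.append(q_j - F[j - 1] ** j)
    return np.array(p)

def simulate(v, beta, n_slots, rng):
    """Steady-state Monte Carlo of Criteria 1 in units where kappa*w_j = max(X_1..X_j).

    Returns the empirical distribution of J* and an estimate of
    kappa*K = E[(1 - J* beta) max(X_1..X_{J*})].
    """
    j_max = len(v)
    X = rng.exponential(size=(n_slots, j_max))
    running_max = np.maximum.accumulate(X, axis=1)   # kappa * w_j for j = 1..J_max
    J = np.argmax(running_max >= v, axis=1) + 1      # first j with kappa*w_j >= v_j
    w_stop = running_max[np.arange(n_slots), J - 1]
    gain = np.mean((1 - J * beta) * w_stop)
    freq = np.bincount(J, minlength=j_max + 1)[1:] / n_slots
    return freq, gain

beta, j_max = 0.1, 10          # J_max = min(K, floor(1/beta)) with K >= 10
v = thresholds(beta, j_max)
p = stop_probs(v)
freq, gain = simulate(v, beta, 200_000, np.random.default_rng(1))
print(np.round(v, 3))
print(np.round(p, 4))
print(np.round(freq, 4))
print(round(gain, 3))          # Monte Carlo estimate of the scheduling gain kappa*K
```

With $\beta = 0.1$ and $J_{\max} = 10$, $\beta^{-1} - J_{\max} = 0$, so $v_{J_{\max}-1} = 0$ and, in this setting, probing never goes beyond $J_{\max} - 1$ users.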
Proof:
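At steady state, all users’ throughput-normalized mean rates $r_k/T_k^*$ are identical (equal to $1/\kappa$ by (9)). Let $q_j = \Pr\{J^* \ge j\}$, i.e., the probability that at least $j$ users are probed before transmission. Then $q_1 = 1$. Since the thresholds are decreasing, $\max(X_1, \dots, X_{j-1}) < v_{j-1}$ implies $\max(X_1, \dots, X_m) < v_m$ for every $m \le j - 1$, so this event is exactly the event that probing has not stopped after $j - 1$ users. Hence, from Criteria 1, for $j \ge 2$,
$$q_j = \Pr\{\max(X_1, \dots, X_{j-1}) < v_{j-1}\} = \Pr\{X_1 < v_{j-1}\} \cdots \Pr\{X_{j-1} < v_{j-1}\} = \big(F_X(v_{j-1})\big)^{j-1}.$$
Like $v_j$, $q_j$ is completely determined by the rate distribution. Clearly, $p_j = q_j - q_{j+1}$ for $1 \le j \le J_{\max} - 1$ and $p_{J_{\max}} = q_{J_{\max}}$.

Proof of Theorem 4: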
Consider a specific user $k$. In the steady state, $\dot{\mathbf{T}}(t) = 0$, so by Theorem 2 user $k$’s throughput is $T_k^* = \mathbb{E}[B_k(n) \mid \mathbf{T}^*]$. Let $K^*$ denote the index of the user selected as the destination. The event $\{K^* = k\}$, i.e., user $k$ is selected as the destination, can be decomposed into $J_{\max}$ mutually exclusive subevents: $\{K^* = k\} = \bigcup_{j=1}^{J_{\max}} \{K^* = k, J^* = j\}$. Then
$$\begin{aligned}
T_k^* &= \mathbb{E}[B_k(n) \mid \mathbf{T}^*] = \mathbb{E}[(1 - J^*\beta) R_k I_k] \\
&= \Pr\{K^* = k\}\, \mathbb{E}[(1 - J^*\beta) R_k \mid K^* = k] \\
&\overset{(a)}{=} \frac{1}{K}\, \mathbb{E}[(1 - J^*\beta) R_k \mid K^* = k] \\
&\overset{(b)}{=} \frac{1}{K} \sum_{j=1}^{J_{\max}} \Pr\{J^* = j\}\, \mathbb{E}[(1 - j\beta) R_k \mid K^* = k, J^* = j] \\
&= \frac{T_k^*}{K} \sum_{j=1}^{J_{\max}} p_j (1 - j\beta)\, \mathbb{E}\left[\frac{R_k}{T_k^*} \,\middle|\, K^* = k, J^* = j\right] \\
&\overset{(c)}{=} \frac{T_k^*}{K} \sum_{j=1}^{J_{\max}} p_j (1 - j\beta)\, \mathbb{E}\left\{\left[\max\left( \left[\frac{R_1}{T_1^*}\right]_0^{v_{j-1}/\kappa}, \dots, \left[\frac{R_{j-1}}{T_{j-1}^*}\right]_0^{v_{j-1}/\kappa}, \frac{R_j}{T_j^*} \right)\right]_{v_j/\kappa}^{\infty}\right\} \\
&\overset{(d)}{=} \frac{T_k^*}{K} \sum_{j=1}^{J_{\max}} p_j (1 - j\beta)\, \mathbb{E}\left\{\left[\max\left( \left[\frac{X_1}{\kappa}\right]_0^{v_{j-1}/\kappa}, \dots, \left[\frac{X_{j-1}}{\kappa}\right]_0^{v_{j-1}/\kappa}, \frac{X_j}{\kappa} \right)\right]_{v_j/\kappa}^{\infty}\right\} \\
&\overset{(e)}{=} \frac{T_k^*}{\kappa K} \sum_{j=1}^{J_{\max}} p_j (1 - j\beta)\, \mathbb{E}\left\{\left[\max\left( [X_1]_0^{v_{j-1}}, \dots, [X_{j-1}]_0^{v_{j-1}}, X_j \right)\right]_{v_j}^{\infty}\right\},
\end{aligned}$$
where (a) follows from Corollary 1; (b) from the law of total probability; (c) from the static threshold criterion, since $\{K^* = k, J^* = j\}$ means that i) user $k$ has the largest throughput-normalized rate among the first $j$ users, ii) the throughput-normalized rates of the first $j - 1$ users are smaller than $\kappa^{-1} v_{j-1}$, and iii) the largest throughput-normalized rate among the first $j$ users is larger than $\kappa^{-1} v_j$; (d) from $R_k = r_k X_k$ and (9); and (e) from pulling the common scale factor $1/\kappa$ out of the truncations and the expectation. Replacing $p_j$ with (17) and cancelling $T_k^*$ from both sides yields Theorem 4. ∎

IV. JOINT LEARNING, PROBING AND SCHEDULING
Consider the case where the scheduler does not know a priori the statistics of the downlink channel quality, and thus has to rely on the history of the probed CQI to decide on the user probing order and user selection. Under this assumption, the problem of maximizing the PF utility function is a generalization of the classical multiarmed bandit problem [20]. In the classical bandit problem, the decision maker has to decide which of $K$ random processes to observe in a sequence of trials so as to maximize the reward, where observing a process is equivalent to utilizing it. In our model, however, the scheduler may probe (observe) more than one channel (random process) in each slot and then choose only one for transmission (utilization); an observation does not always lead to a utilization.

At the beginning of slot $n$, i.e., the end of slot $n-1$, let $M_k(n-1)$ denote the number of time slots in which the channel to user $k$ has been probed, and let $\mathcal{R}_k(n-1) = \{R_k^{(1)}, \dots, R_k^{(M_k(n-1))}\}$ record all probed samples of the channel rate of user $k$. Clearly, $|\mathcal{R}_k(n-1)| = M_k(n-1)$. The scheduler keeps updating the $K$ sets $[\mathcal{R}_1(n), \dots, \mathcal{R}_K(n)]$ from slot to slot, and it also knows the throughput $\mathbf{T}(n-1)$ up to the previous slot. The objective is still to find a scheme that solves the stopping problem in each slot. As in Section III-A, the scheme must perform the same two tasks: determining the user probing order and selecting one user for transmission. Hence the problem formulation and scheme design are similar to those in Section III-A. The only difference is that the scheduler has only sampled values of the channel rates, instead of explicit knowledge of the distribution of $R_k$ ($k = 1, \dots, K$), so the expectations involving $R_k$ cannot be computed directly. Instead, they are replaced by empirical averages over the acquired samples of $R_k$, which readily leads to an index-based policy in the framework of bandit problems.

An index policy, which at any time chooses the stochastic process with the currently highest index, solves a class of bandit problems. To find the scheme, we adopt a methodology similar to that used by Agrawal [21] to develop index-based policies. For the decision on the user probing order, we use the current average reward, i.e., the throughput-normalized average rate, as the index. For the decision on when to start transmission, we use the number of bits actually served in the current slot, i.e., the product of $1 - j\beta$ and the conditional throughput-normalized average rate. For convenience, we define the following two empirical averages:
$$\tilde{s}_k(n) \triangleq \frac{1}{M_k(n-1)} \sum_{m=1}^{M_k(n-1)} \frac{R_k^{(m)}}{T_k(n-1)}, \tag{18}$$
$$\tilde{e}_k(n, w) \triangleq \frac{1}{M_k(n-1)} \sum_{m=1}^{M_k(n-1)} \left[ w \vee \frac{R_k^{(m)}}{T_k(n-1)} \right]. \tag{19}$$
The quantity $\tilde{s}_k(n)$ replaces $\bar{s}_k(n)$ in Algorithm 1, and $\tilde{e}_k(n, w)$ replaces $\mathbb{E}\left[ w \vee \frac{R_k}{T_k(n-1)} \right]$. The resulting joint PF learning, probing and scheduling (JLPS-PF) algorithm is described in Algorithm 2.

Algorithm 2:
JLPS-PF
Initialization: $n_0 \leftarrow \lceil \beta K \rceil$. For $k = 1, \dots, K$, $T_k(n_0) \leftarrow$ …. In the first $n_0$ slots, sequentially probe each channel once, so that none of the sets $\mathcal{R}_k(n_0)$ ($k = 1, \dots, K$) is empty; $M_k(n_0) \leftarrow 1$;
for $n = n_0 + 1, n_0 + 2, \dots$ do
    $\tilde{s}_k(n) \leftarrow \frac{1}{M_k(n-1)} \sum_{m=1}^{M_k(n-1)} R_k^{(m)} / T_k(n-1)$ for $k = 1, \dots, K$;
    Sort the $\tilde{s}_k(n)$ in descending order: $\tilde{s}_{k_1}(n) \ge \dots \ge \tilde{s}_{k_K}(n)$;
    $j \leftarrow 0$, $w \leftarrow 0$;
    do
        $j \leftarrow j + 1$;
        Probe user $k_j$, obtain the rate $R_{k_j}(n)$, and set $w \leftarrow w \vee R_{k_j}(n) / T_{k_j}(n-1)$;
        $\tilde{e}_{k_{j+1}}(n, w) \leftarrow \frac{1}{M_{k_{j+1}}(n-1)} \sum_{m=1}^{M_{k_{j+1}}(n-1)} \left[ w \vee \frac{R_{k_{j+1}}^{(m)}}{T_{k_{j+1}}(n-1)} \right]$;
        $\mathcal{R}_{k_j}(n) \leftarrow \mathcal{R}_{k_j}(n-1) \cup \{R_{k_j}(n)\}$, $M_{k_j}(n) \leftarrow M_{k_j}(n-1) + 1$;
    while $j < J_{\max}$ and $(1 - j\beta)\, w < (1 - (j+1)\beta)\, \tilde{e}_{k_{j+1}}(n, w)$;
    Transmit to the probed user with the largest normalized rate $w$, and update $\mathbf{T}(n)$;
    For $k = k_{j+1}, \dots, k_K$: $\mathcal{R}_k(n) \leftarrow \mathcal{R}_k(n-1)$, $M_k(n) \leftarrow M_k(n-1)$;
end

From the description of Algorithm 2, one may wonder whether the following could happen: if a user’s channel is probed with relatively high values in the first few slots, the user then receives a low priority of being probed afterwards, so that the empirical average of this channel stays above its statistical expectation. However, this does not happen, thanks to the structure of the algorithm, which derives from the objective of maximizing the PF utility. If user $k$ is probed and selected less frequently than other users, its achieved throughput $T_k(n)$ becomes small, which in turn increases its priority of being probed and selected. In fact, the throughput-normalized rate metric used in PF scheduling is a well-balanced rule that ensures each user is sampled sufficiently often and with identical frequency. Hence, after Algorithm 2 runs for a sufficiently long time, the sampled data of each user’s channel rate characterize the statistics of $\mathbf{R}$ well. By the law of large numbers, the empirical averages then converge to the statistical expectations, and the performance of Algorithm 2 is almost the same as that of Algorithm 1.

V. NUMERICAL RESULTS
In this section, we provide numerical experiments illustrating the theoretical findings of the previous sections. Our objectives are (i) to evaluate the performance of the developed schemes with and without channel statistics, and (ii) to compare the developed PF scheme with some ideal and practical schemes and to quantify the impact of the cost of CQI acquisition on the scheduling. We consider the scenario where the users’ rates follow exponential distributions with mean equal to the user index. The exponential rate assumption is an appropriate approximation of the Shannon capacity under Rayleigh fading channels in the low-SNR regime.
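To make this scenario concrete, the following is a compact Python sketch of Algorithm 1 with exponential rates of mean $r_k$. It relies on the closed form $\mathbb{E}[w \vee s] = w + m e^{-w/m}$ for an exponential $s$ with mean $m$, which makes the expectation in (8) cheap to evaluate. Two choices are assumptions not taken from the paper: the initial throughput `t0`, and a hypothetical warm start that applies (3) with step $1/(n + n_0)$ instead of $1/n$, so that no $T_k$ drops to zero after the first slot (with step $1/n$, every unserved user would have $T_k(1) = 0$). The function name and the parameter values in the usage lines are illustrative.

```python
import numpy as np

def jps_pf(r, beta, n_slots, rng, t0=1.0, n0=10):
    """Sketch of Algorithm 1 (JPS-PF) for exponential rates R_k ~ Exp(mean r_k).

    Hypothetical warm start: T_k(0) = t0 is treated as an average over n0
    fictitious slots, i.e. (3) is applied with step 1/(n + n0), so no T_k hits 0.
    Returns the final throughputs and the number of slots each user was served.
    """
    r = np.asarray(r, dtype=float)
    K = r.size
    j_max = min(K, int(np.floor(1.0 / beta)))
    T = np.full(K, t0)
    served = np.zeros(K, dtype=int)
    for n in range(1, n_slots + 1):
        s_bar = r / T                       # throughput-normalized mean rates
        order = np.argsort(-s_bar)          # probing order: descending s_bar
        w, best, R_best = -np.inf, order[0], 0.0
        for j in range(1, j_max + 1):
            k = order[j - 1]
            R = rng.exponential(r[k])       # probe user k_j
            if R / T[k] > w:
                w, best, R_best = R / T[k], k, R
            if j == j_max:
                break                       # probing budget exhausted
            m = s_bar[order[j]]             # mean of the next normalized rate
            cont = w + m * np.exp(-w / m)   # E[w v s_next | w] for exponential s_next
            if (1 - j * beta) * w >= (1 - (j + 1) * beta) * cont:
                break                       # one-stage look-ahead rule (8) says stop
        step = 1.0 / (n + n0)
        T = (1 - step) * T                  # every user's average decays ...
        T[best] += step * (1 - j * beta) * R_best   # ... and the served user gains B
        served[best] += 1
    return T, served

rng = np.random.default_rng(2)
r = np.arange(1, 21)                 # mean rate of user k equals k
T, served = jps_pf(r, beta=0.1, n_slots=20_000, rng=rng)
print(np.round(T / r, 3))            # roughly a common ratio kappa (Theorem 2)
print(served)                        # each user served roughly n_slots / K times
```

By Theorem 2 and Corollary 1, the printed ratios $T_k/r_k$ should be close to a common $\kappa$ and the service counts close to $n/K$; the exact values depend on the random seed and the run length.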
A. Evaluation of the Proposed Algorithms
Consider $K = 20$ users and let the fraction of a slot used by one probe be $\beta = 0.1$, so that up to $J_{\max} = 10$ users can be probed in each slot.

Fig. 3 presents a sample throughput trajectory of user 1 when scheduled with Algorithm 1, with the static threshold criterion given in Criteria 1, and with Algorithm 2. The simulation runs for … slots in this experiment. The time axis is in logarithmic scale to highlight the transient behavior. The static threshold criterion works well. The variation of the throughput diminishes over time as more and more time slots are included in the averaging. It is worth noting that the low complexity of the static threshold criterion for solving the optimal stopping problem comes from explicit knowledge of the channel statistics. If this information is not known, or if the distribution of the channel rate varies over time, only the dynamic criterion given in Algorithm 1 can be adopted.

Fig. 4 illustrates how often each user is scheduled in a relatively short period of 2000 slots. Each of the 20 users is selected as the destination in roughly 100 slots; that is, the scheme is fair to all users even within a short time window.

Fig. 5 presents the probability that $k$ users are probed before transmission. The theoretical results are from Lemma 1. Both Algorithm 1 and Algorithm 2 agree with the theoretical results, and the probability decreases sharply as the number of probes approaches $J_{\max}$.

Fig. 6 plots the scheduling gain of the proposed algorithms versus the number of users in the system. The simulation runs for 20,000 slots, and the simulation results match the analytical result of Theorem 4 well. The scheduling gain remains about the same for more than 9 users, because the cost of user probing then dominates and the scheme always tries to carry out user probing until the end.

B. Comparison between the Proposed Scheme and Other Schemes
The fraction of a slot used to probe one user is again set to $\beta = 0.1$. Four schemes are considered: (a) the proposed joint probing and scheduling scheme; (b) round-robin scheduling; (c) the genie-aided PF (GA-PF) scheme, where full CQI is available to the scheduler at the beginning of each slot; and (d) the probe-all PF (PA-PF) scheme, where the transmitter probes all users before scheduling. For both (c) and (d), the transmitter selects the user with the largest $R_k(n)/T_k(n-1)$ for transmission. From [22], the scheduling gain of GA-PF is $\mathbb{E}\left[\max_{k=1,\dots,K} X_k\right]$; hence that of PA-PF is $\max(1 - K\beta, 0)\, \mathbb{E}\left[\max_{k=1,\dots,K} X_k\right]$.

Fig. 7 presents the scheduling gain of schemes (a)–(d) as a function of the number of users. When the probing cost is taken into account, the scheduling gain does not always increase but approaches a limiting value as the number of users increases. This indicates that, by ignoring the cost of channel probing, the ideal genie-aided PF scheme does not reflect the correct multiuser diversity characteristics. The comparison also shows the advantage of the proposed joint probing and scheduling scheme. The probe-all PF scheme achieves a higher gain than round robin when the user population is not very large compared with $\beta^{-1}$. However, as the number of users increases further, the scheduling gain of the probe-all scheme vanishes, because almost the entire slot is used for user probing instead of data transmission.

Fig. 8 displays the sum throughput of all schemes as the number of users increases. There is a relatively large gap between the ideal genie-aided PF curve and the proposed scheme; the gap quantifies the extent to which user probing degrades the system performance.
For example, when the number of users is $K = 20$, the throughput of the joint probing and scheduling scheme is only 55.64% of that of the genie-aided PF. Moreover, among the non-ideal schemes (a), (b) and (d), the joint scheme achieves the highest throughput. The probe-all PF scheme performs similarly to the joint probing and scheduling scheme when there are few users ($K \le [\dots]$), but degrades quickly, and its throughput even vanishes when the number of users becomes large.

VI. CONCLUSION
We have studied the problem of achieving proportional fairness in wireless systems while explicitly taking the channel probing cost into account. An optimal adaptive joint probing and scheduling scheme has been presented, together with a static threshold based criterion for deciding whether to probe another user or to transmit. Using steady-state analysis, we have evaluated the scheduling gain explicitly. We have also extended the scheme to the case in which the scheduler has no knowledge of the channel rate distribution; the extended scheme achieves almost the same performance as the algorithm obtained under the known-rate-statistics assumption and outperforms the other non-ideal PF schemes. In this work, we have focused on the well-studied proportional fairness rule. It is possible to extend the results to more general utilities, for example, the $\alpha$-fair utility [7], and the methodology presented in this paper can be carried through to that case as well.

APPENDIX A
PROOF OF THEOREM […]

Proof:
Let the largest throughput-normalized user rate after probing $j$ users be denoted by
$$
w_j = \max_{1 \le l \le j} s_{k_l}. \tag{20}
$$
Then the current reward can be written as $y_j(s_{k_1}, \dots, s_{k_j}) = (1 - j\beta)\, w_j$, and the expected reward obtained by probing the next user is
$$
\mathbb{E}\left[ y_{j+1}(s_{k_1}, \dots, s_{k_{j+1}}) \,\middle|\, s_{k_1}, \dots, s_{k_j} \right] = (1 - (j+1)\beta)\, \mathbb{E}\left[ w_j \vee s_{k_{j+1}} \,\middle|\, w_j \right]. \tag{21}
$$
The event $E_j$ can therefore be expressed as
$$
E_j = \left\{ (1 - j\beta)\, w_j \ge (1 - (j+1)\beta)\, \mathbb{E}\left[ w_j \vee s_{k_{j+1}} \,\middle|\, w_j \right] \right\}. \tag{22}
$$
We first show that there exists a threshold $w_j^{(th)}$ such that $E_j = \{ w_j \ge w_j^{(th)} \}$. To this end, let
$$
f_j(w) = (1 - j\beta)\, w - (1 - (j+1)\beta)\, \mathbb{E}\left[ w \vee s_{k_{j+1}} \right],
$$
so that $E_j$ occurs if and only if $f_j(w_j) \ge 0$. It is easy to verify that $f_j(0) < 0$ and $f_j(\infty) > 0$. The function $f_j(w)$ can be rewritten as
$$
f_j(w) = \beta\, \mathbb{E}\left[ w \vee s_{k_{j+1}} \right] + (1 - j\beta)\, \mathbb{E}\left[ w - w \vee s_{k_{j+1}} \right].
$$
For any $w' > w > 0$,
$$
f_j(w') - f_j(w) = \beta\, \mathbb{E}\left[ w' \vee s_{k_{j+1}} - w \vee s_{k_{j+1}} \right] + (1 - j\beta)\, \mathbb{E}\left[ (w' - w) - \left( w' \vee s_{k_{j+1}} - w \vee s_{k_{j+1}} \right) \right].
$$
Note that $w' \vee s_{k_{j+1}} \ge w \vee s_{k_{j+1}}$ and $w' - w \ge w' \vee s_{k_{j+1}} - w \vee s_{k_{j+1}}$, so both expectations are nonnegative. Thus $f_j(w') - f_j(w) \ge 0$; that is, $f_j(w)$ is nondecreasing. Moreover, $f_j$ is continuous, because $w \mapsto w \vee s$ is 1-Lipschitz. Summarizing these properties of $f_j(w)$, the solution to $f_j(w) \ge 0$ can be expressed as $w \ge w_j^{(th)}$, where $w_j^{(th)}$ is the smallest zero of $f_j$.
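As a numerical check of this threshold characterization, the sketch below computes $w_j^{(th)}$ as the smallest zero of $f_j(w)$ by bisection, which is valid because $f_j$ is nondecreasing and continuous with $f_j(0) < 0$. It assumes, purely for illustration, that every $s_{k_{j+1}}$ is exponential with mean $\mu$, for which $\mathbb{E}[w \vee s] = w + \mu e^{-w/\mu}$; the function names and the value $\beta = 0.05$ are hypothetical and not taken from the paper.

```python
import math


def expected_max(w: float, mu: float) -> float:
    """E[max(w, s)] for s ~ Exponential(mean mu), equal to w + mu * exp(-w / mu)."""
    return w + mu * math.exp(-w / mu)


def f(j: int, w: float, beta: float, mu: float) -> float:
    """f_j(w) = (1 - j*beta) * w - (1 - (j+1)*beta) * E[max(w, s)]."""
    return (1 - j * beta) * w - (1 - (j + 1) * beta) * expected_max(w, mu)


def threshold(j: int, beta: float, mu: float = 1.0, tol: float = 1e-12) -> float:
    """Smallest w with f_j(w) >= 0, found by bisection (f_j is nondecreasing)."""
    if 1 - (j + 1) * beta <= 0:
        # One more probe would leave no time for data, so stopping is never worse.
        return 0.0
    lo, hi = 0.0, 1.0
    while f(j, hi, beta, mu) < 0:  # f_j(w) grows like beta * w, so this loop ends
        hi *= 2.0
    while hi - lo > tol:  # invariant: f_j(lo) < 0 <= f_j(hi)
        mid = 0.5 * (lo + hi)
        if f(j, mid, beta, mu) < 0:
            lo = mid
        else:
            hi = mid
    return hi


if __name__ == "__main__":
    beta = 0.05  # illustrative value only
    for j in range(1, 10):
        print(j, round(threshold(j, beta), 4))
```

Under these assumptions the printed thresholds decrease in $j$, consistent with the ordering $w_{j+1}^{(th)} \le w_j^{(th)}$ established in this appendix.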
We next show that $w_{j+1}^{(th)} \le w_j^{(th)}$. For fixed $w$,
$$
\begin{aligned}
f_{j+1}(w) - f_j(w) &= (1 - (j+1)\beta)\, w - (1 - (j+2)\beta)\, \mathbb{E}_{s_{k_{j+2}}}\left[ w \vee s_{k_{j+2}} \right] - (1 - j\beta)\, w + (1 - (j+1)\beta)\, \mathbb{E}\left[ w \vee s_{k_{j+1}} \right] \\
&= \beta\, \mathbb{E}_{s_{k_{j+2}}}\left[ w \vee s_{k_{j+2}} - w \right] + (1 - (j+1)\beta) \left\{ \mathbb{E}\left[ w \vee s_{k_{j+1}} \right] - \mathbb{E}_{s_{k_{j+2}}}\left[ w \vee s_{k_{j+2}} \right] \right\} \\
&\ge 0,
\end{aligned} \tag{23}
$$
where the last inequality follows from the fact that $s_{k_{j+1}}$ and $s_{k_{j+2}}$ have the same type of distribution and $\mathbb{E}[s_{k_{j+1}}] \ge \mathbb{E}[s_{k_{j+2}}]$. Since $w_j^{(th)}$ is the zero point of $f_j(w)$, (23) gives $f_{j+1}(w_j^{(th)}) \ge f_j(w_j^{(th)}) = 0$, and hence $w_{j+1}^{(th)} \le w_j^{(th)}$, as illustrated in Fig. 1. Collecting the preceding results and noting that $w_{j+1} \ge w_j$, we have
$$
E_j = \{ w_j \ge w_j^{(th)} \} \subseteq \{ w_{j+1} \ge w_j^{(th)} \} \subseteq \{ w_{j+1} \ge w_{j+1}^{(th)} \} = E_{j+1}.
$$

REFERENCES

[1] J. Mo and J. Walrand, “Fair end-to-end window-based congestion control,” IEEE/ACM Trans. Netw., vol. 8, no. 5, pp. 556–567, Oct. 2000.
[2] P. Viswanath, D. N. C. Tse, and R. Laroia, “Opportunistic beamforming using dumb antennas,” IEEE Trans. Inf. Theory, vol. 48, no. 6, pp. 1277–1294, June 2002.
[3] F. P. Kelly, “Charging and rate control for elastic traffic,” Euro. Trans. Telecommun., vol. 8, pp. 7–20, 1997.
[4] H. J. Kushner and P. A. Whiting, “Convergence of proportional-fair sharing algorithms under general conditions,” IEEE Trans. Wireless Commun., vol. 3, no. 4, pp. 1250–1259, July 2004.
[5] S. Borst and M. Jonckheere, “Flow-level stability of channel-aware scheduling algorithms,” in Proc. WiOpt 06, 2006.
[6] J. G. Choi and S. Bahk, “Cell-throughput analysis of the proportional fair scheduler in the single-cell environment,” IEEE Trans. Veh. Technol., vol. 56, no. 2, pp. 766–778, 2007.
[7] J. Liu, A. Proutiere, Y. Yi, M. Chiang, and H. V. Poor, “Stability, fairness, and performance: A flow-level study on nonconvex and time-varying rate regions,” IEEE Trans. Inf. Theory, vol. 55, no. 8, pp. 3437–3456, 2009.
[8] C. W. Chan and N. Bambos, “Throughput loss in task scheduling due to server state uncertainty,” VALUETOOLS […]
[…] ACM MOBICOM, 2007.
[11] P. Chaporkar and A. Proutiere, “Optimal joint probing and transmission strategy for maximizing throughput in wireless systems,” IEEE Journal on Selected Areas in Communications, vol. 26, no. 8, pp. 1546–1556, 2008.
[12] J. Chen, R. A. Berry, and M. L. Honig, “An adaptive limited feedback scheme for MIMO OFDM based on optimal stopping,” in Proc. Allerton Conference, 2008.
[13] A. Gopalan, C. Caramanis, and S. Shakkottai, “On wireless scheduling with partial channel-state information,” in Allerton Conference on Communication, Control, and Computing, 2007.
[14] M. Ouyang and L. Ying, “On scheduling in multi-channel wireless downlink networks with limited feedback,” in Proc. Allerton Conference, 2009.
[15] P. Chaporkar, A. Proutiere, H. Asnani, and A. Karandikar, “Scheduling with limited information in wireless systems,” in ACM MobiHoc, 2009.
[16] H. Zhou, P. Fan, and D. Guo, “The impact of limited information on proportional fair scheduling in wireless networks,” accepted by IEEE GLOBECOM 2010.
[17] A. L. Stolyar, “On the asymptotic optimality of the gradient scheduling algorithm for multiuser throughput allocation,” Operations Research […]
[…] Stochastic Approximation: A Dynamical Systems Viewpoint. Cambridge University Press, 2008.
[20] D. A. Berry and B. Fristedt, Bandit Problems: Sequential Allocation of Experiments. London: Chapman and Hall, 1985.
[21] R. Agrawal, “Sample mean based index policies with O(log n) regret for the multi-armed bandit problem,” Advances in Applied Probability, vol. 27, no. 4, pp. 1054–1078, 1995.
[22] S. Borst, “User level aware performance of channel-aware scheduling algorithms in wireless data networks,” in Proc. Infocom, […]

Fig. 1: Illustration of the property of function $f_j(w)$. (Curves of $f_j(w)$ versus $w$, with the thresholds $w_{j+1}^{(th)}$ and $w_j^{(th)}$ marked.)

Fig. 2: Illustration of the solution to inequality (15). (Curves $g_j(v)$, $g_{j+1}(v)$ and $f(v)$ versus $v$.)
Fig. 3: The throughput trajectory of user 1 when scheduled with Algorithm 1, the static threshold criterion and Algorithm 2, respectively. $N_{\text{slot}} = 10{,}[\dots]$, $K = 20$, $\beta = 0.[\dots]$. (Axes: time slot versus throughput of user 1. Curves: joint probing and scheduling, Algorithm 1; joint probing and scheduling, static threshold criterion; joint learning, probing and scheduling, Algorithm 2.)
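The experiments of Section V-A can be approximated with a small Monte Carlo simulation. The sketch below is not the authors' Algorithm 1; it is a hypothetical stand-in that assumes i.i.d. unit-mean exponential rates $R_k(n)$, so that $s_k = R_k(n)/T_k(n-1)$ is exponential with mean $1/T_k(n-1)$. It probes users in decreasing order of that mean, stops as soon as the one-step look-ahead event $E_j$ from the appendix holds (or $J_{\max}$ users have been probed), and updates $T_k$ as a running average. The value $\beta = 0.05$, the initial throughput of 1 and all names are illustrative assumptions.

```python
import math
import random


def expected_max(w, mu):
    """E[max(w, s)] for s ~ Exponential(mean mu)."""
    return w + mu * math.exp(-w / mu)


def simulate(K=20, beta=0.05, j_max=10, n_slots=2000, seed=1):
    """Hypothetical joint probing + PF scheduling loop.

    Returns (picks, depth): picks[k] counts the slots in which user k was
    scheduled; depth[j] counts the slots in which exactly j users were probed.
    """
    rng = random.Random(seed)
    T = [1.0] * K                 # average throughputs T_k, initialised to 1
    picks = [0] * K
    depth = [0] * (j_max + 1)
    limit = min(j_max, K)         # never probe more than J_max (or K) users
    for n in range(1, n_slots + 1):
        # Probe in decreasing order of E[s_k] = 1 / T_k, i.e. increasing T_k.
        order = sorted(range(K), key=lambda k: T[k])
        best_s, best_r, best_k, j = -1.0, 0.0, order[0], 0
        while True:
            k = order[j]
            r = rng.expovariate(1.0)          # probed rate R_k(n) ~ Exp(1)
            j += 1
            if r / T[k] > best_s:             # track w_j = best normalized rate
                best_s, best_r, best_k = r / T[k], r, k
            if j == limit:
                break
            mu_next = 1.0 / T[order[j]]       # mean of the next user's s_k
            # Stop when E_j holds: transmitting now is at least as good as one more probe.
            if (1 - j * beta) * best_s >= (1 - (j + 1) * beta) * expected_max(best_s, mu_next):
                break
        depth[j] += 1
        picks[best_k] += 1
        for k in range(K):
            served = (1 - j * beta) * best_r if k == best_k else 0.0
            # Running average that counts the initial value as one extra sample.
            T[k] += (served - T[k]) / (n + 1)
    return picks, depth


if __name__ == "__main__":
    picks, depth = simulate()
    print("slots per user:", picks)
    print("probing-depth histogram:", depth)
```

Comparing `picks` with Fig. 4 and the normalized `depth` histogram with Fig. 5 gives only a rough sanity check, since the rate model and parameters here are assumptions rather than the paper's settings.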
Fig. 4: The number of slots in which each user is selected as the destination. $N_{\text{slot}} = 2000$, $K = 20$, $\beta = 0.[\dots]$. (Axes: user index versus number of times selected as destination. Curves: joint probing and scheduling, simulation; joint learning, probing and scheduling, simulation; joint probing and scheduling, theory.)

Fig. 5: The probability that $k$ users have been probed before transmission. $K = 20$, $\beta = 0.[\dots]$.

Fig. 6: The scheduling gain comparison between Algorithm 1, Algorithm 2 and theoretical results. $\beta = 0.[\dots]$. (Axes: number of users versus scheduling gain. Curves: joint probing and scheduling, theory; joint probing and scheduling, simulation; joint learning, probing and scheduling, simulation.)

Fig. 7: Scheduling gain versus number of users. $\beta = 0.[\dots]$. (Curves: genie-aided PF; joint probing and scheduling PF; probe-all PF; round robin.)

Fig. 8: Sum throughput versus number of users. $\beta = 0.[\dots]$. (Curves: genie-aided PF; joint probing and scheduling PF; probe-all PF; round robin.)
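The scheduling-gain expressions quoted for GA-PF and PA-PF in Section V-B are easy to evaluate numerically. The sketch below assumes, only for illustration, that the normalized rates $X_k$ are i.i.d. unit-mean exponential, in which case $\mathbb{E}[\max_k X_k]$ equals the harmonic number $H_K = 1 + 1/2 + \dots + 1/K$. The value $\beta = 0.05$ and the function names are hypothetical. Under this normalization, a scheduler that ignores the channel state obtains gain 1, which is a useful reference point.

```python
def harmonic(K: int) -> float:
    """H_K = 1 + 1/2 + ... + 1/K, the mean of the max of K i.i.d. Exp(1) variables."""
    return sum(1.0 / i for i in range(1, K + 1))


def ga_pf_gain(K: int) -> float:
    """Genie-aided PF: E[max_k X_k], with no probing cost."""
    return harmonic(K)


def pa_pf_gain(K: int, beta: float) -> float:
    """Probe-all PF: max(1 - K*beta, 0) * E[max_k X_k]."""
    return max(1.0 - K * beta, 0.0) * harmonic(K)


if __name__ == "__main__":
    beta = 0.05  # illustrative value only
    for K in (1, 5, 10, 15, 20, 25):
        print(f"K={K:2d}  GA-PF={ga_pf_gain(K):.3f}  PA-PF={pa_pf_gain(K, beta):.3f}")
```

With $\beta = 0.05$ the probe-all gain peaks at $K = 6$ and drops to zero for $K \ge 20$, whereas the genie-aided gain keeps growing like $\ln K$; this mirrors the qualitative behavior described for Fig. 7.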