Exploiting Residual Resources to Support High Throughput with Resource Allocation

Abstract

Residual radio resources are abundant in wireless networks due to dynamic traffic load, which can be exploited to support high throughput for serving non-real-time (NRT) traffic. In this paper, we investigate how to achieve this by resource allocation with predicted time-average rate, which can be obtained from predicted average residual bandwidth after serving real-time traffic and predicted average channel gains of NRT mobile users. We show the connection between the statistics of their prediction errors. We formulate an optimization problem to make a resource allocation plan within a prediction window for NRT users that randomly initiate requests, which aims to fully use residual resources with ensured quality of service (QoS). To show the benefit of knowing the contents to be requested and the request arrival time in advance, we consider two types of NRT services, video on demand and video on reservation. The optimal solution is obtained, and an online policy is developed that can transmit according to the plan after instantaneous channel gains are available. Simulation and numerical results validate our analysis and show a dramatic gain of the proposed method in supporting high arrival rate of NRT requests with given tolerance on QoS.

Jia Guo, Chuting Yao, Chenyang Yang and Zixiang Xiong

Index Terms

Predictive resource allocation, residual resource, high throughput, quality of service

I. INTRODUCTION

To support the explosively growing traffic demands, various new techniques are under investigation for the fifth generation cellular networks and beyond [1]. One of the main trends is to keep providing higher spectral efficiency (SE), say by densifying the networks with more base stations (BSs) or more antennas. While further improving network SE is always beneficial, it has long been observed that network resources are highly under-utilized [2]. Recent observations from prevalent networks show that on average less than 15% of the resource blocks are actually used in practice. One reason behind this dilemma is the temporal-spatial variation of traffic load, i.e., only some BSs are busy during the peak time of each day.

March 29, 2018 DRAFT (arXiv, cs.IT)

The dynamic nature of wireless traffic comes from user behavior, hence the traffic variation can be exploited to boost network throughput by predicting the behavior. While indeed random, human behavior exhibits strong regularity due to routine activity, as reported by big data analysis in a variety of disciplines [3]–[7]. This implies that behavior-related information is predictable, either collectively or individually. For example, the traffic volume and user trajectory are predictable [5], [8], [9], from which the future average resource usage status of a network and the average channel gains of a user (with the help of a radio map [10], [11]) can be derived [12], [13]. User preference can be predicted by machine learning such as collaborative filtering [6], from which the probability of a user requesting a content can be obtained. As a consequence, predictive resource allocation is becoming one possible way to exploit residual resources [13]–[16], which is applicable to both real-time (RT) and non-real-time (NRT) services [7].

A. Related Works

For RT traffic such as phone calls, predictive wireless access has been extensively investigated to improve the admission-level quality of service (QoS), say by reducing the call dropping rate during handover among adjacent cells [17]. Considering that the information bits are generated randomly by each user and that the RT service has high priority, the major mechanism is to reserve resources for the RT traffic. Mobility prediction has long been used for mobility management to assist handover and for other location-based services, where the prediction granularity is at the cell level or even coarser (say, the next location) [7], [18]. With the predicted next-cell connection and hand-off time, dynamic resource reservation and call admission control can be used to improve the QoS [18], [19].

For NRT traffic such as video on demand (VoD) or file downloading, not only the admission-level and packet-level QoS of each user but also the performance of the network can be improved by exploiting future information. This is because the videos or files to be transmitted are cacheable, while the delay requirement of NRT traffic is not stringent. As a result, the videos can be pre-buffered at a mobile station (MS) when the MS has a good channel condition [15] and/or is located in a cell with light traffic load [12], [16] (i.e., can be served with a higher data rate [14]). In contrast to non-predictive resource allocation, which allocates radio resources in each time slot when the instantaneous channel gain is available, predictive resource allocation makes a plan for assigning future resources in a prediction window at the start of the window, when the predicted information is available. The plan determines which BSs along the trajectory of an MS will serve the MS in which time slots and with how much resource (say bandwidth).

Assuming that the future instantaneous data rate in the prediction window is known, a resource allocation plan was optimized in [15] to maximize the sum rate over the window, and a plan was made in [13] to minimize the power consumption at BSs without causing stalling for VoD users. Because the instantaneous rate is hard to predict, a more realistic assumption is that the rate statistics in the future are known, say the average data rate [14] or the data rate distribution [20]. Noticing that the rate prediction is inevitably inaccurate even on average, a robust predictive resource allocation was proposed in [21], where the prediction errors on future rates are modelled as Gaussian noise.

B. Motivation and Contributions

All existing works implicitly assume that multiple NRT users initiate their requests simultaneously at the start of a prediction window. Because request arrivals are random and highly asynchronous in practice, this assumption implies that the content to be requested and the exact request arrival time are known in advance. However, only the probability of a content being requested is predictable [6], and the exact request arrival time is hard, if not impossible, to predict. As a consequence, it is unreasonable to assume that all future NRT request arrivals are known, unless the NRT users make reservations before actually requesting the videos or files, as in video on reservation (VoR) [22].

Besides, most prior research efforts assume that the future data rate is perfectly available or known with some statistics of prediction errors, but rarely address how the rate is predicted or how the error statistics are connected with the errors of the predictable information.

Moreover, the time-varying rate is assumed to come only from the large scale channel variation due to user mobility. This assumption implies that all radio resources can be used for NRT users. However, both RT and NRT requests may arrive in a cell, where the requests of RT users need to be served with higher priority and the requests of NRT users can be served with the residual resources after serving RT traffic. Therefore, the average rate of an NRT user depends not only on the trajectory but also on the variation of traffic load. This fact is largely overlooked in the literature on predictive resource allocation.

In this paper, we strive to demonstrate the performance gain of predictive resource allocation in supporting high throughput. To show the gain in real-world networks, the request arrivals of NRT users are no longer assumed to be synchronous. To show the benefit of knowing the contents to be conveyed and the request arrival time in advance, we consider two types of NRT services, VoD and VoR. We assume that the average channel gains and the average residual bandwidth are predictable from the traffic load and user trajectory prediction by using the methods in [12], [13], from which the average rate prediction can be derived. Since predicting user behavior is not an easy task, we show how the prediction errors of the average rate are translated from those of the predicted average channel gains and average residual bandwidth, and when they can be modelled as Gaussian as assumed in [21]. Such analysis can help understand the gain from predicting different kinds of information and the prediction accuracy required to achieve the gain, which provides guidance for behavior prediction and facilitates robust optimization for predictive resource allocation.

The major contributions of this work are summarized as follows:

• We show the connection between the statistics of the errors of the predicted average rate and those of the predicted average residual bandwidth and average channel gain, by resorting to the principle of maximum entropy. We find that the prediction error of the average rate mainly depends on the prediction error of the average residual bandwidth, which implies that the user trajectory need not be predicted accurately.

• We formulate a problem to optimize the resource allocation plan for randomly arriving NRT users that can exploit network residual resources in a prediction window. To maximize the request arrival rate of the NRT users that the network can support and to accommodate the uncertainty of the requested content and request arrival time within the window, we minimize a weighted total transmission time with ensured maximal waiting time of the NRT users. We demonstrate the gain of the obtained optimal solution over prior solutions for predictive resource allocation by simulations.

Notations: $\|\cdot\|$ denotes the Euclidean norm, $|\cdot|$ denotes magnitude, $\mathbb{E}\{\cdot\}$ and $\mathbb{D}\{\cdot\}$ denote expectation and variance, and $\mathcal{N}(\cdot)$ and $\mathcal{U}(\cdot)$ denote Gaussian and uniform distributions, respectively.

The rest of the paper is organized as follows. In Section II, we introduce the channel and transmission models as well as a general traffic model with randomly arriving NRT requests. In Section III, we analyze the prediction error statistics of the average rate, formulate the resource allocation planning problem, and find the optimal solution. In Section IV, a transmission policy according to the plan is provided. Simulation and numerical results are shown in Section V, and the paper is concluded in Section VI.

II. SYSTEM MODEL

Consider an $N_b$-cell network, where each BS is equipped with $N_t$ antennas and serves two kinds of traffic with bandwidth $W_{\max}$ and transmit power $P_{\max}$. The first kind is RT traffic, and the other is NRT traffic. Because RT traffic has higher priority, the NRT traffic can be served by the residual resources of the network after the QoS of RT traffic is guaranteed. Given the dynamic traffic load of RT service, the residual resources available for NRT service are time-varying. We call the MSs that request NRT traffic NRT users, or simply MSs, in the sequel.

Assume that there is a central processor (CP) in the network, which makes the resource allocation plan for serving the NRT users within a prediction window.

A. Traffic and Channel Models

The requests of NRT users arrive at the network randomly and asynchronously. Each MS requests a video, either on demand (i.e., VoD) or on reservation (i.e., VoR). For an MS demanding VoD service (called a VoD MS), the CP can make the resource allocation plan at the moment the MS initiates its request. For an MS demanding VoR service (called a VoR MS), the CP makes the plan at the moment the MS makes the reservation, which is earlier than the time instant at which the MS starts to play the video. A video file is divided into multiple segments and then coded. Each segment is a stand-alone unit. Once a segment is completely received by an MS, it can be decoded and played out. To avoid playback interruption due to an empty playout buffer, a segment should be conveyed to the MS before the previous segment finishes playing.

Time is discretized into frames each with duration $\Delta$, and each frame includes $T_s$ time slots, each with duration of unit time (say 1 ms). The two durations are defined according to the variation of large scale channel fading (including path loss and shadowing) and of small scale fading due to user mobility, respectively. Assume that the large scale channel gain (also called average channel gain) remains constant within each frame and may vary among frames, and that the small scale channel gain (i.e., instantaneous channel gain, also called channel state information (CSI) in the literature) remains constant within each time slot and is independent and identically distributed (i.i.d.) among time slots.

For notational simplicity, we set the duration of the prediction window as $T_f$ frames and the playback duration of each segment as $T_{\mathrm{seg}}$ frames, and we assume that each segment contains $B$ bits and that each segment starts playing at the beginning of a frame.

For the network with only VoD traffic in addition to RT traffic, we set the request arrival time of the $K$th MS (denoted as MS$_K$) as the start time of a prediction window, defined as the first time slot in the first frame (called the reference time for short). To reflect the random nature of the request arrivals, we consider the realistic scenario where $K-1$ VoD MSs are playing videos at the reference time, as shown in Fig. 1(a). This means that the prediction window is updated every time a new MS initiates a request. Within the window, new VoD MSs may initiate requests, whose arrival times are unknown at the reference time.

Denote the waiting time for MS$_k$ from the moment of sending a video request to the moment of starting to play the video as $T_{\mathrm{w},k}$ frames, which reflects the initial delay. For VoD MSs, we can set $T_{\mathrm{w},k}\Delta$ as a constant duration that is long enough for downloading the first segment of a video (such as the advertisement time before the video is played). For the video requested by MS$_k$, who is playing a segment at the reference time (denoted as Seg$_k^0$), $N_k$ segments have not been played and wait to be downloaded within the window. Denote the duration between the reference time and the moment at which the first segment of MS$_k$ in the window (denoted as Seg$_k^1$) starts to be played as $T_k^1\Delta$. For the $k$th ($k = 1, \ldots, K-1$) VoD MS, $T_k^1\Delta$ is the residual playback duration of Seg$_k^0$; for the $K$th VoD MS, $T_K^1\Delta$ is equal to its initial delay $T_{\mathrm{w},K}\Delta$. Denote the maximal waiting time that an MS is expected to tolerate when watching the whole video as $T_{\mathrm{mw}}\Delta$, which is the sum of the initial delay and the overall stalling time during playback [23]. Then, $(T_{\mathrm{mw}} - T_{\mathrm{w},k})\Delta$ is the total stalling time allowed by MS$_k$, and hence $[T_{\mathrm{mw}} - T_{\mathrm{w},k} + T_k^1 + (n-1)T_{\mathrm{seg}}]\Delta$ is the deadline for transmitting Seg$_k^n$, $n = 1, \cdots, N_k$, without making MS$_k$ unsatisfied. Without loss of generality, assume that $T_{\mathrm{seg}} + T_{\mathrm{mw}} \le T_f$.

For the network with only VoR traffic in addition to RT traffic, we set the time instant at which MS$_K$ makes the reservation as the start time of the prediction window. At this moment, $K-1$ VoR MSs have already made their reservations, as shown in Fig. 1(b). For VoR MSs, the initial delay is zero, i.e., $T_{\mathrm{w},k} = 0$, $k = 1, \ldots, K$.

B. Transmission Model

To exploit residual resources, only the MS with the highest average channel gain is associated with a BS, which serves the MS with all residual bandwidth and transmit power. According to the resource allocation plan, there may exist multiple NRT users in each cell that should be served simultaneously. To avoid multi-user interference, various multiple access techniques can be applied. For ease of exposition, we consider time division multiple access, i.e., these MSs are served in different time slots. Then, maximal ratio transmission (MRT) is the optimal beamforming, and hence the achievable rate of MS$_k$ in the $t$th time slot of the $j$th frame can be expressed as

$$R_{j,t}^k = W_{j,t} \log_2\left(1 + \frac{\alpha_j^k \|\mathbf{h}_{j,t}^k\|^2}{N_0 W_{j,t}}\, p_{j,t}\right), \tag{1}$$

where $W_{j,t}$ and $p_{j,t}$ are respectively the residual bandwidth and transmit power in the $t$th time slot of the $j$th frame. To reflect the residual bandwidth after serving randomly arriving RT services with random service time, we model $W_{j,t}$ as i.i.d. random variables in all time slots of the $j$th frame [12]. $\mathbf{h}_{j,t}^k \in \mathbb{C}^{N_t \times 1}$ is the small scale Rayleigh fading channel vector with i.i.d. elements and $\mathbb{E}\{\|\mathbf{h}_{j,t}^k\|^2\} = N_t$, $\alpha_j^k$ is the large scale channel gain in the $j$th frame, and $N_0$ is the noise power spectral density.

For ease of analysis, assume that the residual transmit power is proportional to the residual bandwidth as in [16], i.e., $p_{j,t} = W_{j,t} P_{\max} / W_{\max}$. Then, the time-average achievable rate in the $j$th frame (called average rate for short) of MS$_k$ can be expressed as

$$R_j^k = \frac{1}{T_s} \sum_{t=1}^{T_s} R_{j,t}^k = \frac{1}{T_s} \sum_{t=1}^{T_s} W_{j,t} \log_2\left(1 + \frac{\alpha_j^k \|\mathbf{h}_{j,t}^k\|^2}{\sigma^2} P_{\max}\right), \tag{2}$$

where $\frac{\alpha_j^k \|\mathbf{h}_{j,t}^k\|^2}{\sigma^2} P_{\max}$ is the instantaneous signal-to-noise ratio (SNR), and $\sigma^2 = N_0 W_{\max}$.

[Fig. 1 (figure omitted). (a) VoD traffic request model: MS$_K$ initiates a request at the start of a prediction window, i.e., the reference time. (b) VoR traffic request model: MS$_K$ makes a reservation at the start of a prediction window, and begins to play Seg$_K^1$ after a duration of $T_K^1\Delta$.]

Fig. 1. Random request arrival model of NRT users. We set the request arrival time or reservation time of MS$_K$ as the reference time. Before the reference time, MS$_1$, $\cdots$, MS$_{K-1}$ have sent requests or made reservations. After the reference time, the requests or reservations of new MSs may arrive randomly in the window.

III. RESOURCE ALLOCATION PLANNING WITH PREDICTED INFORMATION

In this section, we first show the connection between the statistics of the prediction errors of the average rate and those of the average channel gain and residual bandwidth. Then, we formulate a resource allocation planning problem that uses the residual resources for serving randomly arriving VoD MSs, and obtain the optimal solution. Finally, we extend the results to the network serving VoR MSs.

A. Statistics of Prediction Errors of Average Rate

The small scale channel gain $\mathbf{h}_{j,t}^k$ is hard to predict beyond the channel coherence time, and so is the instantaneous residual bandwidth $W_{j,t}$ in each time slot. As a result, the instantaneous data rate $R_{j,t}^k$ is hard, if not impossible, to predict. Fortunately, the trajectory of every NRT user and the traffic load of RT service at every BS are predictable within the prediction window [5], [8], [9]. Then, the CP can predict the average channel gains in each frame for each MS with the help of a radio map [15], as well as the average residual bandwidth in each frame at each BS with the predicted traffic load [12]. In practice, the prediction is never perfect. Denote the predicted residual bandwidth in the $j$th frame as $\hat{W}_j$, which has mean value $\bar{\hat{W}}_j$ and variance $\sigma^2_{\hat{W}_j}$.

Denote the predicted large scale channel gain for MS$_k$ in the $j$th frame as $\hat{\alpha}_j^k$, which has mean value $\bar{\hat{\alpha}}_j^k$ and bounded uncertainty $\delta_j^k/2$ (i.e., $\bar{\hat{\alpha}}_j^k - \delta_j^k/2 \le \hat{\alpha}_j^k \le \bar{\hat{\alpha}}_j^k + \delta_j^k/2$).

By using the predicted residual bandwidth in each frame as the residual bandwidth in each time slot and using the predicted average channel gain, and considering that $\mathbf{h}_{j,t}^k$ is i.i.d., if $T_s \to \infty$, then from (1) and (2) we can express the predicted time-average rate as

$$\hat{R}_j^k = \frac{1}{T_s} \sum_{t=1}^{T_s} \hat{W}_j \log_2\left(1 + \frac{\hat{\alpha}_j^k \|\mathbf{h}_{j,t}^k\|^2}{\sigma^2} P_{\max}\right) = \hat{W}_j\, \mathbb{E}\left\{\log_2\left(1 + \frac{\hat{\alpha}_j^k \|\mathbf{h}_{j,t}^k\|^2}{\sigma^2} P_{\max}\right)\right\}, \tag{3}$$

where the average is taken over the small scale channel.

For a random variable $X$, the expectation of its function $\varphi(X)$ can be approximated as [24]

$$\mathbb{E}\{\varphi(X)\} = \mathbb{E}\{\varphi(\mu_x + X - \mu_x)\} \approx \mathbb{E}\{\varphi(\mu_x) + \varphi'(\mu_x)(X - \mu_x)\} = \varphi(\mu_x), \tag{4}$$

where $\mu_x = \mathbb{E}\{X\}$, and the approximation is accurate when the variance of $X$ is small. With this approximation and $\mathbb{E}\{\|\mathbf{h}_{j,t}^k\|^2\} = N_t$, (3) can be approximated as

$$\hat{R}_j^k \approx \hat{W}_j \log_2\left(1 + \frac{\hat{\alpha}_j^k N_t}{\sigma^2} P_{\max}\right), \tag{5}$$

which is accurate when $N_t$ is large.

The prediction errors of $\hat{W}_j$ and $\hat{\alpha}_j^k$ depend on the prediction algorithms for traffic load and user trajectory, as well as on the interpolation algorithms that derive the fine-grained average residual bandwidth and average channel gain from a coarse-grained prediction and on the radio map construction. No model for the distributions of $\hat{W}_j$ and $\hat{\alpha}_j^k$ that has been validated by viable algorithms on real data traces is available in the literature. To gain some useful insight, we model the predictions according to the principle of maximum entropy [25]. With given mean value and variance, the Gaussian distribution has maximum entropy, and with given upper and lower bounds, the uniform distribution has maximum entropy [26]. Since the mean value and variance of the prediction of traffic load (and hence residual bandwidth) can be obtained [8], the predicted average residual bandwidth can be modelled as Gaussian distributed. Since the user trajectory in a short horizon is bounded by road topology [9] and shadowing can be approximated as bounded, we model the predicted average channel gain as uniformly distributed. Then, the following proposition shows how the statistics of the prediction errors of average residual bandwidth and average channel gain translate to the statistics of the prediction errors of average data rate. Such a relation can provide design guidance on the accuracy required in predicting average residual bandwidth and average channel gain.
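As a rough numerical illustration of the modelling steps above, the following sketch compares the ergodic expectation in (3) with the first-order approximation in (5), and draws predicted rates with a Gaussian residual bandwidth and a uniform average channel gain. It is not the authors' simulation code: the function names, the parameter values and the lumped quantity `p_over_sigma2` (standing for $P_{\max}/\sigma^2$) are assumptions made for illustration, and the channel entries are assumed to be i.i.d. unit-variance complex Gaussian, so that $\|\mathbf{h}\|^2$ is Gamma distributed with shape $N_t$.

```python
import math
import random
import statistics


def ergodic_log_rate(snr_scale, n_t, rng, n_samples=20000):
    """Monte Carlo estimate of E{log2(1 + snr_scale * ||h||^2)}.

    Assumption: h has n_t i.i.d. CN(0, 1) entries, so ||h||^2 ~ Gamma(n_t, 1)
    and E{||h||^2} = n_t.
    """
    samples = [
        math.log2(1.0 + snr_scale * rng.gammavariate(n_t, 1.0))
        for _ in range(n_samples)
    ]
    return statistics.fmean(samples)


def first_order_log_rate(snr_scale, n_t):
    """First-order approximation as in (4)-(5): replace ||h||^2 by its mean n_t."""
    return math.log2(1.0 + snr_scale * n_t)


def sample_predicted_rates(w_mean, w_std, alpha_mean, delta, n_t,
                           p_over_sigma2, rng, n_samples=20000):
    """Draw predicted average rates following (5).

    The predicted bandwidth is Gaussian and the predicted large scale gain is
    uniform on [alpha_mean - delta/2, alpha_mean + delta/2] (maximum-entropy models).
    """
    rates = []
    for _ in range(n_samples):
        w_hat = rng.gauss(w_mean, w_std)
        alpha_hat = rng.uniform(alpha_mean - delta / 2.0, alpha_mean + delta / 2.0)
        rates.append(w_hat * math.log2(1.0 + alpha_hat * n_t * p_over_sigma2))
    return rates


if __name__ == "__main__":
    rng = random.Random(0)
    print("Monte Carlo :", ergodic_log_rate(10.0, 64, rng))
    print("First order :", first_order_log_rate(10.0, 64))
    rates = sample_predicted_rates(10.0, 1.0, 1.0, 0.2, 8, 10.0, rng)
    print("mean / std of predicted rate:",
          statistics.fmean(rates), statistics.pstdev(rates))
```

With these illustrative values, the spread of the sampled rates comes almost entirely from the bandwidth term, because the logarithm compresses the uncertainty of the channel gain.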

Proposition 1: If (i) $T_s \to \infty$, (ii) $\hat{W}_j \sim \mathcal{N}(\bar{\hat{W}}_j, \sigma^2_{\hat{W}_j})$, (iii) $\hat{\alpha}_j^k \sim \mathcal{U}(\bar{\hat{\alpha}}_j^k - \delta_j^k/2,\ \bar{\hat{\alpha}}_j^k + \delta_j^k/2)$, and (iv) the predicted average and instantaneous SNRs are large and $\delta_j^k \ll \bar{\hat{\alpha}}_j^k$, then the average rate prediction error $\tilde{R}_j^k = \hat{R}_j^k - R_j^k$ follows a Gaussian distribution, which has mean value

$$\bar{\tilde{R}}_j^k \approx \bar{\hat{W}}_j \hat{\mu}_j^k - \bar{W}_j \left(\log_2\left(\frac{\alpha_j^k}{\sigma^2} P_{\max}\right) + \frac{\psi(N_t)}{\ln 2}\right), \tag{6}$$

and variance

$$\sigma^2_{\tilde{R}_j} \approx \left(\sigma^2_{\hat{W}_j} + \bar{\hat{W}}_j^2\right)\left((\hat{\sigma}_j^k)^2 + (\hat{\mu}_j^k)^2\right) - \bar{\hat{W}}_j^2 (\hat{\mu}_j^k)^2, \tag{7}$$

where

$$(\hat{\sigma}_j^k)^2 \approx \frac{1}{(\delta_j^k)^2 \ln^2 2}\left(\left(\frac{(\delta_j^k)^2}{4} - (\bar{\hat{\alpha}}_j^k)^2\right) \ln^2 \frac{\bar{\hat{\alpha}}_j^k + \delta_j^k/2}{\bar{\hat{\alpha}}_j^k - \delta_j^k/2} + (\delta_j^k)^2\right), \tag{8a}$$

$$\hat{\mu}_j^k \approx \frac{1}{\delta_j^k \ln 2}\left(\bar{\hat{\alpha}}_j^k \ln \frac{\bar{\hat{\alpha}}_j^k + \delta_j^k/2}{\bar{\hat{\alpha}}_j^k - \delta_j^k/2} + \delta_j^k \ln\left(\sqrt{(\bar{\hat{\alpha}}_j^k)^2 - (\delta_j^k)^2/4}\ \frac{P_{\max}}{\sigma^2}\right) + \delta_j^k\left(\psi(N_t) - 1\right)\right), \tag{8b}$$

$\bar{W}_j = \mathbb{E}\{W_{j,t}\}$, $\psi(\cdot)$ is Euler's digamma function, and $\psi'(\cdot)$ is the derivative of $\psi(\cdot)$. When $\hat{W}_j$ or $\hat{\alpha}_j^k$ is biased, the impact of the prediction bias of the large scale channel gain on the prediction bias of the average rate is much smaller than that of the residual bandwidth.

Proof: See Appendix A.

Later simulations show that the results in Proposition 1 still hold when $\hat{\alpha}_j^k$ is Gaussian, $T_s$ is not so large, the values of $\delta_j^k$ and $\bar{\hat{\alpha}}_j^k$ are comparable, and the SNRs are not high.

B. Optimizing the Resource Allocation Plan for the VoD MSs

At the beginning of the prediction window, the CP can make a resource allocation plan for serving the NRT users with the predicted time-average rates. To fully use the residual resources for supporting high throughput of NRT users, we optimize the plan (i.e., the time resources allocated to the VoD MSs), denoted as $[\mathbf{s}_1, \ldots, \mathbf{s}_K]$, where $\mathbf{s}_k = [s_1^k, \ldots, s_{T_f}^k]^T$, and $s_j^k \in [0, 1]$ is the fraction of the time slots assigned to MS$_k$ in the $j$th frame.

Denote the objective function as $f(\mathbf{s}_1, \ldots, \mathbf{s}_K)$. To maximize the arrival rate (i.e., throughput) of the NRT users that the network can support, one way is to directly maximize the amount of data transmitted during the prediction window (equivalent to maximizing the sum rate over the window [15]) or to indirectly minimize the total transmission time, each with ensured QoS [13]. Yet such objectives cannot exploit the residual resources in a network with randomly arriving VoD requests. To help understand how to find a proper objective function to achieve our goal, we first analyze the behavior of the policies optimized toward these two objectives in a special case: there is only one VoD MS in the network, which requests only one segment (i.e., $N_1 = 1$) at the reference time. Then, the playback duration is $T_{\mathrm{seg}}$ frames, and the QoS requirement is to complete the transmission of the $B$ bits before the playback of the segment.

In this special case, the problem that maximizes the overall amount of data transmitted over the prediction window while ensuring no stalling for the VoD MS can be simplified as

$$\max_{\mathbf{s}_1} \ \sum_{j=1}^{T_f} s_j^1 \hat{R}_j^1 \tag{9a}$$
$$\text{s.t.} \ \sum_{j=1}^{T_{\mathrm{seg}}} s_j^1 \hat{R}_j^1 \Delta = B, \tag{9b}$$
$$0 \le s_j^1 \le 1, \quad j = 1, \ldots, T_f, \tag{9c}$$

where $f(\mathbf{s}_1) = \sum_{j=1}^{T_f} s_j^1 \hat{R}_j^1$, and $\Delta$ is a constant and hence is removed from the objective function.

It is easy to see that if the BS can transmit $B$ bits to the MS during $T_{\mathrm{seg}}$ frames, the optimal solution is any vector $\mathbf{s}_1$ satisfying $\sum_{j=1}^{T_{\mathrm{seg}}} s_j^1 \hat{R}_j^1 \Delta = B$ and (9c), which is not unique. In this case, where the residual resource at the BS is sufficient to convey the $B$ bits, the objective function is of no use at all, because only $B$ bits need to be transmitted in the window. Otherwise, if the constraint in (9b) cannot be satisfied, the problem is infeasible. In this case, where the residual resource is insufficient for ensuring the QoS of the VoD MS, a simple technique is to use all residual resources in the $T_{\mathrm{seg}}$ frames for transmission. This suggests that such a formulation is not appropriate for optimizing predictive resource allocation in a network with residual resources.

Footnote: When $N_1 > 1$, the optimal solution of this problem (i.e., the resources allocated to transmit all the $N_1$ segments) is still not unique. This is because the QoS constraint becomes $\sum_{j=1}^{T_{\mathrm{seg}}} s_j^1 \hat{R}_j^1 \Delta > B$ for the first $N_1 - 1$ segments, while the constraint in (9b) should be satisfied for the last segment of the video.

In the special case, another problem, which minimizes the total transmission time in the window while ensuring no stalling, can be simplified as

$$\min_{\mathbf{s}_1} \ \sum_{j=1}^{T_f} s_j^1 \tag{10a}$$
$$\text{s.t.} \ \sum_{j=1}^{T_{\mathrm{seg}}} s_j^1 \hat{R}_j^1 \Delta = B, \tag{10b}$$
$$0 \le s_j^1 \le 1, \quad j = 1, \ldots, T_f, \tag{10c}$$

where $f(\mathbf{s}_1) = \sum_{j=1}^{T_f} s_j^1$, and again $\Delta$ is removed from the objective function.

We can see that if both problems (9) and (10) are feasible, then the optimal solution of problem (10) is the one among the solutions of problem (9) that minimizes the total transmission time. Problem (10) is a linear program, which can be solved by the simplex method. If the problem is feasible, the solution can be expressed as

$$s_{j_i}^{1*} = \begin{cases} \max\left(\min\left(\dfrac{B - \sum_{m=0}^{i-1} \hat{R}_{j_m}^1 s_{j_m}^{1*} \Delta}{\hat{R}_{j_i}^1 \Delta},\ 1\right),\ 0\right), & 1 \le j_i \le T_{\mathrm{seg}},\ i \ge 1, \\ 0, & j_i > T_{\mathrm{seg}}, \end{cases} \tag{11}$$

where $\hat{R}_{j_1}^1, \cdots, \hat{R}_{j_{T_{\mathrm{seg}}}}^1$ are $\hat{R}_1^1, \cdots, \hat{R}_{T_{\mathrm{seg}}}^1$ sorted in descending order, and the $m = 0$ term in the sum is understood to be zero. It can be seen that the CP always sequentially selects the frames in the window with the largest achievable rates for transmission.

Now, we come back to the general problem with multiple MSs, each with multiple segments. In practice, a new request for VoD traffic may arrive in the prediction window, but the arrival time is hard to know at the reference time. With the solution of problem (10), when the new VoD MS arrives, some VoD MSs whose requests have already arrived at the reference time (e.g., one or more MSs among MS$_1$, $\cdots$, MS$_{K-1}$ in Fig. 1) may not have received any bits because they have not yet experienced their best channels. Then, the VoD MSs may compete for the remaining time resources in the window, and the resources before the new MS arrives are wasted.

Inspired by the observation from the analysis of the special case, we introduce an alternative objective function. To fully use the residual resources under the uncertainty about future request arrivals, the data of the VoD MSs that have already arrived should be transmitted in the earlier frames, which are closer to the reference time. A natural way to employ more time slots in the early frames is to define the objective function for multiple MSs as $f(\mathbf{s}_1, \ldots, \mathbf{s}_K) = \sum_{j=1}^{T_f} \sum_{k=1}^{K} \omega(j) s_j^k$, where the weighting function $\omega(j)$ should increase with $j$. To balance the usage of the early frames close to the reference time and those with higher rate, we can simply set $\omega(j) = j$ as an illustration.
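The contrast between the minimal-time objective and a frame-weighted objective can be sketched for the single-MS, single-segment case. With one equality constraint and box constraints, the linear program is a fractional knapsack, so filling frames in ascending order of weight-to-rate ratio is optimal; with unit weights this reduces to the descending-rate rule in (11). The function name `plan_single_segment` and the numbers are illustrative assumptions, not taken from the paper.

```python
import math


def plan_single_segment(rates, bits, frame_duration, weight=lambda j: 1.0):
    """Greedy solution of
        min sum_j weight(j) * s_j
        s.t. sum_j s_j * rates[j-1] * frame_duration = bits, 0 <= s_j <= 1,
    over the frames j = 1..len(rates) that precede the playback deadline.

    Returns the list of time fractions s_j, or None if the frames cannot
    carry the requested bits.
    """
    # Cheapest frames first: smallest objective cost per transmitted bit.
    order = sorted(
        range(len(rates)),
        key=lambda idx: weight(idx + 1) / rates[idx] if rates[idx] > 0 else math.inf,
    )
    plan = [0.0] * len(rates)
    remaining = float(bits)
    for idx in order:
        if remaining <= 0.0:
            break
        capacity = rates[idx] * frame_duration
        if capacity <= 0.0:
            continue
        share = min(1.0, remaining / capacity)
        plan[idx] = share
        remaining -= share * capacity
    return plan if remaining <= 1e-9 else None


if __name__ == "__main__":
    predicted_rates = [1.0, 4.0, 2.0, 3.0]  # hypothetical predicted rates per frame
    print(plan_single_segment(predicted_rates, bits=5.0, frame_duration=1.0))
    print(plan_single_segment(predicted_rates, bits=5.0, frame_duration=1.0,
                              weight=lambda j: j))
```

In this toy instance the unit-weight plan waits for the two highest-rate frames (frames 2 and 4), whereas the plan with $\omega(j) = j$ finishes within the first two frames, leaving the later frames free for requests that have not yet arrived.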

We can also select other weighting functions, which do not change the optimization problemand achieve similar performance.To control the QoS of the VoD MSs, we impose constraint on the maximal waiting time foreach MS to watch the total video, T mw ∆ , which is the sum of the initial delay and overall timeof stalling during playback. Then, the expected deadline of MS k for transmitting all required (cid:80) ni =1 B ik bits to play Seg nk is [ T mw − T w ,k + T k + ( n − T seg ]∆ , n = 1 , · · · , N k .For MS k , there are N k segments to be played, and the playback duration of each segmentis T seg ∆ . To exploit the resources in the network and guarantee the QoS of the K MSs whoserequests have arrived at the reference time, the resource planning problem is formulated as, P1 : min T mw , s ,..., s K T f (cid:88) j =1 K (cid:88) k =1 j · s kj (12a) s.t. T mw − T w ,k + T k +( n − T seg (cid:88) j =1 s kj (cid:99) R kj ∆ ≥ n (cid:88) i =1 B ik , n = 1 , . . . , N k − , (12b) T mw − T w ,k + T k +( N k − T seg (cid:88) j =1 s kj (cid:99) R kj ∆ = N k (cid:88) i =1 B ik , (12c) (cid:88) k ∈K j,i s kj ≤ , j = 1 , . . . , T f , i = 1 , . . . , N b , (12d) s kj ∈ [0 , , j = 1 , . . . , T f , k = 1 , . . . , K, (12e)where (12b) and (12c) are the QoS constraints, (12d) is the total resource constraint at the i thBS, and K j,i is the set of MSs in the coverage of the i th BS in the j th frame.Problem P1 has two kinds of variables, the ﬁrst is the maximal waiting time T mw , and thesecond is the resource planning vector s kj . When the value of T mw is ﬁxed, the problem reducesto a linear programming [27] as follows, P2 : min s ,..., s K f ( s , . . . , s K ) s.t. (12b) , (12c) , (12d) , (12e) , k = 1 , . . . , K, (13)since (12d) and (12e) become linear constraints of variables s kj . Then, problem P2 can be easilysolved if the problem is feasible.When T mw decreases, the feasible region of problem P2 reduces. 
The minimal value of $T_{\mathrm{mw}}$ that makes problem P2 feasible, denoted as $T^*_{\mathrm{mw}}$, can be found by bisection search.
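The bisection over $T_{\mathrm{mw}}$ can be sketched as follows. This is only a minimal illustration under a strong simplifying assumption: a single MS is considered, so feasibility of (12b) and (12c) can be checked by comparing the cumulative deliverable bits (with $s_j = 1$ in every frame) against the cumulative demand, instead of calling a linear programming solver as the general multi-MS problem P2 would require. The names `is_feasible` and `min_waiting_time` and the example values are hypothetical.

```python
def is_feasible(t_mw, pred_rates, seg_bits, t_seg, t_w=0, t_k=0, delta=1.0):
    """Single-MS feasibility of (12b)-(12c) for a given T_mw (in frames).

    With one MS, the constraints can be met if and only if serving the MS
    in every frame up to each deadline delivers enough bits.
    """
    demand = 0.0
    for n, bits in enumerate(seg_bits, start=1):
        demand += bits
        deadline = t_mw - t_w + t_k + (n - 1) * t_seg
        deadline = max(0, min(deadline, len(pred_rates)))  # stay inside the window
        if sum(pred_rates[:deadline]) * delta < demand:
            return False
    return True


def min_waiting_time(pred_rates, seg_bits, t_seg, **kwargs):
    """Smallest feasible T_mw by bisection; None if no value in the window works."""
    lo, hi = 0, len(pred_rates)
    if not is_feasible(hi, pred_rates, seg_bits, t_seg, **kwargs):
        return None
    while lo < hi:
        mid = (lo + hi) // 2
        if is_feasible(mid, pred_rates, seg_bits, t_seg, **kwargs):
            hi = mid       # feasible: try a shorter waiting time
        else:
            lo = mid + 1   # infeasible: a longer waiting time is needed
    return lo


# Hypothetical example: six frames, two segments of 5 and 3 bits, T_seg = 2 frames.
t_star = min_waiting_time([1.0, 1.0, 4.0, 4.0, 1.0, 1.0], [5.0, 3.0], t_seg=2)
```

Bisection is applicable because feasibility is monotone in $T_{\mathrm{mw}}$: relaxing the deadlines can only enlarge the feasible region.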

Given this value of $T^*_{\mathrm{mw}}$, the optimal resources assigned to the $K$ MSs can be obtained as $\mathbf{s}^{k*} = [s_1^{k*}, \ldots, s_{T_f}^{k*}]^H$, which is the globally optimal solution of problem P1.

Remark: At the reference time when MS$_1$, $\cdots$, MS$_{K-1}$ have sent their requests and MS$_K$ initiates its video request, the resource allocation plan is made for all the $K$ MSs by solving problem P1. The CP needs to re-make the plan in the following scenarios: (i) when a new MS initiates a request, in which case the CP re-makes the plan for all MSs (including the new MS) in the network; (ii) when a prediction window finishes before all segments requested by existing MSs are downloaded, in which case the CP re-makes a plan for transmitting the residual segments.

C. Optimizing the Resource Allocation Plan for VoR MSs

A similar problem can be formulated for VoR MSs. When $K-1$ VoR MSs have already made their reservations before the reference time and a VoR MS makes its reservation at the reference time, the only difference between the VoR MSs and the VoD MSs is that the initial delay is zero, i.e., $T_{\mathrm{w},k} = 0$. Then, a simplified version of problem P1 can be obtained. Again, a re-plan can be made in the same way as in the system with VoD MSs.

IV. TRANSMISSION POLICY ACCORDING TO RESOURCE ALLOCATION PLAN

With the resource allocation plan $\mathbf{s}^{k*} = [s_1^{k*}, \ldots, s_{T_f}^{k*}]^H$, which MS should be served by (and hence associated with) which BS along the MS's trajectory can be determined. At the start of each time slot, the small-scale channel vector of each MS can be estimated at its associated BS.

Since more than one MS may be associated with a BS, user scheduling is necessary in each time slot. To maximize the number of satisfied MSs (i.e., the NRT users whose video files are completely conveyed before their expected deadlines), the BS schedules the MSs according to their transmission progress, defined as

$$ \Lambda(k, J) = \sum_{j=1}^{J} s_j^{k*} \widehat{R}_j^k \Delta, \tag{14} $$

which is the amount of data that ought to have been conveyed by the end of the $J$th frame ($J = 1, \cdots, T_f$). It can be computed by the CP at the start of the prediction window after making the resource allocation plan.

In the $t$th time slot of the $J$th frame, the set of MSs that are planned to be served by the $i$th BS but have not caught up with the transmission progress can be expressed as

$$ \tilde{\mathcal{K}}_{J,i} \triangleq \Big\{ k \in \mathcal{K}_{J,i} \ \Big|\ \Lambda(k, J) - \frac{\Delta}{T_s} \Big( \sum_{l=1}^{j-1} \sum_{\tau=1}^{T_s} R^k_{l,\tau} + \sum_{\tau=1}^{t-1} R^k_{j,\tau} \Big) > 0 \Big\}. \tag{15} $$
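The planned progress in (14) and the lagging set in (15) can be sketched in Python as below. This is an illustrative sketch only: the data layout (dictionaries keyed by MS index) and the names `planned_progress`, `delivered_bits` and `lagging_users` are assumptions made here, and the instantaneous rate of an MS is taken as zero in slots where it was not served, which is one possible reading of how $R^k_{l,\tau}$ is accumulated.

```python
def planned_progress(plan, pred_rates, frame, delta=1.0):
    """Lambda(k, J): bits that should be conveyed by the end of frame J (1-based)."""
    return sum(s * r for s, r in zip(plan[:frame], pred_rates[:frame])) * delta


def delivered_bits(served_rates, slot_duration):
    """Bits actually conveyed so far; served_rates holds one rate per past slot."""
    return slot_duration * sum(served_rates)


def lagging_users(candidates, plans, pred_rates, served_rates, frame, delta, slots_per_frame):
    """MSs planned at this BS whose delivered bits are behind the plan, as in (15)."""
    slot_duration = delta / slots_per_frame
    return [
        k for k in candidates
        if planned_progress(plans[k], pred_rates[k], frame, delta)
        - delivered_bits(served_rates[k], slot_duration) > 0
    ]


# Hypothetical example with two MSs, two frames and two slots per frame.
plans = {1: [1.0, 0.5], 2: [0.5, 0.0]}
pred = {1: [10.0, 10.0], 2: [10.0, 10.0]}
served = {1: [10.0, 10.0, 0.0], 2: [10.0, 0.0, 0.0]}  # three past slots
behind = lagging_users([1, 2], plans, pred, served, frame=2, delta=1.0, slots_per_frame=2)
```

Only MSs in the returned list remain candidates for scheduling in the current slot; MSs that have already reached their planned progress are skipped.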

To exploit the residual resources, the $i$th BS selects the MS with the maximal instantaneous achievable rate from this set, i.e., according to the rule

$$ k^* = \arg\max_k \big\{ R^k_{j,t} \ \big|\ s_j^{k*} > 0 \ \text{and} \ k \in \tilde{\mathcal{K}}_{J,i} \big\}. \tag{16} $$

Then, the $i$th BS serves the $k^*$th MS with MRT using the instantaneous residual transmit power $p_{j,t}$ and residual bandwidth $W_{j,t}$.

Due to the prediction error on the time-average rate, it may happen that some MSs do not catch up with the transmission progress by the end of a frame. In this case, the BS transmits the remaining data to these MSs at the beginning of the next frame, regardless of whether other segments need to be transmitted in that frame. After the remaining data have been conveyed, the BSs start to transmit the segments according to the plan. Although such a strategy may cause a “mismatch” between the actual transmission progress and the planned progress, the mismatch can be controlled by the re-plan mechanism.

V. SIMULATION AND NUMERICAL RESULTS

In this section, we validate the previous analysis via numerical results and demonstrate the performance gain of predictive resource allocation by simulations.

A. Simulation Set-Up

Consider a cellular network with six BSs located along a straight line, each equipped with $N_t = 8$ antennas. The cell radius is $D = 250$ m. As shown in Fig. 2, the NRT users move along three straight roads with minimum distance from the BSs as m, m and m, respectively. Each MS requests a video with size of $B = 20$ Mbytes and playback duration of 100 s. Each video consists of $N = 10$ segments, i.e., each segment with size of 2 Mbytes is played out for $T_{\mathrm{seg}} = 10$ s. The prediction window contains $T_f = 300$ frames. Each frame is with duration of one second, and each time slot is with duration ∆ = 10 ms, i.e., each frame contains $T_s = 100$ time slots (which is far from infinity as we assumed in the analysis).

Fig. 2. System setup in simulation.

The video requests of the MSs randomly arrive only between the st frame and th frame in the prediction window (when they arrive uniformly within the 300 frames, the results are similar). To characterize the different resource usage status of the BSs in serving the RT traffic in an under-utilized network, we consider two types of BSs: busy BSs with average residual bandwidth in each frame (say the $j$th frame) of $\overline{W}_j = 1$ MHz and idle BSs with $\overline{W}_j = 10$ MHz, which are alternately located along the line as idle, busy, idle, busy, idle, and busy BS. Considering that the prediction error of traffic load is within % as reported in [5], the predicted average residual bandwidth changes among frames according to $\widehat{W}_j \sim \mathcal{N}(\overline{\widehat{W}}_j, \sigma^2_{\widehat{W}_j})$, where σ (cid:99) W j / (cid:99) W j = 0 . . To reflect the prediction error of user trajectory, the predicted large-scale fading gains vary among frames according to $\hat{\alpha}_j^k \sim \mathcal{U}(\overline{\hat{\alpha}}_j^k - \delta_j^k/2,\ \overline{\hat{\alpha}}_j^k + \delta_j^k/2)$, where $\delta_j^k / \overline{\hat{\alpha}}_j^k = 1$, which corresponds to the variation range of path loss between cell center and cell edge. We consider unbiased prediction for $\widehat{W}_j$ and $\hat{\alpha}_j^k$, i.e., $\overline{\widehat{W}}_j = \overline{W}_j$ and $\overline{\hat{\alpha}}_j^k = \alpha_j^k$.

The maximal transmit power of each BS is 40 W and the cell-edge SNR is set as 5 dB, where the inter-cell interference is implicitly reflected. Since shadowing has little impact on the performance, we only consider path loss in the average channel gain to reduce the simulation time. The path loss model is . . ( d ) , where $d$ is the distance between the BS and MS in meters.

The results are obtained from 100 Monte Carlo trials. In each trial, the trajectory, request arrival and channel gain of each MS change randomly. In particular, for each MS, the moving speed is uniformly distributed in (10 , m/s, the moving direction is uniformly selected as -180 or +180 degree, and the location where the MS initiates a request is randomly selected from the three roads. The requests of the MSs arrive from the st to the th frame according to a Poisson process with given average arrival rate. Besides, the small-scale channel in each time slot changes independently according to Rayleigh fading. This setup will be used in the following simulations, unless otherwise specified.

B. Resource Allocation Schemes for Comparison and Evaluation Metrics

We consider several resource allocation schemes for comparison, which can be divided into two categories: predictive and non-predictive schemes. With predictive schemes, the CP can make the resource allocation plan with the predicted time-average rate in (5), while with non-predictive schemes, the CP does not predict any information, as listed in the following.

Predictive schemes:
• Proposed: The resource allocation plan is found from the solution of problem P2, and the transmission policy in section IV is used.
• Max-Throughput: The resource allocation plan is made to maximize the time-average sum rate over the prediction window under the constraints in (12b)-(12e) (the optimization problem degenerates into problem (9) when there is only one MS and the video has only one segment), which has the same objective function as the method proposed in [15]. Since the optimal solution is not unique, we can use any solution found from the constraints.
• Min-Time: This is the method proposed in [13], where the resource allocation plan is made to minimize the total transmission time of all MSs in the prediction window (the optimization problem degenerates into problem (10) when there is only one MS and the video has only one segment).

Non-predictive schemes:
• Non-predictive w/o QoS: Each BS serves all MSs with best effort. In each time slot, the BS only serves the MS with the highest instantaneous data rate.
• Non-predictive w QoS: This is the scheme proposed in [28], where each BS serves the MS with the earliest deadline in each time slot. If several MSs have the same deadline, then the MS with the most bits to transmit is served first.

We consider two performance metrics: the average stalling time of all MSs, and the maximal request arrival rate of MSs when the maximal stalling time expected by 99.9% of the MSs is satisfied. The first metric measures the QoS of the VoD MSs. The second metric measures the traffic carrying ability of the network for supporting the MSs with given tolerance on QoS. Other metrics such as stalling frequency are also used to evaluate the QoS in the sequel.
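The per-slot selection rules of the two non-predictive baselines listed above can be sketched as follows. The function names and the dictionary layout are hypothetical, and the sketch only captures the selection rule described in the text, not the full schemes of the cited works.

```python
def pick_best_effort(inst_rates):
    """'Non-predictive w/o QoS': serve the MS with the highest instantaneous rate.

    inst_rates maps an MS index to its instantaneous data rate in this slot.
    """
    return max(inst_rates, key=inst_rates.get)


def pick_earliest_deadline(users):
    """'Non-predictive w QoS': earliest deadline first, ties broken by most bits left.

    users maps an MS index to a (deadline, remaining_bits) pair.
    """
    return min(users, key=lambda k: (users[k][0], -users[k][1]))


# Hypothetical example: MSs 2 and 3 share the earliest deadline, MS 3 has more bits left.
chosen_rate = pick_best_effort({1: 3.0, 2: 7.5})
chosen_edf = pick_earliest_deadline({1: (20, 5e6), 2: (10, 1e6), 3: (10, 4e6)})
```

Neither rule looks beyond the current slot, which is what distinguishes these baselines from the predictive schemes that plan over the whole window.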

C. Simulation and Numerical Results

1) Validating the analysis:

We first validate the proposition. We consider a typical scenario where MS$_k$ is served by a busy BS in the $j$th frame, i.e., $\overline{W}_j = 1$ MHz, and $\sigma_{W_j}$ is set as . MHz. To reflect the uncertainty of prediction, we set $\sigma_{\widehat{W}_j}$ as . MHz and $\delta_j^k / \alpha_j^k = 1$. The results for other settings are similar, and hence are not shown.

Fig. 3(a) provides simulation and numerical results for the probability density function (PDF) of $\widetilde{R}_j^k$ when $\widehat{W}_j$ and $\hat{\alpha}_j^k$ follow Gaussian and/or uniform distributions (the results have been normalized to have zero mean and unit variance for easy comparison). The average SNR is set as 5 dB or 35 dB, which represents the SNR when the MS is located at the cell edge or is closest to the BS when moving along a straight line across the cell. Fig. 3(b) shows the accuracy of the approximations used in (6) and (7) when $\widehat{W}_j$ follows a Gaussian distribution and $\hat{\alpha}_j^k$ follows a uniform or Gaussian distribution. Fig. 3(c) shows the impact of the variances of the prediction errors of $\widehat{W}_j$ and $\hat{\alpha}_j^k$ on the prediction error of $\widehat{R}_j^k$ when $\widehat{W}_j$ and $\hat{\alpha}_j^k$ are unbiased predictions. To unify the units, the prediction error statistic is measured by the coefficient of variation (CV), e.g., $\zeta = \sigma_{\widehat{W}_j} / \overline{\widehat{W}}_j$ for the residual bandwidth. Fig. 3(d) shows the impact of the prediction bias of $W_j$ and $\alpha_j^k$ on the prediction bias of $R_j^k$, where the prediction bias is normalized by the true value, e.g., $(\overline{\widehat{W}}_j - \overline{W}_j) / \overline{W}_j$ for the residual bandwidth.

It is shown from Fig. 3(a) that when $\widehat{W}_j$ follows a Gaussian distribution, $\widetilde{R}_j^k$ is Gaussian as well, no matter what distribution $\hat{\alpha}_j^k$ follows and under which SNR. However, when $\widehat{W}_j$ follows a uniform distribution, $\widetilde{R}_j^k$ approximately follows a uniform distribution. This suggests that the distribution of $\widetilde{R}_j^k$ mainly depends on that of $\widehat{W}_j$.
It is shown from Fig. 3(b) that if $\widetilde{\alpha}_j^k$ follows a Gaussian or uniform distribution, the approximations used in Proposition 1 are very accurate when the average SNR is larger than 15 dB. This implies that the relation between the prediction error statistics provided in the proposition is valid for predictive resource allocation, since its basic idea is to transmit under good channel conditions [15]. Fig. 3(c) shows that the CV of $\widehat{W}_j$ has a larger impact on the CV of $\widehat{R}_j^k$ than the CV of $\hat{\alpha}_j^k$. Fig. 3(d) shows that when $\widehat{W}_j$ is biased, the bias of $\widehat{R}_j^k$ grows linearly with the bias of $\widehat{W}_j$, while when $\hat{\alpha}_j^k$ is biased, the prediction bias of $\widehat{R}_j^k$ grows logarithmically with the prediction bias of $\hat{\alpha}_j^k$. This indicates that the variance and bias of $\widehat{W}_j$ have a larger impact on those of $\widehat{R}_j^k$, which validates the proposition.
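The qualitative observation on the CVs can be reproduced with a small Monte Carlo experiment. The sketch below is not the paper's simulator: it assumes a predicted rate of the form $\widehat{W}_j \log_2(1 + \mathrm{SNR} \cdot \hat{\alpha})$ with the mean gain normalized to one, a Gaussian $\widehat{W}_j$ and a uniform $\hat{\alpha}$, and the name `predicted_rate_cv`, the sample size and the default values are hypothetical.

```python
import math
import random


def coefficient_of_variation(samples):
    """Population standard deviation divided by the mean."""
    mean = sum(samples) / len(samples)
    var = sum((x - mean) ** 2 for x in samples) / len(samples)
    return math.sqrt(var) / mean


def predicted_rate_cv(zeta_w, zeta_alpha, w_mean=1e6, snr_db=35.0, n=20000, seed=0):
    """CV of W * log2(1 + snr * alpha) for given CVs of W and alpha.

    W is Gaussian with CV zeta_w; alpha is uniform with mean 1 and CV
    zeta_alpha (zeta_alpha must stay below 1/sqrt(3) so that alpha > 0).
    """
    rng = random.Random(seed)
    snr = 10 ** (snr_db / 10)
    half_width = zeta_alpha * math.sqrt(3)  # uniform std = half_width / sqrt(3)
    rates = []
    for _ in range(n):
        w = rng.gauss(w_mean, zeta_w * w_mean)
        alpha = rng.uniform(1 - half_width, 1 + half_width)
        rates.append(w * math.log2(1 + snr * alpha))
    return coefficient_of_variation(rates)


cv_from_bandwidth = predicted_rate_cv(zeta_w=0.2, zeta_alpha=0.0)
cv_from_gain = predicted_rate_cv(zeta_w=0.0, zeta_alpha=0.2)
```

Under these assumptions the CV of the bandwidth carries over to the rate almost one to one, whereas the same CV on the channel gain is compressed by the logarithm, which is consistent with the trend described for Fig. 3(c).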

2) Performance gain brought by prediction:

To demonstrate the gain from prediction, we compare the “Proposed” scheme with the “Non-predictive w QoS” scheme in Fig. 4. Furthermore, by comparing the performance of serving VoD and VoR MSs with each scheme, we can observe the gain from knowing the contents to be transmitted and the request arrival time in advance, before the MSs initiate requests.

By comparing the “Proposed” scheme with the “Non-predictive w QoS” scheme, either when serving VoD or when serving VoR traffic, we can see a remarkable gain from the prediction of future rate. By comparing the results obtained for VoR and VoD MSs with the “Proposed” scheme or with the “Non-predictive w QoS” scheme, we can observe the additional gain of knowing the contents to be requested and the request arrival time, which is dramatic even with only 10 s reservation in advance. Moreover, the performance gap between these two schemes increases with the reservation time. This indicates that the gain from predicting future rate will be even larger if the content to be requested and the request arrival time can be predicted.

Fig. 3. Validating the proposition: (a) numerical and simulated PDF of average rate; (b) normalized approximation errors of $\widetilde{R}_j^k$ and $\sigma_{\widetilde{R}_j^k}$ versus SNR; (c) impact of the CV of $\widehat{W}_j$ and $\hat{\alpha}_j^k$ on the CV of $\widehat{R}_j^k$; (d) impact of the bias of $\widehat{W}_j$ and $\hat{\alpha}_j^k$ on the bias of $\widehat{R}_j^k$. In the legends, “G” and “U” stand for Gaussian and uniform distributions, respectively.

3) Impact of using CSI:

Most existing works on predictive resource allocation do not consider CSI in either optimization or simulation, either by assuming that the small-scale channel gain is static in each frame or by stating that its variation over time slots can be averaged out in a frame. However, the small-scale channels of mobile users cannot be static; in fact, they vary much faster than the large-scale channels. On the other hand, although the variation of small-scale channel gains among time slots in a frame can indeed be averaged out when deriving the time-average rate of a frame if the gains are i.i.d., this does not mean that it can be ignored during transmission. In practical cellular networks, CSI can be estimated at the BS by training at the start of each time slot. To help understand where the gain of our solution over existing works (as shown in the sequel) comes from, we compare the proposed scheme with the “Min-Time” scheme, both with and without using CSI, in the following way. When not using CSI during transmission in each time slot, both schemes schedule users sequentially. For example, if MS and MS need to download videos in the j th frame from a BS and the solution of problem P1 is s j = 0 . and s j = 0 . , then the BS will serve MS in the first time slots in the j th frame and serve MS in the remaining time slots. When using CSI, we use the transmission policy in section IV for both schemes after the resource allocation plans are made by both schemes.

Fig. 5 shows the average total stalling time versus the average request arrival rate of the VoD MSs. We can observe the performance loss in QoS when CSI is not used, especially when the average request arrival rate is high. Extensive simulations show that the schemes using CSI yield a lower stalling frequency than those without CSI; these results are not shown for conciseness.

Fig. 4. Gain from predicting average rate. “VoR-20” or “VoR-10” in the legend means a VoR MS making reservation 20 s or 10 s in advance before the MS starts to play the video.

Fig. 5. Impact of using CSI on the QoS of VoD MSs.
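The sequential scheduling used when CSI is ignored can be sketched as follows. The plan fractions in the example (0.3 and 0.7) are hypothetical placeholders rather than the values of the paper's example, and the name `sequential_schedule` is introduced here only for illustration.

```python
def sequential_schedule(fractions, n_slots):
    """Assign consecutive time slots of one frame to MSs in the given order.

    fractions maps an MS index to its planned share s_j^k of the frame;
    the result lists, for every slot, the MS served (None means idle).
    """
    schedule = []
    for ms, frac in fractions.items():
        count = min(round(frac * n_slots), n_slots - len(schedule))
        schedule.extend([ms] * count)
    schedule.extend([None] * (n_slots - len(schedule)))  # unused slots stay idle
    return schedule


# Hypothetical plan for one frame with T_s = 100 slots.
slots = sequential_schedule({1: 0.3, 2: 0.7}, n_slots=100)
```

Because the slot order is fixed by the plan alone, an MS may be served in slots where its instantaneous channel is poor, which is the loss that the CSI-aware policy of section IV avoids.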

4) Comparison with other schemes:

In what follows, we compare the performance of the proposed scheme with other schemes. In all the predictive schemes, $\alpha_j^k$ and $W_j$ are predicted with the errors modelled in subsection V.A. For a fair comparison, the transmission policy in section IV is used for all predictive schemes to exploit the CSI available at each time slot. To observe the impact of prediction errors, the “Proposed” scheme is also simulated when there are no prediction errors, i.e., $\sigma_{\widehat{W}_j} = \delta_j^k = 0$.

In Fig. 6, we show the maximal average request arrival rate of the VoD MSs versus the expected maximal stalling time of each MS, which reflects the capability of supporting high throughput for VoD service by exploiting residual resources. It is shown that when the maximal stalling time is 10 s, the gain of “Proposed” over “Non-predictive w/o QoS” is 230%, the gain over “Non-predictive w QoS” is 110%, the gain over “Min-Time” is 33%, and the gain over “Max-Throughput” is 29%. We can also see that the performance loss caused by prediction errors is 10% when the “Proposed” scheme is adopted.

In Fig. 7, we show the average total stalling time of all the MSs versus the average request arrival rate of the MSs, which reflects the average QoS of the MSs for a given traffic load. It is shown that when the average request arrival rate is 0.5 requests/s, the gain of “Proposed” over “Non-predictive w QoS” in terms of reducing the average total stalling time is 98%, the gain over “Non-predictive w/o QoS” is 84%, the gain over “Min-Time” is 76%, and the gain over “Max-Throughput” is 43%. We can also see that the performance loss caused by prediction errors is 54% when the “Proposed” scheme is used.

Fig. 6. Performance comparison in terms of trafﬁc carrying ability of the network with given tolerance of QoS of the MSs.


Fig. 7. Average QoS of the MSs with given trafﬁc load of the MSs.

In Fig. 8, we show the cumulative distribution function (CDF) of several key performance indicators to characterize the QoS of the VoD MSs when the average request arrival rate is 0.5 requests/s. As expected, the “Proposed” scheme provides the lowest stalling frequency, stalling time, and maximal stalling time among all schemes.

VI. CONCLUSIONS

In this paper, we investigated the potential of predictive resource allocation in supporting high request arrival rate of VoD service by exploiting network residual resources. To this end, we formulated a problem to optimize the resource allocation plan with predicted time-average rate for VoD MSs with asynchronously arrived random requests, and found the optimal solution. In practice, the predicted time-average rate can be obtained from the predicted average residual bandwidth at each BS and the predicted average channel gain of each VoD MS. To gain useful insight into the accuracy required of each type of prediction, we showed the relation of the mean values and variances between their prediction errors. Analytical results showed that the average residual bandwidth should be predicted accurately in order to reduce the prediction error of the average rate, while the average channel gain need not be predicted with high accuracy. We developed a transmission policy according to the resource allocation plan, where the instantaneous channel available at each time slot is used. Simulation and numerical results validated our analysis, and demonstrated that the proposed predictive resource allocation can support much higher traffic load than prior methods with given tolerance of QoS of the MSs. Besides, the gain from prediction will be even more remarkable if the content to be requested and the request arrival time can be known only several seconds in advance.

Fig. 8. CDF of QoS-related key indicators: (a) stalling frequency; (b) stalling time; (c) maximal stalling time of each MS, where the average request arrival rate of VoD MSs is 0.5 requests/s.

APPENDIX A
PROOF OF PROPOSITION 1

We omit the superscript $k$ for notational simplicity.

i) We first show that $\widetilde{R}_j = \widehat{R}_j - R_j$ follows a Gaussian distribution. Because $R_j = \frac{1}{T_s} \sum_{t=1}^{T_s} R_{j,t} = \frac{1}{T_s} \sum_{t=1}^{T_s} W_{j,t} \log_2\big(1 + \frac{\alpha_j \|\mathbf{h}_{j,t}\|^2}{\sigma^2} P_{\max}\big)$, and $W_{j,t}$ and $\mathbf{h}_{j,t}$ are i.i.d. in all time slots within the $j$th frame, we have $D\{R_j\} = D\{R_{j,t}\} / T_s$. When $T_s \to \infty$, $D\{R_j\} = 0$, i.e., $R_j$ is deterministic. Then, the distribution of $\widetilde{R}_j$ depends on that of $\widehat{R}_j$. Hence, we only need to prove that $\widehat{R}_j \approx \widehat{W}_j \log_2\big(\frac{\hat{\alpha}_j N_t}{\sigma^2} P_{\max}\big) \triangleq \widehat{W}_j \hat{\gamma}_j$ follows a Gaussian distribution.

If $\overline{\hat{\alpha}}_j - \delta_j/2 \le \hat{\alpha}_j \le \overline{\hat{\alpha}}_j + \delta_j/2$, the PDF of $\hat{\gamma}_j \triangleq \log_2\big(\frac{\hat{\alpha}_j N_t}{\sigma^2} P_{\max}\big)$ can be expressed as [29],

$$ f_j^{(\gamma)}(\gamma) = \begin{cases} f_j^{(\alpha)}\Big( \dfrac{2^{\gamma} \sigma^2}{P_{\max} N_t} \Big) \dfrac{2^{\gamma} \sigma^2}{P_{\max} N_t} \ln 2, & g_j^- < \gamma < g_j^+, \\ 0, & \text{otherwise}, \end{cases} \tag{A.1} $$

where $f_j^{(\alpha)}(\cdot)$ is the PDF of $\hat{\alpha}_j$, $g_j^- = \log_2\Big( \frac{\overline{\hat{\alpha}}_j - \delta_j/2}{\sigma^2} N_t P_{\max} \Big)$, $g_j^+ = \log_2\Big( \frac{\overline{\hat{\alpha}}_j + \delta_j/2}{\sigma^2} N_t P_{\max} \Big)$, and $\overline{\hat{\alpha}}_j = E\{\hat{\alpha}_j\}$.

Then, the cumulative distribution function (CDF) of $\widehat{R}_j \approx \widehat{W}_j \hat{\gamma}_j$ can be obtained as

$$ F_j^{(R)}(r) = \Pr(\widehat{R}_j \le r) \approx \int_0^{\infty} \Big( \int_0^{r/w} f_j^{(\gamma)}(\gamma)\, \mathrm{d}\gamma \Big) f_j^{(W)}(w)\, \mathrm{d}w = \int_0^{r/g_j^+} \Big( \int_{g_j^-}^{r/w} f_j^{(\gamma)}(\gamma)\, \mathrm{d}\gamma \Big) f_j^{(W)}(w)\, \mathrm{d}w + \int_{r/g_j^+}^{r/g_j^-} \Big( \int_{g_j^-}^{r/w} f_j^{(\gamma)}(\gamma)\, \mathrm{d}\gamma \Big) f_j^{(W)}(w)\, \mathrm{d}w, \tag{A.2} $$

where $f_j^{(W)}(\cdot)$ is the PDF of $\widehat{W}_j$. Since $w < r/g_j^+$ in the first term, i.e., $r/w > g_j^+$, according to (A.1) the inner integral in the first term equals 1. Similarly, since $w > r/g_j^+$ in the second term, i.e., $r/w < g_j^+$, the inner integral in the second term is less than 1.
Hence, $F_j^{(R)}(r)$ satisfies

$$ \int_0^{r/g_j^+} f_j^{(W)}(w)\, \mathrm{d}w \ \le\ F_j^{(R)}(r) \ \le\ \int_0^{r/g_j^-} f_j^{(W)}(w)\, \mathrm{d}w. \tag{A.3} $$

When $r/g_j^- - r/g_j^+ \to 0$, the upper and lower bounds of $F_j^{(R)}(r)$ meet. This suggests that if $\widehat{W}_j$ follows a Gaussian distribution, then $\widehat{R}_j$ and hence $\widetilde{R}_j$ also follow a Gaussian distribution. From the definitions of $g_j^-$ and $g_j^+$, the condition $r/g_j^- - r/g_j^+ \to 0$ can be rewritten as

$$ \frac{ r \log_2\Big( \frac{\overline{\hat{\alpha}}_j + \delta_j/2}{\overline{\hat{\alpha}}_j - \delta_j/2} \Big) }{ \log_2\Big( \frac{\overline{\hat{\alpha}}_j + \delta_j/2}{\sigma^2} N_t P_{\max} \Big) \log_2\Big( \frac{\overline{\hat{\alpha}}_j - \delta_j/2}{\sigma^2} N_t P_{\max} \Big) } \to 0, \tag{A.4} $$

which holds when $\delta_j \ll \overline{\hat{\alpha}}_j$ or $\frac{\overline{\hat{\alpha}}_j}{\sigma^2} N_t P_{\max}$ is large.

ii) We then derive the mean value of the prediction error $\widetilde{R}_j$. To this end, we derive the mean value of $\widehat{R}_j$ (denoted as $\overline{\widehat{R}}_j$) and the mean value of $R_j$ (denoted as $\overline{R}_j$). To derive $\overline{\widehat{R}}_j$, we first derive the mean value of $\log_2\big(1 + \frac{\hat{\alpha}_j \|\mathbf{h}_{j,t}\|^2}{\sigma^2} P_{\max}\big)$, which is denoted as $\hat{\mu}_j$. Since $\hat{\alpha}_j \sim \mathcal{U}(\overline{\hat{\alpha}}_j - \delta_j/2,\ \overline{\hat{\alpha}}_j + \delta_j/2)$ and the small-scale channel is Rayleigh fading, it can be derived as

$$ \hat{\mu}_j = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} \log_2\Big( 1 + \frac{\alpha \|\mathbf{h}\|^2}{\sigma^2} P_{\max} \Big) f_j^{(\alpha)}(\alpha) f^{(H)}(\|\mathbf{h}\|^2)\, \mathrm{d}\alpha\, \mathrm{d}\|\mathbf{h}\|^2 = \int_{\overline{\hat{\alpha}}_j - \delta_j/2}^{\overline{\hat{\alpha}}_j + \delta_j/2} \Big\{ \int_0^{\infty} \log_2\Big( 1 + \frac{\alpha \|\mathbf{h}\|^2}{\sigma^2} P_{\max} \Big) f^{(H)}(\|\mathbf{h}\|^2)\, \mathrm{d}\|\mathbf{h}\|^2 \Big\} \frac{1}{\delta_j}\, \mathrm{d}\alpha, \tag{A.5} $$

where $f^{(H)}(\|\mathbf{h}\|^2)$ is the PDF of $\|\mathbf{h}_{j,t}\|^2$, which is

$$ f^{(H)}(\|\mathbf{h}\|^2) = \frac{ (\|\mathbf{h}\|^2)^{N_t - 1} e^{-\|\mathbf{h}\|^2} }{ \Gamma(N_t) }. \tag{A.6} $$

When the predicted instantaneous SNR $\gg 1$, $\log_2\big(1 + \frac{\hat{\alpha}_j \|\mathbf{h}_{j,t}\|^2}{\sigma^2} P_{\max}\big) \approx \log_2\big(\frac{\hat{\alpha}_j \|\mathbf{h}_{j,t}\|^2}{\sigma^2} P_{\max}\big)$. After substituting (A.6) and further considering the integral result

$$ \int_0^{\infty} a \ln(bx)\, x^{N-1} e^{-cx}\, \mathrm{d}x \overset{y = cx}{=} a c^{-N} \ln\Big( \frac{b}{c} \Big) \int_0^{\infty} y^{N-1} e^{-y}\, \mathrm{d}y + a c^{-N} \int_0^{\infty} e^{-y} y^{N-1} \ln y\, \mathrm{d}y \overset{(a)}{=} a c^{-N} \Gamma(N) \Big\{ \ln\Big( \frac{b}{c} \Big) + \psi(N) \Big\}, \tag{A.7} $$

where $a > 0$, $b > 0$, $c > 0$, $\Gamma(\cdot)$ is the Euler gamma function, $\psi(\cdot)$ is the digamma function, and (a) comes from $\int_0^{\infty} y^{N-1} e^{-y}\, \mathrm{d}y = \Gamma(N)$ and $\int_0^{\infty} e^{-y} y^{N-1} \ln y\, \mathrm{d}y = \Gamma(N) \psi(N)$ [29], (A.5) can be derived as

$$ \hat{\mu}_j \approx \frac{1}{\delta_j \ln 2} \int_{\overline{\hat{\alpha}}_j - \delta_j/2}^{\overline{\hat{\alpha}}_j + \delta_j/2} \Big( \ln\Big( \frac{\alpha}{\sigma^2} P_{\max} \Big) + \psi(N_t) \Big)\, \mathrm{d}\alpha = \frac{1}{\delta_j \ln 2} \Big( \overline{\hat{\alpha}}_j \ln\Big( \frac{\overline{\hat{\alpha}}_j + \delta_j/2}{\overline{\hat{\alpha}}_j - \delta_j/2} \Big) + \frac{\delta_j}{2} \ln\Big( \frac{(\overline{\hat{\alpha}}_j + \delta_j/2)(\overline{\hat{\alpha}}_j - \delta_j/2) P_{\max}^2}{\sigma^4} \Big) + \delta_j \big( \psi(N_t) - 1 \big) \Big). $$

Since the residual bandwidth is independent of the small-scale channels of the NRT users, the mean value of the predicted time-average rate can be obtained as

$$ \overline{\widehat{R}}_j = E\{\widehat{W}_j\}\, E\Big\{ \frac{1}{T_s} \sum_{t=1}^{T_s} \log_2\Big( 1 + \frac{\hat{\alpha}_j \|\mathbf{h}_{j,t}\|^2}{\sigma^2} P_{\max} \Big) \Big\} \approx \overline{\widehat{W}}_j \hat{\mu}_j. \tag{A.8} $$

Similarly, the mean value of the time-average rate can be derived as

$$ \overline{R}_j \approx \overline{W}_j \Big( \log_2\Big( \frac{\alpha_j}{\sigma^2} P_{\max} \Big) + \frac{\psi(N_t)}{\ln 2} \Big), \tag{A.9} $$

where $\overline{W}_j = E\{W_{j,t}\}$, and the approximation is accurate when the instantaneous SNR is large. Therefore, we obtain the mean value of the prediction error as in (6) with $\hat{\mu}_j$ as in (8b).

iii) Next, we derive the variance of the prediction error. Since $R_j$ is deterministic when $T_s \to \infty$, we only need to derive $D\{\widehat{R}_j\}$.
We first derive the variance of $\frac{1}{T_s} \sum_{t=1}^{T_s} \log_2\big(1 + \frac{\hat{\alpha}_j \|\mathbf{h}_{j,t}\|^2}{\sigma^2} P_{\max}\big)$, which is denoted as $\hat{\sigma}_j^2$. Since the small-scale channel gains are i.i.d. among the time slots in each frame, we have $\hat{\sigma}_j^2 = \frac{1}{T_s^2} \sum_{t_1=1}^{T_s} \sum_{t_2=1}^{T_s} \hat{\sigma}_{j,t_1 t_2}$, where $\hat{\sigma}_{j,t_1 t_2} = \mathrm{cov}\big( \log_2(1 + \frac{\hat{\alpha}_j \|\mathbf{h}_{j,t_1}\|^2}{\sigma^2} P_{\max}),\ \log_2(1 + \frac{\hat{\alpha}_j \|\mathbf{h}_{j,t_2}\|^2}{\sigma^2} P_{\max}) \big)$, and cov stands for covariance. When the predicted instantaneous SNR $\gg 1$ and $\hat{\alpha}_j \sim \mathcal{U}(\overline{\hat{\alpha}}_j - \delta_j/2,\ \overline{\hat{\alpha}}_j + \delta_j/2)$, we have

$$ \hat{\sigma}_{j,tt} \approx \int_{\overline{\hat{\alpha}}_j - \delta_j/2}^{\overline{\hat{\alpha}}_j + \delta_j/2} \Big\{ \int_0^{\infty} \log_2^2\Big( \frac{\alpha \|\mathbf{h}\|^2}{\sigma^2} P_{\max} \Big) f^{(H)}(\|\mathbf{h}\|^2)\, \mathrm{d}\|\mathbf{h}\|^2 \Big\} f_j^{(\alpha)}(\alpha)\, \mathrm{d}\alpha - \hat{\mu}_j^2 = \int_{\overline{\hat{\alpha}}_j - \delta_j/2}^{\overline{\hat{\alpha}}_j + \delta_j/2} \Big\{ \int_0^{\infty} \log_2^2\Big( \frac{\alpha \|\mathbf{h}\|^2}{\sigma^2} P_{\max} \Big) \frac{ (\|\mathbf{h}\|^2)^{N_t - 1} e^{-\|\mathbf{h}\|^2} }{ \Gamma(N_t) }\, \mathrm{d}\|\mathbf{h}\|^2 \Big\} \frac{1}{\delta_j}\, \mathrm{d}\alpha - \hat{\mu}_j^2. $$

By using the following integral result, derived in the same way as (A.7),

$$ \int_0^{\infty} a \ln^2(bx)\, x^{N-1} e^{-cx}\, \mathrm{d}x = a c^{-N} \Gamma(N) \Big( \Big( \ln\Big( \frac{b}{c} \Big) + \psi(N) \Big)^2 + \psi'(N) \Big), \tag{A.10} $$

where $a > 0$, $b > 0$, $c > 0$, we have

$$ \hat{\sigma}_{j,tt} \approx \frac{1}{\delta_j \ln^2 2} \int_{\overline{\hat{\alpha}}_j - \delta_j/2}^{\overline{\hat{\alpha}}_j + \delta_j/2} \Big[ \Big( \ln\Big( \frac{\alpha}{\sigma^2} P_{\max} \Big) + \psi(N_t) \Big)^2 + \psi'(N_t) \Big]\, \mathrm{d}\alpha - \hat{\mu}_j^2. \tag{A.11} $$

Using the integrals of $\ln^2(ax)$ and $\ln(ax)$ in [29], (A.11) can be further derived as

$$ \hat{\sigma}_{j,tt} \approx \frac{1}{\delta_j \ln^2 2} \Big\{ \overline{\hat{\alpha}}_j \ln\Big( \frac{\overline{\hat{\alpha}}_j + \delta_j/2}{\overline{\hat{\alpha}}_j - \delta_j/2} \Big) \ln\Big( \frac{P_{\max}^2 (\overline{\hat{\alpha}}_j - \delta_j/2)(\overline{\hat{\alpha}}_j + \delta_j/2)}{\sigma^4} \Big) + \big( \psi(N_t) - 1 \big) \Big( \delta_j \ln\Big( \frac{P_{\max}^2 (\overline{\hat{\alpha}}_j - \delta_j/2)(\overline{\hat{\alpha}}_j + \delta_j/2)}{\sigma^4} \Big) + 2 \overline{\hat{\alpha}}_j \ln\Big( \frac{\overline{\hat{\alpha}}_j + \delta_j/2}{\overline{\hat{\alpha}}_j - \delta_j/2} \Big) \Big) + \frac{\delta_j}{2} \Big( \ln^2\Big( \frac{P_{\max} (\overline{\hat{\alpha}}_j + \delta_j/2)}{\sigma^2} \Big) + \ln^2\Big( \frac{P_{\max} (\overline{\hat{\alpha}}_j - \delta_j/2)}{\sigma^2} \Big) \Big) + \delta_j \big( \psi^2(N_t) - 2\psi(N_t) + \psi'(N_t) + 2 \big) \Big\} - \hat{\mu}_j^2 = \frac{1}{\delta_j^2 \ln^2 2} \Big( \Big( \frac{\delta_j^2}{4} - \overline{\hat{\alpha}}_j^2 \Big) \ln^2\Big( \frac{\overline{\hat{\alpha}}_j + \delta_j/2}{\overline{\hat{\alpha}}_j - \delta_j/2} \Big) + \delta_j^2 \big( 1 + \psi'(N_t) \big) \Big). \tag{A.12} $$

When $t_1 \ne t_2$, we have

$$ \hat{\sigma}_{j,t_1 t_2} = \int_{\overline{\hat{\alpha}}_j - \delta_j/2}^{\overline{\hat{\alpha}}_j + \delta_j/2} \Big\{ \int_0^{\infty} \log_2\Big( 1 + \frac{\alpha \|\mathbf{h}_{j,t_1}\|^2}{\sigma^2} P_{\max} \Big) f^{(H)}(\|\mathbf{h}\|^2)\, \mathrm{d}\|\mathbf{h}\|^2 \int_0^{\infty} \log_2\Big( 1 + \frac{\alpha \|\mathbf{h}_{j,t_2}\|^2}{\sigma^2} P_{\max} \Big) f^{(H)}(\|\mathbf{h}\|^2)\, \mathrm{d}\|\mathbf{h}\|^2 \Big\} \frac{1}{\delta_j}\, \mathrm{d}\alpha - \hat{\mu}_j^2. $$

When the predicted instantaneous SNR $\gg 1$, upon substituting (A.6) and applying (A.7), we can obtain

$$ \hat{\sigma}_{j,t_1 t_2} \approx \frac{1}{\delta_j \ln^2 2} \int_{\overline{\hat{\alpha}}_j - \delta_j/2}^{\overline{\hat{\alpha}}_j + \delta_j/2} \Big( \ln\Big( \frac{\alpha}{\sigma^2} P_{\max} \Big) + \psi(N_t) \Big)^2\, \mathrm{d}\alpha - \hat{\mu}_j^2 = \frac{1}{\delta_j^2 \ln^2 2} \Big( \Big( \frac{\delta_j^2}{4} - \overline{\hat{\alpha}}_j^2 \Big) \ln^2\Big( \frac{\overline{\hat{\alpha}}_j + \delta_j/2}{\overline{\hat{\alpha}}_j - \delta_j/2} \Big) + \delta_j^2 \Big). \tag{A.13} $$

Since $\log_2\big(1 + \frac{\hat{\alpha}_j \|\mathbf{h}_{j,t}\|^2}{\sigma^2} P_{\max}\big)$ and $\|\mathbf{h}_{j,t}\|^2$ are i.i.d. in all time slots in the $j$th frame, $\hat{\sigma}_{j,tt}$ stays constant for any time slot $t$ and $\hat{\sigma}_{j,t_1 t_2}$ stays constant for any $t_1 \ne t_2$ in the frame. Then, we have

$$ \hat{\sigma}_j^2 = \frac{1}{T_s^2} \sum_{t_1=1}^{T_s} \sum_{t_2=1}^{T_s} \hat{\sigma}_{j,t_1 t_2} \approx \frac{1}{T_s^2} \Big( T_s \hat{\sigma}_{j,tt} + (T_s^2 - T_s) \hat{\sigma}_{j,t_1 t_2} \Big) = \frac{1}{\delta_j^2 \ln^2 2} \Big( \Big( \frac{\delta_j^2}{4} - \overline{\hat{\alpha}}_j^2 \Big) \ln^2\Big( \frac{\overline{\hat{\alpha}}_j + \delta_j/2}{\overline{\hat{\alpha}}_j - \delta_j/2} \Big) + \delta_j^2 \Big) + \frac{\psi'(N_t)}{T_s \ln^2 2} \overset{T_s \to \infty}{=} \frac{1}{\delta_j^2 \ln^2 2} \Big( \Big( \frac{\delta_j^2}{4} - \overline{\hat{\alpha}}_j^2 \Big) \ln^2\Big( \frac{\overline{\hat{\alpha}}_j + \delta_j/2}{\overline{\hat{\alpha}}_j - \delta_j/2} \Big) + \delta_j^2 \Big). \tag{A.14} $$

Then, the variance of $\widetilde{R}_j$ can be obtained as

$$ \sigma^2_{\widetilde{R}_j} = D\{\widetilde{R}_j\} = E\{\widehat{R}_j^2\} - E\{\widehat{R}_j\}^2 \approx \big( \sigma^2_{\widehat{W}_j} + \overline{\widehat{W}}_j^2 \big) \big( \hat{\sigma}_j^2 + \hat{\mu}_j^2 \big) - \overline{\widehat{R}}_j^2. \tag{A.15} $$

iv) Finally, we analyze the impact of the prediction biases of the residual bandwidth and the large-scale channel gain. It is easy to see that if $\widehat{W}_j$ and $\hat{\alpha}_j$ are unbiased, then $\widehat{R}_j$ will be unbiased. In what follows, we separately show the impact of the biases of $\widehat{W}_j$ and $\hat{\alpha}_j$.

1) $\widehat{W}_j$ is biased and $\hat{\alpha}_j$ is unbiased: $\overline{\widehat{W}}_j = \eta \overline{W}_j$ and $\overline{\hat{\alpha}}_j = \alpha_j$, where $\eta > 0$ is a factor reflecting how large the bias $\overline{\widehat{W}}_j - \overline{W}_j$ is (when $\eta = 1$, the prediction is unbiased).
Then, when $N_t$ is large, the bias of the predicted time-average rate can be derived from (A.8) and (A.9) as

$$
\widetilde{R}_j \approx \frac{W_j}{\ln 2} \Bigg( \eta \Big( \frac{\hat{\alpha}_j}{\delta_j} \ln\Big(\frac{\hat{\alpha}_j + \delta_j/2}{\hat{\alpha}_j - \delta_j/2}\Big) + \frac{1}{2} \ln\Big(\frac{(\hat{\alpha}_j + \delta_j/2)(\hat{\alpha}_j - \delta_j/2) P_{\max}^2}{\sigma^4}\Big) - 1 \Big) - \ln\Big(\frac{\alpha_j}{\sigma^2} P_{\max}\Big) \Bigg).
$$
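A quick numerical sketch can show how the bracketed term multiplying $\eta$ in the expression above behaves relative to $\ln\big(\frac{\hat{\alpha}_j}{\sigma^2} P_{\max}\big)$ as the ratio $\hat{\alpha}_j / \delta_j$ grows. The function names, the variable `p_over_noise` (standing for $P_{\max}/\sigma^2$) and all numeric values are assumptions made for illustration only.

```python
import math


def mean_ln_term(alpha_hat: float, delta: float, p_over_noise: float) -> float:
    """Bracketed term of the bias expression:
    (a/d) ln(hi/lo) + 0.5 ln(hi * lo * (P_max/sigma^2)^2) - 1."""
    lo, hi = alpha_hat - delta / 2, alpha_hat + delta / 2
    return ((alpha_hat / delta) * math.log(hi / lo)
            + 0.5 * math.log(hi * lo * p_over_noise ** 2)
            - 1.0)


def approximation_gap(alpha_hat: float, delta: float, p_over_noise: float) -> float:
    """Absolute difference between the bracketed term and ln(alpha_hat * P_max / sigma^2)."""
    return abs(mean_ln_term(alpha_hat, delta, p_over_noise)
               - math.log(alpha_hat * p_over_noise))


if __name__ == "__main__":
    # Hypothetical setting: alpha_hat = 1, P_max / sigma^2 = 100 (20 dB).
    for ratio in (2, 10, 100):
        gap = approximation_gap(1.0, 1.0 / ratio, 100.0)
        print(f"alpha_hat/delta = {ratio:>3}: gap = {gap:.3e}")
```

In this sketch the gap shrinks roughly quadratically in $\delta_j / \hat{\alpha}_j$, so for a prediction-error range that is small relative to the predicted gain the bracketed term is close to $\ln\big(\frac{\hat{\alpha}_j}{\sigma^2} P_{\max}\big)$.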

When $\hat{\alpha}_j \gg \delta_j$, $\frac{\hat{\alpha}_j}{\delta_j} \ln\Big(\frac{\hat{\alpha}_j + \delta_j/2}{\hat{\alpha}_j - \delta_j/2}\Big) \approx 1$ and $\ln\Big(\frac{(\hat{\alpha}_j + \delta_j/2)(\hat{\alpha}_j - \delta_j/2) P_{\max}^2}{\sigma^4}\Big) \approx 2\ln\Big(\frac{\hat{\alpha}_j}{\sigma^2} P_{\max}\Big)$. Then, the bias of the predicted time-average rate can be approximately connected with the bias of the predicted residual bandwidth as

$$
|\widetilde{R}_j| \approx \Bigg| \frac{W_j}{\ln 2} \Bigg( \eta \ln\Big(\frac{\hat{\alpha}_j}{\sigma^2} P_{\max}\Big) - \ln\Big(\frac{\alpha_j}{\sigma^2} P_{\max}\Big) \Bigg) \Bigg| = W_j \log_2\Big(\frac{\alpha_j}{\sigma^2} P_{\max}\Big) |\eta - 1| .
\tag{A.16}
$$

2) $\widehat{W}_j$ is unbiased and $\hat{\alpha}_j$ is biased: $\widehat{W}_j = W_j$ and $\hat{\alpha}_j = \eta \alpha_j$, where $\eta > 0$ is a factor reflecting how large the bias $\hat{\alpha}_j - \alpha_j$ is. Again, when $N_t$ is large, the bias of the predicted time-average rate can be derived from (A.8) and (A.9) as

$$
\widetilde{R}_j = \widehat{R}_j - R_j \approx \frac{W_j}{\ln 2} \Bigg( \frac{\hat{\alpha}_j}{\delta_j} \ln\Big(\frac{\hat{\alpha}_j + \delta_j/2}{\hat{\alpha}_j - \delta_j/2}\Big) + \frac{1}{2} \ln\Big(\frac{(\hat{\alpha}_j + \delta_j/2)(\hat{\alpha}_j - \delta_j/2) P_{\max}^2}{\sigma^4}\Big) - 1 - \ln\Big(\frac{\alpha_j}{\sigma^2} P_{\max}\Big) \Bigg).
$$

Again, using $\frac{\hat{\alpha}_j}{\delta_j} \ln\Big(\frac{\hat{\alpha}_j + \delta_j/2}{\hat{\alpha}_j - \delta_j/2}\Big) \approx 1$ and $\ln\Big(\frac{(\hat{\alpha}_j + \delta_j/2)(\hat{\alpha}_j - \delta_j/2) P_{\max}^2}{\sigma^4}\Big) \approx 2\ln\Big(\frac{\hat{\alpha}_j}{\sigma^2} P_{\max}\Big)$ when $\hat{\alpha}_j \gg \delta_j$, the bias of the predicted time-average rate can be approximately connected with the bias of the predicted large-scale channel gain as

$$
\widetilde{R}_j \approx \frac{W_j}{\ln 2} \Bigg( \ln\Big(\frac{\eta \alpha_j}{\sigma^2} P_{\max}\Big) - \ln\Big(\frac{\alpha_j}{\sigma^2} P_{\max}\Big) \Bigg) = W_j \log_2(\eta) .
\tag{A.17}
$$

Since the approximation in (A.17) is accurate when $\frac{\hat{\alpha}_j}{\sigma^2} P_{\max} \gg 1$, i.e., $\frac{\eta \alpha_j}{\sigma^2} P_{\max} \gg 1$, $\eta$ should satisfy $\eta \gg \frac{\sigma^2}{\alpha_j P_{\max}}$.
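To get a feel for the magnitudes in (A.16) and (A.17), the two bias approximations can be evaluated side by side for the same factor $\eta$. This is an illustrative sketch only: the function names are hypothetical, `snr` stands for $\frac{\alpha_j}{\sigma^2} P_{\max}$, and the bandwidth of 10 MHz and the SNR of 20 dB are assumed values that do not come from the paper.

```python
import math


def bias_from_bandwidth(w_hz: float, snr: float, eta: float) -> float:
    """|R~_j| in bit/s from (A.16): W_j * log2(SNR) * |eta - 1|."""
    return w_hz * math.log2(snr) * abs(eta - 1.0)


def bias_from_channel_gain(w_hz: float, eta: float) -> float:
    """|R~_j| in bit/s from (A.17): W_j * |log2(eta)|."""
    return w_hz * abs(math.log2(eta))


if __name__ == "__main__":
    w_hz, snr = 10e6, 100.0  # assumed: 10 MHz residual bandwidth, 20 dB SNR
    for eta in (0.8, 0.9, 1.1, 1.2):
        b_w = bias_from_bandwidth(w_hz, snr, eta)
        b_a = bias_from_channel_gain(w_hz, eta)
        print(f"eta = {eta}: bandwidth bias = {b_w:.3e} bit/s, "
              f"channel-gain bias = {b_a:.3e} bit/s")
```

Under these assumed values, the same relative bias $\eta$ translates into a larger rate bias when it affects the residual bandwidth than when it affects the large-scale channel gain, because the former scales with $\log_2$ of the SNR while the latter only scales with $\log_2(\eta)$.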
It is not hard to see that $\left| \frac{\eta - 1}{\log_2(\eta)} \right|$ is a monotonically increasing function of $\eta$, hence the following inequality holds,

$$
\left| \log_2\Big(\frac{\alpha_j}{\sigma^2} P_{\max}\Big) \frac{\eta - 1}{\log_2(\eta)} \right| \gg \left| 1 - \frac{\sigma^2}{\alpha_j P_{\max}} \right| .
\tag{A.18}
$$

When $\frac{\hat{\alpha}_j}{\sigma^2} P_{\max} \gg 1$, $\frac{\sigma^2}{\hat{\alpha}_j P_{\max}} \approx 0$. Then, we can show the relationship between (A.16) and (A.17) as

$$
|\widetilde{R}_j| \approx W_j |\log_2(\eta)| \ll W_j \log_2\Big(\frac{\alpha_j}{\sigma^2} P_{\max}\Big) |\eta - 1| ,
\tag{A.19}
$$

which means that the impact of the prediction bias of the large-scale channel gain on the prediction bias of the time-average rate is much smaller than that of the residual bandwidth.

Since $\lim_{x \to \infty} x \ln\big(1 + \frac{1}{x}\big) = 1$ and $\lim_{x \to \infty} \ln\big(1 + \frac{1}{x}\big) = 0$, when $\hat{\alpha}_j \gg \delta_j$, $\frac{\hat{\alpha}_j}{\delta_j} \gg 1$, then

$$
\frac{\hat{\alpha}_j}{\delta_j} \ln\Big(\frac{\hat{\alpha}_j + \delta_j/2}{\hat{\alpha}_j - \delta_j/2}\Big) = \Big(\frac{\hat{\alpha}_j}{\delta_j} - \frac{1}{2}\Big) \ln\Big(1 + \frac{1}{\hat{\alpha}_j/\delta_j - 1/2}\Big) + \frac{1}{2} \ln\Big(1 + \frac{1}{\hat{\alpha}_j/\delta_j - 1/2}\Big) \approx 1 .
$$

REFERENCES

[1] N. Bhushan, J. Li, D. Malladi, R. Gilmore, D. Brenner, A. Damnjanovic, R. Sukhavasi, C. Patel, and S. Geirhofer, “Network densification: the dominant theme for wireless evolution into 5G,” IEEE Commun. Mag., vol. 52, no. 2, pp. 82–89, Feb. 2014.

[2] T. Bohn, et al.

[3] Science, vol. 327, no. 5968, pp. 1018–1021, Feb. 2010.

[4] J. Froehlich and J. Krumm, “Route prediction from trip observations,” Soc. Automotive Eng. World Congress, Tech. Rep., 2008.

[5] M. Mardani and G. B. Giannakis, “Estimating traffic and anomaly maps via network tomography,” IEEE/ACM Trans. Netw., vol. 24, no. 3, pp. 1533–1547, June 2016.

[6] Y. Shi, M. Larson, and A. Hanjalic, “Collaborative filtering beyond the user-item matrix: A survey of the state of the art and future challenges,” ACM Comput. Surveys, vol. 47, no. 1, pp. 1–45, May 2014.

[7] N. Bui, M. Cesana, S. A. Hosseini, Q. Liao, I. Malanchini, and J. Widmer, “A survey of anticipatory mobile networking: Context-based classification, prediction methodologies, and optimization techniques,” IEEE Commun. Surv. Tutorials, vol. 19, no. 3, pp. 1790–1821, 2017.

[8] L. Nie, D. Jiang, S. Yu, and H. Song, “Network traffic prediction based on deep belief network in wireless mesh backbone networks,” in IEEE WCNC, 2017.

[9] A. Nadembega, A. Hafid, and T. Taleb, “A destination and mobility path prediction scheme for mobile networks,” IEEE Trans. Veh. Technol., vol. 64, no. 6, pp. 2577–2590, June 2015.

[10] M. Kasparick, R. Cavalcante, S. Valentin, S. Stanczak, and M. Yukawa, “Kernel-based adaptive online reconstruction of coverage maps with side information,” IEEE Trans. Veh. Technol., vol. 65, no. 7, pp. 5461–5473, July 2016.

[11] J. Chen, U. Yatnalli, and D. Gesbert, “Learning radio maps for UAV-aided wireless networks: A segmented regression approach,” in IEEE ICC, 2017.

[12] C. Yao, C. Yang, and I. Chih-Lin, “Data-driven resource allocation with traffic load prediction,” Journal of Communications & Information Networks, vol. 2, no. 1, pp. 52–65, Feb. 2017.

[13] H. Abou-zeid, H. S. Hassanein, and S. Valentin, “Energy-efficient adaptive video transmission: Exploiting rate predictions in wireless networks,” IEEE Trans. Veh. Technol., vol. 63, no. 5, pp. 2013–2026, June 2014.

[14] Z. Lu and G. de Veciana, “Optimizing stored video delivery for mobile networks: The value of knowing the future,” in IEEE INFOCOM, Apr. 2013.

[15] H. Abou-zeid, H. Hassanein, and S. Valentin, “Optimal predictive resource allocation: Exploiting mobility patterns and radio maps,” in IEEE GLOBECOM, 2013.

[16] C. Yao, C. Yang, and Z. Xiong, “Energy-saving predictive resource allocation planning and allocation,” IEEE Trans. Commun., vol. 64, no. 12, pp. 5078–5095, Dec. 2016.

[17] W.-S. Soh and H. S. Kim, “A predictive bandwidth reservation scheme using mobile positioning and road topology information,” IEEE/ACM Trans. Netw., vol. 14, no. 5, pp. 1078–1091, Oct. 2006.

[18] S. Choi and K. G. Shin, “Adaptive bandwidth reservation and admission control in QoS-sensitive cellular networks,” IEEE Trans. Parallel Distrib. Syst., vol. 13, no. 9, pp. 882–897, Sep. 2002.

[19] A. Nadembega, A. Hafid, and T. Taleb, “Mobility-prediction-aware bandwidth reservation scheme for mobile networks,” IEEE Trans. Veh. Technol., vol. 64, no. 6, pp. 2561–2576, June 2015.

[20] N. Bui, F. Michelinakis, and J. Widmer, “A model for throughput prediction for mobile users,” in Proc. of European Wireless, 2014.

[21] R. Atawia, H. Abou-zeid, H. S. Hassanein, and A. Noureldin, “Joint chance-constrained predictive resource allocation for energy-efficient video streaming,” IEEE J. Sel. Areas Commun., vol. 34, no. 5, pp. 1389–1404, May 2016.

[22] B. Veeravalli, Z. Zeng, N. Gupta, and G. Jia, “Network-based caching algorithms for reservation-based multimedia systems,” in IEEE GCC, 2006.

[23] M. Seufert, S. Egger, M. Slanina, T. Zinner, T. Hoßfeld, and P. Tran-Gia, “A survey on quality of experience of HTTP adaptive streaming,” IEEE Commun. Surveys Tut., vol. 17, no. 1, pp. 469–492, 2015.

[24] A. Papanicolaou, Taylor approximation and the delta method, 2009. [Online]. Available: http://web.stanford.edu/class/cme308/OldWebsite/notes/TaylorAppDeltaMethod.pdf

[25] E. T. Jaynes, “Information theory and statistical mechanics,” Physical Review, vol. 106, no. 4, pp. 620–630, 1957.

[26] S. Y. Park and A. K. Bera, “Maximum entropy autoregressive conditional heteroskedasticity model,” Journal of Econometrics, vol. 150, no. 2, pp. 219–230, 2009.

[27] A. Schrijver, Theory of linear and integer programming. John Wiley & Sons, 1998.

[28] D. Su and C. Yang, “User-centric downlink cooperative transmission with orthogonal beamforming based limited feedback,” IEEE Transactions on Communications, vol. 63, no. 8, pp. 2996–3007, 2015.

[29] E. Zeidler, Oxford users’ guide to mathematics. Oxford University Press, 2004.