Duration-Squeezing-Aware Communication and Computing for Proactive VR
Xing Wei, Chenyang Yang, and Shengqian Han
School of Electronics and Information Engineering, Beihang University, Beijing 100191, China
Email: {weixing, cyyang, sqhan}@buaa.edu.cn

Abstract—Proactive tile-based virtual reality video streaming computes and delivers the predicted tiles to be requested before playback. All existing works overlook the important fact that the computing and communication (CC) tasks for a segment may squeeze the time for the tasks for the next segment, which will cause less and less available time for the latter segments. In this paper, we jointly optimize the durations for the CC tasks to maximize the completion rate of the CC tasks under the task duration-squeezing-aware constraint. To ensure that the latter segments retain enough time for their tasks, the CC tasks for a segment are not allowed to squeeze the time for computing and delivering the subsequent segment. We find the closed-form optimal solution, from which we identify a minimum-resource-limited, an unconditional, and a conditional resource-tradeoff region, which are determined by the total time for the proactive CC tasks and the playback duration of a segment. Owing to the duration-squeezing-prohibited constraints, increasing the configured resources is not always useful for improving the completion rate of the CC tasks. Numerical results validate the impact of the duration-squeezing-prohibited constraints and illustrate the three regions.
Index Terms—Proactive VR video streaming, computing-communication tradeoff, resource configuration, duration-squeezing-aware constraint
I. INTRODUCTION
Virtual reality (VR) video requires a 360° panoramic view with ultra-high resolution. Delivering such videos is cost-prohibitive for wireless networks. This inspires proactive tile-based streaming [1], [2], which divides a full panoramic-view segment into small tiles in the spatial domain, predicts the future field of view (FoV) of a user, and then renders and transmits the tiles overlapping with the predicted FoVs.

Proactive tile-based VR video streaming contains three tasks: prediction, communication, and computing. Given the predictor and the prediction accuracy required for satisfying the quality of experience (QoE), the total time for rendering and transmitting a segment can be determined [3]–[5]. With such a total time budget, it has been shown in the literature that the communication and computing (CC) resources can be flexibly traded off [6], [7]. For example, when the communication bandwidth is insufficient, one can assign more computing resource for rendering in order to provide a longer time for delivering.

However, all existing works [3], [8]–[11] for proactive tile-based VR video streaming overlook an important fact: the communication and computing tasks for successive segments are coupled in the timeline. Specifically, the communication tasks for the multiple segments in a video form one queue, and the computing tasks form another. Transmitting and computing a segment may squeeze the time for the tasks for the next segment, such that the QoE may degrade owing to the insufficient time left for accomplishing the tasks for latter segments.

In this paper, we investigate how to maximize the performance of proactive tile-based VR video streaming considering the coupled timeline for computing and delivering successive segments. To this end, we jointly optimize the durations for these two tasks to maximize the completion rate of the CC tasks under the duration-squeezing-aware constraint.
When a VR video is long, to ensure that the latter segments retain enough time for their tasks, the CC tasks for a segment are not allowed to squeeze the time for computing and delivering the subsequent segment. We obtain the global optimal solution via the Karush-Kuhn-Tucker (KKT) conditions. To the best of the authors' knowledge, this is the first work that considers the time squeezing of these two tasks in proactive VR streaming.

From the closed-form solution of the optimal durations, we find a minimum-resource-limited, an unconditional, and a conditional resource-tradeoff region. The boundaries of the three regions depend on the relative values of the total time budget for communication and computing and the playback duration of a segment. In practice, these two durations can be very different.

II. SYSTEM MODEL
Consider a proactive tile-based VR video streaming system with a mobile edge computing (MEC) server co-located with a base station (BS). Each VR video consists of L segments in the temporal domain, and each segment consists of M tiles in the spatial domain. The playback duration of each tile equals the playback duration of a segment, denoted by T_seg [1], [2]. Each user is equipped with a head-mounted display (HMD), which can measure the head movement data, send the data to the MEC server, and pre-buffer segments. The MEC server renders a video segment before delivering it to the HMD.
Fig. 1: Proactively streaming the l-th and (l+1)-th segments. (a) Rendering and transmitting pipeline squeeze, Δp > 0, Δm > 0. (b) Transmitting pipeline squeeze, Δp < 0, Δm > 0.

When a user requests a VR video, the MEC server first streams the first l−1 segments in a reactive or a passive mode [15]. When the MEC server collects the information of the user (e.g., the head movement data) in an observation window, proactive streaming for the l-th segment begins; subsequent segments are then predicted, rendered, and transmitted one after another, as shown in Fig. 1a. Specifically, at the end of the observation window for the l-th segment, i.e., B_l, the tiles in the l-th segment to be requested are first predicted, then the predicted tiles are rendered with duration t_cpt, and finally the rendered tiles are transmitted with duration t_com, which should be finished before the start time of playback for the segment, i.e., E_l. Therefore, the total computing and transmission time for the segment is T_cc = E_l − B_l.

To train a predictor for the whole video, T_cc needs to be identical for every segment. A predictor can be more accurate with a smaller value of T_cc. This is because the tiles to be predicted are closer to, and hence more correlated with, the head movement sequence in the observation window [3]. Given a predictor and the required viewport prediction accuracy, the value of T_cc can be determined [3]–[5].

A. Duration-Squeezing-Aware Constraint
With an identical value of T_cc for every segment, we can observe that E_{l+1} − E_l = B_{l+1} − B_l. Without playback stalling, E_{l+1} − E_l = T_seg holds, and thus B_{l+1} − B_l = T_seg. If the rendering for the l-th segment finishes after B_{l+1}, then the computing task will squeeze the time for rendering the (l+1)-th segment. Denote the squeezed computing time as Δp = t_cpt − (B_{l+1} − B_l) = t_cpt − T_seg. If the rendering for the l-th segment can be finished within T_seg, then Δp ≤ 0, and there is no squeeze in rendering, as shown in Fig. 1b.

Similarly, the communication task may also squeeze the time for delivering the (l+1)-th segment. Denote the squeezed communication time as Δm. When Δp > 0, Δm = t_com − t_cpt, as shown in Fig. 1a. When Δp < 0, Δm = t_com − t_cpt − (−Δp), as shown in Fig. 1b. By summarizing the two cases, we obtain Δm = t_com − t_cpt − (−Δp)^+, where (x)^+ ≜ max{x, 0}. When Δm ≤ 0, the transmission can be finished on time and there is no squeeze in the pipeline.

For the l-th segment, which is the first segment with proactive streaming, the transmission and rendering tasks should be finished within T_cc, i.e., t_cpt + t_com ≤ T_cc. For the (l+1)-th segment, the remaining duration for the CC tasks satisfies t_cpt + t_com ≤ T_cc − ((Δp)^+ + (Δm)^+). For the L-th segment, the remaining duration for the CC tasks satisfies

t_cpt + t_com ≤ T_cc − (L − l)((Δp)^+ + (Δm)^+).   (1)

B. Computing and Transmission Model
The computing resource of the MEC server for rendering a VR video can be assigned by allocating graphics processing unit (GPU) and compute unified device architecture (CUDA) cores [3], [16]. To gain useful insight, we assume that the computing resource, denoted as C_total (in floating-point operations per second, FLOPS), is equally allocated among K users. Then, the number of bits that can be rendered per second, referred to as the computing rate, for the k-th user is

C_cpt,k ≜ C_total / (K · μ_r)  (in bit/s),

where μ_r is the required number of floating-point operations (FLOPs) for rendering one bit of FoV, in FLOP/bit [3].

The BS serves K single-antenna users using zero-forcing beamforming over bandwidth B with N_t antennas. The instantaneous data rate at the i-th time slot for the k-th user is

C^i_com,k = B log_2(1 + p_k d_k^{−α} |h̃^i_k|² / σ²),

where h̃^i_k ≜ (h^i_k)^H w^i_k is the equivalent channel gain, p_k and w^i_k are respectively the transmit power and the beamforming vector for the k-th user, d_k and h^i_k ∈ C^{N_t} are respectively the distance and the small-scale channel vector from the BS to the k-th user, α is the path-loss exponent, σ² is the noise power, and (·)^H denotes conjugate transpose.

We consider indoor users as in the literature, where the distances of the users, d_k, usually change only slightly [2], [17], [18] and hence are assumed fixed. Due to the head movement and the variation of the environment, the small-scale channels are time-varying; they are assumed to remain constant within each time slot of duration ΔT and to change independently with identical distribution among time slots. With proactive transmission, the predicted FoVs in a segment should be transmitted within duration t_com. The number of bits transmitted within t_com can be expressed as C̄_com,k · t_com, where C̄_com,k ≜ (1/N_s) Σ_{i=1}^{N_s} C^i_com,k is the time-average transmission rate, and N_s is the number of time slots in t_com.
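To make the rate model above concrete, the following Python sketch (our illustration, not part of the paper; all numerical values below are placeholders) computes the per-user computing rate C_cpt,k, the per-slot transmission rate C^i_com,k, and the time-average rate C̄_com,k:

```python
import math

def computing_rate(C_total, K, mu_r):
    """Computing rate C_cpt,k = C_total / (K * mu_r) in bit/s:
    C_total FLOPS shared equally by K users, mu_r FLOPs per rendered bit."""
    return C_total / (K * mu_r)

def slot_rate(B, p_k, d_k, alpha, h_gain_sq, sigma_sq):
    """Per-slot rate C^i_com,k = B * log2(1 + p_k * d_k^-alpha * |h~|^2 / sigma^2)."""
    snr = p_k * d_k ** (-alpha) * h_gain_sq / sigma_sq
    return B * math.log2(1.0 + snr)

def time_average_rate(slot_rates):
    """Time-average rate over N_s slots: (1/N_s) * sum_i C^i_com,k."""
    return sum(slot_rates) / len(slot_rates)
```

Here `h_gain_sq` stands for the effective channel gain |h̃^i_k|² after zero-forcing; when the durations are optimized, the paper replaces the time average by the ensemble-average rate, since future channels are unknown.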
Since future channels are unknown when optimizing the durations, we use the ensemble-average rate E_h{C^i_com,k} to approximate the time-average rate C̄_com,k, which is very accurate when N_s or N_t/K is large [3]. To ensure fairness among users in terms of QoE, the transmit power is used to compensate for the path loss, i.e., p_k = β d_k^α, where β can be obtained from β Σ_{k=1}^{K} d_k^α = P and P is the maximal transmit power of the BS. Then, the ensemble-average transmission rate is equal for every user.

Without loss of generality, we consider an arbitrary user for analysis in the sequel. For notational simplicity, we use C_com to represent E_h{C^i_com,k} and C_cpt to represent C_cpt,k.

III. DURATION OPTIMIZATION FOR COMPUTING AND COMMUNICATION
To reflect the system performance for rendering and delivering all the predicted FoVs in a segment, define the completion rate of the communication and computing (CC) tasks as

S_cc ≜ min{C̄_com t_com / S_com, C_cpt t_cpt / S_cpt},   (2)

where S_com = s_fov · r_f · T_seg / γ_c and S_cpt = s_fov · r_f · T_seg are respectively the number of bits of all the predicted FoVs in a segment for transmission [19] and for rendering, γ_c is the video compression ratio, r_f (in frames per second) is the frame rate, s_fov ≜ γ_fov R_w R_h b is the number of bits in a FoV, γ_fov is the ratio of the FoV to a frame, R_w and R_h are respectively the width and height of a frame in pixels, and b is the number of bits per pixel, relevant to color depth [19]. By substituting S_com and S_cpt into (2), we obtain

S_cc = min{C̃_com t_com, C_cpt t_cpt} / (s_fov · r_f · T_seg),   (3)

where C̃_com ≜ C_com γ_c is the equivalent transmission rate. If S_cc = 0, the HMD cannot receive any rendered FoV on time, which will cause playout stalls.

The durations for computing and delivering are optimized to maximize the completion rate of the CC tasks, i.e.,

P0: max_{t_cpt, t_com} S_cc   (4a)
s.t. Δp = t_cpt − T_seg,   (4b)
Δm = t_com − t_cpt − (−Δp)^+,   (4c)
t_cpt + t_com ≤ T_cc − (L − l)((Δp)^+ + (Δm)^+).   (4d)

Problem P0 contains four cases, depending on whether or not Δp and Δm exceed zero. When the VR video is long (i.e., L is large), to ensure that every latter segment has time to be rendered and delivered, i.e., that the right-hand side of (4d) is larger than zero, the values of Δp and Δm should be non-positive. That is to say, squeezing either the transmission or the rendering time of the subsequent segment is strictly prohibited. When Δp ≤ 0, we obtain t_cpt ≤ T_seg from (4b). When Δm ≤ 0, by substituting (4b) into (4c), we obtain t_com ≤ T_seg. Then, problem P0 degenerates into

P1: max_{t_cpt, t_com} S_cc   (5a)
s.t.
t_cpt + t_com ≤ T_cc,   (5b)
t_cpt ≤ T_seg,   (5c)
t_com ≤ T_seg.   (5d)

Problem P1 can be transformed into a convex problem. From the KKT conditions, its optimal solution and the maximal value of the objective function of P1 can be obtained as

t*_cpt ∈ [C̃_com T_seg / C_cpt, T_min],   if C̃_com < C_cpt and T_c^max > T_seg,
t*_cpt = T_seg,   if C̃_com ≥ C_cpt and T_c^max > T_seg,   (6a)
t*_cpt = C̃_com T_cc / (C̃_com + C_cpt),   if T_c^max ≤ T_seg,

t*_com = T_seg,   if C̃_com ≤ C_cpt and T_c^max > T_seg,
t*_com ∈ [C_cpt T_seg / C̃_com, T_min],   if C̃_com > C_cpt and T_c^max > T_seg,   (6b)
t*_com = C_cpt T_cc / (C̃_com + C_cpt),   if T_c^max ≤ T_seg,

S*_cc = min{C̃_com, C_cpt} / (s_fov · r_f),   if T_c^max > T_seg,
S*_cc = C̃_com C_cpt T_cc / (s_fov · r_f · T_seg (C̃_com + C_cpt)),   if T_c^max ≤ T_seg,   (6c)

where T_min ≜ min{T_cc − T_seg, T_seg} and

T_c^max ≜ max{C̃_com, C_cpt} T_cc / (C̃_com + C_cpt) = max{t°_cpt, t°_com}.   (7)

Here t°_cpt and t°_com are the optimal durations for computing and communication without the constraints (5c) and (5d), as considered in [3].

IV. MINIMUM-RESOURCE-LIMITED, UNCONDITIONAL AND CONDITIONAL RESOURCE-TRADEOFF REGIONS
In this section, we show that the system may operate in a minimum-resource-limited, an unconditional resource-tradeoff, or a conditional resource-tradeoff region. First, we discuss the two cases in (6c).
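As a sanity check, the closed-form solution (6) and the case boundary (7) can be sketched in Python (our own illustration, not from the paper; rates are in Gbit/s, durations in seconds, and S_cc is normalized so that s_fov · r_f · T_seg = 1; when t*_cpt or t*_com is an interval, the lower end is returned):

```python
def optimal_durations(C_com_eq, C_cpt, T_cc, T_seg):
    """Closed-form solution of P1 in (6); C_com_eq is the equivalent
    transmission rate ~C_com. Returns (t_cpt*, t_com*)."""
    T_c_max = max(C_com_eq, C_cpt) * T_cc / (C_com_eq + C_cpt)  # eq. (7)
    if T_c_max <= T_seg:
        # Case 2: the unconstrained optimum already satisfies (5c), (5d)
        return (C_com_eq * T_cc / (C_com_eq + C_cpt),
                C_cpt * T_cc / (C_com_eq + C_cpt))
    if C_com_eq < C_cpt:
        # Case 1, limited by the transmission rate: t_com hits T_seg
        return (C_com_eq * T_seg / C_cpt, T_seg)
    # Case 1, limited by the computing rate: t_cpt hits T_seg
    return (T_seg, C_cpt * T_seg / C_com_eq)

def completion_rate(t_cpt, t_com, C_com_eq, C_cpt):
    """Normalized S_cc in (3): min{~C_com * t_com, C_cpt * t_cpt}."""
    return min(C_com_eq * t_com, C_cpt * t_cpt)
```

For example, with C̃_com = 0.9, C_cpt = 0.4, T_cc = 1.5, and T_seg = 1, eq. (7) gives T_c^max ≈ 1.04 > T_seg, so the sketch returns the computing-limited solution t*_cpt = T_seg and t*_com = C_cpt T_seg / C̃_com, matching the second row of (6a) and (6b).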
Case 1 (T_c^max > T_seg): If C̃_com > C_cpt, then T_c^max = t°_cpt from (7). Since the allowed maximal duration for rendering is T_seg, as shown in (5c), T_c^max > T_seg indicates that t°_cpt exceeds the allowed rendering duration. This suggests that the completion rate of the CC tasks is limited by the computing rate, where increasing the other type of resource, C̃_com, is useless for improving the system performance. Similarly, if C̃_com < C_cpt, then T_c^max = t°_com and the system performance is limited by the transmission rate. We refer to this case as the "minimum-resource-limited case", where the efficient resource configuration should satisfy C̃_com = C_cpt. We refer to a resource configuration as "efficient" when the decrease of any one type of resource in the configuration would reduce the value of S*_cc.

Case 2 (T_c^max ≤ T_seg): Both t°_cpt and t°_com satisfy the duration-squeezing-prohibited constraints in (5c) and (5d). In this case, increasing either the computing rate or the transmission rate can improve the completion rate of the CC tasks. This indicates a tradeoff between the computing rate and the transmission rate [3]. We refer to this case as the "resource-tradeoff case", where the resource configuration is flexible.

However, the boundary between the two cases depends on T_c^max, which further depends on C̃_com and C_cpt, as shown in (7). To provide useful insight into the resource configuration, we identify three regions in the following, which are independent of the configured resources. According to (7), we have

T_cc > T_c^max ≥ T_cc / 2.   (8)

Minimum-resource-limited region: If T_cc > 2T_seg, then with T_c^max ≥ T_cc/2 we have T_c^max > T_seg, i.e., Case 1 holds.
Unconditional resource-tradeoff region: If T_cc ≤ T_seg, then with T_c^max < T_cc we have T_c^max < T_seg, which is a sufficient condition for Case 2.
Conditional resource-tradeoff region: If T_cc ∈ (T_seg, 2T_seg], then considering that max{C̃_com, C_cpt}/(C̃_com + C_cpt) ∈ [1/2, 1), we obtain T_c^max ∈ (T_seg/2, 2T_seg). The system may operate in Case 1 or Case 2. If T_c^max ≤ T_seg, then the system lies in Case 2. If T_c^max > T_seg, then the system lies in Case 1, where the efficient resource configuration is C̃_com = C_cpt, and from (7) we then have T_c^max = T_cc/2. Further considering one boundary of the region, T_cc ≤ 2T_seg, we obtain T_c^max ≤ T_seg, which is the condition of Case 2 and can also be rewritten as the condition for the efficient resource configuration, max{C̃_com, C_cpt}/(C̃_com + C_cpt) ≤ T_seg/T_cc. That is to say, in this region, even if the system starts in Case 1, the efficient resource configuration can transform it into Case 2, i.e., the resource-tradeoff case.

V. NUMERICAL RESULTS
In this section, we validate the obtained analytical results and evaluate the performance of the optimized durations. We consider a VR video with 4K resolution (3840 × 2160) and b = 12 bits per pixel [19]. The ratio of a FoV to a frame is γ_fov = 0.2 [18]; then the number of bits in a FoV is s_fov = 3840 × 2160 × b × γ_fov = 19.9 Mbits. The frame rate of the VR video is r_f = 30 frames per second [21]. The compression ratio is γ_c = 2. [22]. The playback duration of a segment is T_seg = 1 s [21]. Depending on the configured communication and computing resources as well as the number of users, the computing and transmission rates for a user can be very different. For example, when K = 4, N_t = 8, P = 24 dBm, B = 40 MHz, and d_k = 5 m, the ensemble-average transmission rate for a user is C_com = 0. Gbps [3], and the equivalent transmission rate is C̃_com = C_com γ_c = 1. Gbps. When an Nvidia P40 GPU is used for rendering VR videos for four users, the computing rate for a user is C_cpt = 1. Gbps [3]. To reflect the variation of the configured resources, we vary C̃_com and C_cpt from 0 Gbps upward, unless otherwise specified.

Fig. 2: (a) Minimum-resource-limited region (T_cc > 2T_seg); (b) unconditional resource-tradeoff region (T_cc < T_seg); (c) conditional resource-tradeoff region (T_cc ∈ (T_seg, 2T_seg)).

In Fig. 2, we illustrate the three regions. As shown in Fig. 2a, if C̃_com ≠ C_cpt, then the system performance is
We can observe that if the system isresource-limited, say P in the figure, no matter if we increasethe computing rate or reduce the transmission rate in order tosatisfy the condition for efficient resource configuration (i.e., max { C com ,C cpt } C com + C cpt ≤ T seg T cc ), the system will finally fall into theresource-tradeoff case.In Fig. 3, we verify the necessity of imposing the duration-squeezing-prohibited constraints by taking the value of S cc over the first four proactively streamed segments as an example(the results for other values of ˜ C com and C cpt are similarwhenever the difference between the two values are morethan 500). We compare the optimal durations in (6) with twobaseline schemes without considering the duration-squeezing-prohibited (SP) constraints. One is the optimal solution ofproblem P1 without the SP constraints in (5c) and (5d), where t com = t o com and t cpt = t o cpt , with legend “opt duration w/oSP”. The other scheme fixes the durations as t com = T cc , withlegend “1:1 duration”. As expected, the optimal durations yieldthe best performance from the ( l +1)th segment.When T cc < T seg as shown in Fig. 3a, the optimal durationsachieve the same performance as the baseline “opt durationw/o SP”, because T cc ≤ T seg is the sufficient condition of Case l+ l+ l+ (a) T cc < T seg ( T cc = 0 . s) l l+ l+ l+ (b) T cc ∈ ( T seg , T seg )( T cc = 1 . s)(c) T cc ∈ ( T seg , T seg )( T cc = 1 . s) l l+ l+ l+ (d) T cc > T seg ( T cc = 2 . s) Fig. 3: S cc and MTP latency v.s. segment index, ˜ C com = 900 Mbps and C cpt = 400 Mbps. . When Case 2 holds, t o com , t o cpt ≤ T seg , i.e., the transmittingand computing with “opt duration w/o SP” will not causethe squeeze. These two schemes outperform the scheme “1:1duration”, which shows the gain of matching the imbalancedcomputing rate and transmission rate.When T seg < T cc < T seg as shown in Fig. 
3b, although "opt duration w/o SP" slightly outperforms the optimal durations for the l-th segment, the completion rate of the CC tasks of this baseline degrades to zero and stalling happens for the (l+1)-th segment. This is because T_c^max = max{C̃_com, C_cpt} T_cc/(C̃_com + C_cpt) > T_seg, i.e., Case 1 holds, where either the transmitting or the computing of this baseline for the l-th segment squeezes the duration for the (l+1)-th segment, which causes the playback stalling, as visualized in Fig. 3c. For the three schemes, the motion-to-photon (MTP) latency of the (l+n)-th segment can be expressed as T_MTP = [t_com + t_cpt − (n − 1)((Δp)^+ + (Δm)^+)]^+.

When T_cc > 2T_seg, as shown in Fig. 3d, the squeeze is unavoidable for the two baselines. This shows the necessity of imposing the duration-squeezing-prohibited constraints.

VI. CONCLUSION
In this paper, we investigated maximizing the completion rate of the CC tasks under the task duration-squeezing-aware constraint in proactive VR streaming. From the obtained closed-form solution, we found the minimum-resource-limited, unconditional, and conditional resource-tradeoff regions. The boundaries of the three regions depend on the relation between the total time budget for proactive communication and computing and the playback duration of a segment. In the minimum-resource-limited region, the communication and computing resources cannot be traded off. In the unconditional resource-tradeoff region, the resources can be flexibly configured, while in the conditional resource-tradeoff region, the efficient configuration should satisfy a condition. Numerical results validated the necessity of imposing the duration-squeezing-prohibited constraints and illustrated these regions.

REFERENCES

[1] F. Qian, L. Ji, B. Han, and V. Gopalakrishnan, "Optimizing 360 video delivery over cellular networks," ACM SIGCOMM Workshop, 2015.
[2] C.-L. Fan, W.-C. Lo, Y.-T. Pai, and C.-H. Hsu, "A survey on 360° video streaming: Acquisition, transmission, and display," ACM Comput. Surv., vol. 52, no. 4, Aug. 2019.
[3] X. Wei, C. Yang, and S. Han, "Prediction, communication, and computing duration optimization for VR video streaming," IEEE Trans. Commun., early access, 2020.
[4] C. Li, W. Zhang, Y. Liu, and Y. Wang, "Very long term field of view prediction for 360-degree video streaming," IEEE MIPR, 2019.
[5] C. Fan, S. Yen, C. Huang, and C. Hsu, "Optimizing fixation prediction using recurrent neural networks for 360° video streaming in head-mounted virtual reality," IEEE Trans. Multimedia, vol. 22, no. 3, pp. 744–759, March 2020.
[6] S. Mangiante, G. Klas, A. Navon, Z. GuanHua, J. Ran, and M. D. Silva, "VR is on the edge: How to deliver 360° videos in mobile networks," ACM SIGCOMM, 2017.
[7] S. Gupta, J. Chakareski, and P. Popovski, "Millimeter wave meets edge computing for mobile VR with high-fidelity 8K scalable 360° video," IEEE MMSP, 2019.
[8] F. Guo, F. R. Yu, H. Zhang, H. Ji, V. C. M. Leung, and X. Li, "An adaptive wireless virtual reality framework in future wireless networks: A distributed learning approach," IEEE Trans. Veh. Technol., vol. 69, no. 8, pp. 8514–8528, 2020.
[9] J. Du, F. R. Yu, G. Lu, J. Wang, J. Jiang, and X. Chu, "MEC-assisted immersive VR video streaming over Terahertz wireless networks: A deep reinforcement learning approach," IEEE Internet Things J., vol. 7, no. 10, pp. 9517–9529, 2020.
[10] C. Zheng, S. Liu, Y. Huang, and L. Yang, "MEC-enabled wireless VR video service: A learning-based mixed strategy for energy-latency tradeoff," IEEE WCNC, 2020.
[11] J. Chakareski and S. Gupta, "Multi-connectivity and edge computing for ultra-low-latency lifelike virtual reality," IEEE ICME, 2020.
[12] X. Hou, S. Dey, J. Zhang, and M. Budagavi, "Predictive adaptive streaming to enable mobile 360-degree and VR experiences," IEEE Trans. Multimedia, early access, 2020.
[13] W. Xing and C. Yang, "Tile-based proactive virtual reality streaming via online hierarchial learning," APCC, 2019.
[14] W. Lo, C. Huang, and C. Hsu, "Edge-assisted rendering of 360° videos streamed to head-mounted virtual reality," IEEE ISM, 2018.
[15] 3GPP, "Extended reality (XR) in 5G," 3GPP TR 26.928, version 16.0.0, release 16, 2020.
[16] NVIDIA, "NVIDIA CloudXR cuts the cord for VR, raises the bar for AR," https://blogs.nvidia.com/blog/2020/05/14/cloudxr-sdk.
[17] C. Perfecto, M. S. Elbamby, J. Del Ser, and M. Bennis, "Taming the latency in multi-user VR 360°: A QoE-aware deep learning-aided multicast framework," IEEE Trans. Commun., vol. 68, no. 4, pp. 2491–2508, 2020.
[18] W.-C. Lo, C.-L. Fan, J. Lee, C.-Y. Huang, K.-T. Chen, and C.-H. Hsu, "360° video viewing dataset in head-mounted virtual reality," ACM MMSys.
IEEE J. Sel. Topics Signal Process., vol. 14, no. 1, pp. 161–176, 2020.
[21] A. Mahzari, A. T. Nasrabadi, A. Samiei, and R. Prakash, "FoV-aware edge caching for adaptive 360° video streaming," ACM MM, 2018.
[22] M. Zhou, W. Gao, M. Jiang, and H. Yu, "HEVC lossless coding and improvements,"