Learning from Experience: A Dynamic Closed-Loop QoE Optimization for Video Adaptation and Delivery

Abstract

The quality of experience (QoE) is known to be subjective and context-dependent. Identifying and calculating the factors that affect QoE is a difficult task. Recently, a lot of effort has been devoted to estimating the users' QoE in order to improve video delivery. In the literature, most of the QoE-driven optimization schemes that realize trade-offs among different quality metrics have been addressed under the assumption of homogeneous populations. Nevertheless, people's perceptions of a given video quality may not be the same, which makes QoE optimization harder. This paper takes a step further in order to address this limitation and meet users' profiles. To do so, we propose a closed-loop control framework based on the users' (subjective) feedback to learn the QoE function and optimize it at the same time. Our simulation results show that our system converges to a steady state, where the resulting QoE function noticeably improves the users' feedback.


Imen Triki ⋆, Quanyan Zhu ◦, Rachid El-Azouzi ⋆, Majed Haddad ⋆ and Zhiheng Xu ◦
⋆ CERI/LIA, University of Avignon, Avignon, France.
◦ NYU Tandon School of Engineering, New York, USA.


Index Terms: QoE, learning, neural network, average video quality, startup delay, video quality switching, video stalls, rebuffering delay.

I. INTRODUCTION

Due to the emergence of smartphones in daily life and the tremendous advances in broadband access technologies, video streaming services have evolved greatly over the last years to become one of the most widely provided services on the Internet. According to Cisco [1], HTTP video streaming will account for [...] of all consumer Internet traffic by [...], up from [...] in 2015. More and more interest is therefore being devoted to video streaming services in the hope of satisfying all users with the delivered video quality. Nevertheless, satisfying all users at once is a hard task; in fact, one user may appreciate a video quality that another may not appreciate at all. This makes the study of the QoE particularly complex.

In the literature, different studies have sought to express the user's QoE as an explicit function of some metrics. Some works claim that the QoE can be directly mapped to QoS metrics such as the throughput, the jitter and the packet loss [2], [3]. Other recent works found that the QoE could be expressed through application metrics such as the frequency of video freezing (stalls), the startup delay, the average video quality and the dynamics of quality changes during the streaming session [4], [5]. However, the QoE may change depending on the video context and on other external factors linked to the user [6], which explains the trend of using new standardized subjective metrics such as the MOS (Mean Opinion Score) and the users' engagement rate [7]–[9]. A direct relation between the time spent in rebuffering and the user's engagement has been studied in [9].

In the industry, various adaptive video streaming solutions have been explored to meet the users' expectations, such as Microsoft's smooth streaming, Adobe's HTTP dynamic streaming and Apple's live streaming. All these solutions use the well-known DASH (Dynamic Adaptive Streaming over HTTP) standard.
Despite the emergence of several proposals to improve the QoE, there is still no consensus across these solutions since the users' perceptions are quite different.

The main motivation of this work is to make DASH deal with the very wide heterogeneity of people. At the core lies the idea of performing real-time supervision of the users' actual perceptions to continuously improve the performance of quality adaptation. To the best of our knowledge, such a paradigm has not yet been investigated for adaptive video streaming. In the literature, we found that the users' feedback on the video quality delivery was only used to study human perception or to validate analytical QoE models that help predict the QoE [4], [6], [10], [11].

In [11], QoE prediction was performed by incorporating machine learning, users' feedback and some QoE-related features such as rebuffering and the memory effect. Following the same trend, we combine machine learning with a QoE-maximization problem in a closed-loop architecture to dynamically adapt video quality with respect to the users' feedback. We focus on two main ideas: maximizing the feedback returned by users and exploiting the knowledge of future throughput. Note that, thanks to the exploitation of Big Data in network capacity modelling and prediction, throughput estimation is possible today and may reach up to a few seconds into the future [12]. However, very few papers exploit the knowledge of future throughput variation [13]–[15]. In [13], the authors designed a QoE-driven optimization problem and proposed a cross-layer scheme to minimize the cost of capacity usage and to avoid video stalls under the assumption of perfect throughput knowledge. The main shortcoming of their approach is that it is only suited to classical video streaming, as it ignores important visual quality metrics related to adaptive streaming such as video resolution and bitrate distribution.
Holding the same assumption, the authors in [14] proposed a proactive video content delivery algorithm, called NEWCAST, to adjust the quality of adaptive video streaming over the future horizon. Their work was afterward extended in [15] to make the algorithm well suited to imperfect throughput prediction.

Our contribution in this paper is twofold:
• We exploit the knowledge of future throughput variations in order to solve the optimization problem addressed in [16] in a smoother and faster manner, based on a mathematical analysis similar to that in [14].
• We design a closed-loop framework based on client-server interactions to learn the overall users' perceptions and to optimize the quality of the streaming accordingly. The performance of our proposed framework is obtained using Matlab and NS3 simulations under a multi-user scenario.

The paper is organized as follows: In Section II, we formulate the single-user QoE-optimization problem. Then, in Section III, we discuss the strategy of the optimal solution and propose a heuristic that performs close to the optimal solution. In Section IV, we address the multi-user case and propose a closed-loop based framework using neural networks. Then, in Section V, we evaluate the performance of this framework through some numerical results. Section VI concludes the paper.

II. SINGLE-USER QOE PROBLEM FORMULATION

A. The video streaming model

We model a video as a set of $S$ segments (or chunks) of equal duration in seconds. Each segment is composed of $N$ frames and is stored on the streaming server in different quality representations. Each representation corresponds to a video encoding rate (hereinafter called bit-rate). Denote by $b_1, b_2, \dots, b_L$ the available video bit-rates, where $b_i < b_j$ for $i < j$. We suppose that all video frames are played at a deterministic rate, e.g., 25 frames per second (fps). Denote by $\lambda$ this frame rate.

For each segment, the player indicates to the server the quality needed for streaming it. Let $b(s)$ be the bit-rate associated with segment $s$ and $\mathcal{B} = \{b(1), \dots, b(S)\}$ be the set of bit-rates associated with all video segments. We assume that the video playback buffer is big enough to avoid buffer overflow events. We denote by $B(t_k)$ the number of segments that the playback buffer contains at time $t_k$. At the beginning of the streaming session, a prefetching stage is introduced to avoid future buffer underflows; $T$ seconds of video (corresponding to $x$ segments) have to be completely appended to the buffer before the video starts playing. When there are no segments in the playback buffer, the video stops and a new prefetching stage is introduced to append again $T$ seconds of video before resuming playback. This event is hereinafter referred to as a video stall.

In our study, we exploit the knowledge of the user's throughput over a given future horizon $H = [t_1, \dots, t_N]$. Before starting the session, we propose to set the bit-rates of all video segments so that they are optimally streamed over that horizon. We denote by $r(t_k)$ the user's estimated throughput at time $t_k$, $k \in [1 \dots N]$, and by $b_{t_k}$ the video bit-rate scheduled to be streamed at that time. Note that $b_{t_k}$ depends only on the throughput variation and the set of segments' bit-rates $\mathcal{B}$. In the following we denote it by $b_{t_k}(\mathcal{B}, r)$.

To model the dynamics of the playback buffer, we define two different phases:
• the start-up/rebuffering phase, referred to as the BaW-phase, where the media player only downloads the video without playing it;
• the playback phase, referred to as the BaP-phase, where the player downloads and plays the video at the same time.

Depending on the state of the buffer at each observation time $t_k$, we define the variables $S_{BaP}(t_k)$ and $\tau_{BaW}(t_k)$ as follows:
1) if the player is in a BaP-phase, $S_{BaP}(t_k)$ is the time at which that phase started;
2) if the player is in a BaW-phase, $S_{BaP}(t_k)$ is the time at which the next BaP-phase will start;
3) if the buffer is empty, $\tau_{BaW}(t_k)$ is the duration of the following BaW-phase;
4) if the buffer is not empty, $\tau_{BaW}(t_k)$ is zero;

which can be mathematically expressed as

$$S_{BaP}(t_k) = \max\{ S_{BaP}(t_{k-1}),\ \delta(B(t_k) = 0) \cdot (t_k + \tau_{BaW}(t_k)) \} \qquad (1)$$

$$\tau_{BaW}(t_k) = \delta(B(t_k) = 0) \cdot \Big\{ \tau \ ;\ \sum_{t = t_k}^{t_k + \tau} \frac{\lambda \cdot r(t)}{N \cdot b_t(\mathcal{B}, r)} = x \Big\} \qquad (2)$$

where $\delta(X) = 1$ if $X$ is true and $\delta(X) = 0$ otherwise. Hence the dynamics of the playback buffer can be written as

$$B(t_k) = \Big\{ B(t_{k-1}) + \frac{\lambda \cdot r(t_k)}{N \cdot b_{t_k}(\mathcal{B}, r)} - \lambda \cdot \delta(t_k \ge S_{BaP}(t_k)) \Big\}^{+}, \qquad (3)$$

where $\{x\}^{+} = \max\{x, 0\}$ is used to ensure that the playback buffer occupancy cannot be negative.

Figure 1. Illustration of the playback buffer BaW-BaP cycles. BaW: Buffer and Wait; BaP: Buffer and Play.

B. The QoE-Optimization Problem

The goal of bit-rate adaptation in video streaming services is to improve the users' overall perceived quality of the video. It is challenging to define the QoE quantitatively, as it encompasses many complex factors such as the user's mood, the time and the way one watches the video, the video context, etc. In this work, we use five of the most common key QoE metrics to express our objective QoE function.

1) The average video quality $\phi_1$, which is the average per-segment quality over all segments, given by
$$\phi_1(\mathcal{B}) = \frac{1}{S} \sum_{s=1}^{S} b(s). \qquad (4)$$

2) The startup delay ratio, which is the proportion of time taken by the first BaW-phase before the video starts:
$$\phi_2(\mathcal{B}) = \frac{\tau_{BaW}(t_1)}{T}, \qquad (5)$$
where $T$ is the video length in seconds.

3) The average number of video quality switches, given by
$$\phi_3(\mathcal{B}) = \frac{1}{S-1} \sum_{s=2}^{S} \delta\{ b(s) \ne b(s-1) \}. \qquad (6)$$

4) The number of video stalls:
$$\phi_4(\mathcal{B}) = \sum_{k=1}^{T} \delta\{ B(t_k) = 0 \}. \qquad (7)$$

5) The rebuffering delay ratio, which is the proportion of time taken by all the rebuffering events:
$$\phi_5(\mathcal{B}) = \frac{1}{T} \sum_{k=1}^{T} \delta\{ B(t_k) = 0 \} \cdot \tau_{BaW}(t_k). \qquad (8)$$

As the user's preference for each of these QoE metrics may not be the same, we assign to each metric $\phi_i$ a weighting parameter $\omega_i$ to adjust its impact on the global QoE. As done in previous work [16], we model our global QoE as a linear function of the five weighted QoE metrics above, namely,
$$Q(\mathcal{B}) = \sum_{i=1}^{5} \omega_i \phi_i(\mathcal{B}), \qquad (9)$$
where $\omega_1 \ge 0$ and $\omega_i \le 0$, $\forall i \in \{2, \dots, 5\}$.

Let $W = (\omega_1, \dots, \omega_5)^{\top}$ be the vector of weights and $\Phi(\mathcal{B}) = (\phi_1(\mathcal{B}), \dots, \phi_5(\mathcal{B}))^{\top}$ be the vector of QoE metrics. If we assume that the user tolerates at most $p$ stalls during the whole session, we end up formulating our single-user QoE optimization problem as follows:
$$\max_{\mathcal{B}}\ Q(\mathcal{B}) = W^{\top} \Phi(\mathcal{B}) \qquad (10)$$
$$\text{s.t.}\quad \sum_{k=0}^{N} \frac{\lambda \cdot r(t_k)}{N \cdot b_{t_k}(\mathcal{B}, r)} = S, \qquad \phi_4(\mathcal{B}) \le p,\ p \in \mathbb{N},$$
where the first constraint ensures that the whole video will have been streamed by the end of the future horizon.

III. PROPOSED SOLUTION FOR SINGLE-USER QOE PROBLEM

The QoE optimization problem defined in (10) is a combinatorial problem with very high complexity (NP-hard). In [16], the authors addressed a similar problem, but they assumed an inaccurate throughput estimation, which justifies their choice of an MPC model to solve their QoE optimization problem. The assumption of accurately knowing the future throughput in adaptive video streaming was explored in [14], where the authors proposed an ascending bitrate strategy to optimize the video delivery. In this paper, we characterize an important property of the optimal strategy, which allows us to propose a heuristic approach that performs close to the optimal solution.
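To make the objective of problem (10) concrete, the following Python sketch evaluates the five metrics of Section II and the weighted sum $Q$ for one session. It is only an illustration under stated assumptions: the function names, the example session and the weight values are hypothetical, and the stalls are passed in directly as a list of rebuffering durations rather than being derived from the buffer dynamics (1)-(3).

```python
from typing import Sequence, Tuple


def qoe_metrics(
    bitrates: Sequence[float],
    startup_delay: float,
    stall_durations: Sequence[float],
    video_length: float,
) -> Tuple[float, float, float, float, float]:
    """Return (phi_1, ..., phi_5) for one session with at least two segments.

    bitrates        : per-segment bit-rate b(s)
    startup_delay   : duration of the first BaW-phase, in seconds
    stall_durations : one rebuffering duration per stall, in seconds
    video_length    : T, the video length in seconds
    """
    num_segments = len(bitrates)
    phi1 = sum(bitrates) / num_segments                  # (4) average quality
    phi2 = startup_delay / video_length                  # (5) startup delay ratio
    switches = sum(
        1 for s in range(1, num_segments) if bitrates[s] != bitrates[s - 1]
    )
    phi3 = switches / (num_segments - 1)                 # (6) quality switching
    phi4 = float(len(stall_durations))                   # (7) number of stalls
    phi5 = sum(stall_durations) / video_length           # (8) rebuffering ratio
    return (phi1, phi2, phi3, phi4, phi5)


def qoe(weights: Sequence[float], metrics: Sequence[float]) -> float:
    """Linear QoE model (9): Q = sum_i w_i * phi_i."""
    return sum(w * phi for w, phi in zip(weights, metrics))


# Hypothetical session: 5 segments, 2 s startup, one 1.5 s stall, 30 s video.
example_metrics = qoe_metrics([1.0, 1.0, 2.5, 2.5, 4.5], 2.0, [1.5], 30.0)
# Hypothetical weights: positive for quality, negative for every penalty.
example_weights = (1.0, -1.0, -1.0, -1.0, -1.0)
example_qoe = qoe(example_weights, example_metrics)
```

With these illustrative weights, a strategy is rewarded for a higher average bit-rate and penalized for startup delay, switching, stalls and rebuffering, which matches the sign convention stated for (9).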

A. Property of the optimal solution: Ascending bit-rate strategy per BaW-BaP cycle

Definition 1. We say a bit-rate strategy is ascending per BaW-BaP cycle if the quality level of the segments increases during each BaW-BaP cycle of the streaming session.
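Definition 1 can be checked mechanically. The short Python sketch below is an illustration only: it assumes that "increases" means non-decreasing, and that the cycles are given as a hypothetical list `cycle_starts` holding the index of the first segment of each BaW-BaP cycle.

```python
from typing import List, Sequence


def split_cycles(bitrates: Sequence[float], cycle_starts: Sequence[int]) -> List[List[float]]:
    """Split the per-segment bit-rates into one list per BaW-BaP cycle."""
    bounds = list(cycle_starts) + [len(bitrates)]
    return [list(bitrates[bounds[i]:bounds[i + 1]]) for i in range(len(cycle_starts))]


def is_ascending_per_cycle(bitrates: Sequence[float], cycle_starts: Sequence[int]) -> bool:
    """True if the bit-rate never decreases inside any cycle (assumed reading of Definition 1)."""
    return all(
        all(cycle[i] <= cycle[i + 1] for i in range(len(cycle) - 1))
        for cycle in split_cycles(bitrates, cycle_starts)
    )
```

For instance, the sequence 1.0, 2.5, 1.0 is not ascending as a single cycle, but it is ascending per cycle if a stall starts a new cycle at the third segment.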

Proposition 1. Assume that there exists a solution $\mathcal{B}$ that satisfies the constraints in (10). Then there exists an ascending bit-rate per BaW-BaP cycle solution $\mathcal{B}_{as}$ that optimizes problem (10).

Proof. We shall show that for any feasible strategy $\mathcal{B}$ that satisfies the constraints in (10), there exists an ascending bitrate per BaW-BaP cycle strategy $\mathcal{B}_{as}$ such that $Q(\mathcal{B}_{as}) \ge Q(\mathcal{B})$. Here, we distinguish two cases:

• Case where the session is composed of one BaW-BaP cycle, i.e., no stall during the session:

Without loss of generality, and for the sake of illustration, we assume that we can stream and play the video smoothly under a non-ascending bitrate strategy $\mathcal{B}$. Then, there exist $m \le n$ such that $b(m) \ge b(n)$. Let $t_p$ and $t_q$ be the requesting times of $b(m)$ and $b(n)$, respectively, as illustrated in Fig. 2. If we swap the qualities of segments $m$ and $n$, then the buffer state will be more relaxed with respect to the stall constraint, since segment $m$ will be streamed in a shorter time and the following segments will then be appended sooner to the buffer, which does not induce buffer stalls. That said, if we reorder $\mathcal{B}$ in an ascending way, the video will not experience any stall. Let $\mathcal{B}_{as}$ be the resulting set after reordering $\mathcal{B}$ in an ascending way; then we have
$$\phi_4(\mathcal{B}_{as}) = \phi_4(\mathcal{B}) = 0 \quad \text{and} \quad \phi_5(\mathcal{B}_{as}) = \phi_5(\mathcal{B}) = 0.$$
As we keep the same selected bitrates in $\mathcal{B}_{as}$ as in $\mathcal{B}$, the average per-segment bitrate does not change, which gives
$$\phi_1(\mathcal{B}_{as}) = \phi_1(\mathcal{B}).$$
Since $\mathcal{B}_{as}$ is an ascending strategy, the video session will start with the lowest quality used by $\mathcal{B}$. Hence, the startup delay will be reduced by using $\mathcal{B}_{as}$ compared to $\mathcal{B}$. Therefore,
$$\phi_2(\mathcal{B}_{as}) \le \phi_2(\mathcal{B}).$$
Now, let $L$ be the number of qualities selected by $\mathcal{B}$. Thus, the number of quality switches under strategy $\mathcal{B}$ will be at least $L$. On the other hand, strategy $\mathcal{B}_{as}$ will experience exactly $L - 1$ quality switches, since the quality level of the segments increases during the session. Therefore, we have
$$\phi_3(\mathcal{B}_{as}) \le \phi_3(\mathcal{B}).$$
All things considered, and given the signs of the weights in (9), we have $Q(\mathcal{B}_{as}) \ge Q(\mathcal{B})$.

• Case where the session is composed of more than one BaW-BaP cycle, i.e., one or more stalls during the session:

Here, we assume that, for a given horizon window, we can stream the video under a non-ascending bitrate strategy $\mathcal{B}$ with $\phi_4$ stall events over the session ($\phi_4 \ge 1$). Reordering all the segments' bitrates in an ascending way will add more protection to the buffer against the stall constraint, which may reduce the number of stalls $\phi_4$. However, it does not mean that the global QoE will increase, because the duration of the stalls will change depending on the new moments of their occurrence, their corresponding requested qualities and the dynamics of the user throughput. For these reasons, the ascending bitrate strategy does not work over a whole session. On the other hand, when a stall happens, the buffer state becomes independent of its states before the stall, which makes all the BaW-BaP cycles independent of each other. Let us write $\mathcal{B} = \{\mathcal{B}^{1}, \dots, \mathcal{B}^{\phi_4 + 1}\}$, where $\mathcal{B}^{i}$ denotes the set of bitrates used in the $i$-th BaW-BaP cycle.

If we apply our previous ascending strategy to each of the $(\phi_4 + 1)$ BaW-BaP cycles, we end up reducing the duration of all the BaW-phases (including the startup and the rebuffering delays) and the global number of quality switches, while maintaining the same number of stalls and the same average quality.

Let $\mathcal{B}_{as} = \{\mathcal{B}^{1}_{as}, \dots, \mathcal{B}^{\phi_4 + 1}_{as}\}$; then we have $Q(\mathcal{B}_{as}) \ge Q(\mathcal{B})$. This concludes the proof.
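The reordering argument of the proof can be illustrated numerically. The Python sketch below sorts the bit-rates inside each cycle and compares the number of quality switches and the total bit-rate before and after. It is a toy illustration, not part of the proof: the helper names and the example values are hypothetical, and buffer feasibility is not re-checked here.

```python
from typing import List, Sequence


def count_switches(bitrates: Sequence[float]) -> int:
    """Number of consecutive segment pairs with different bit-rates."""
    return sum(1 for i in range(1, len(bitrates)) if bitrates[i] != bitrates[i - 1])


def reorder_ascending_per_cycle(
    bitrates: Sequence[float], cycle_starts: Sequence[int]
) -> List[float]:
    """Sort the bit-rates in ascending order inside each BaW-BaP cycle.

    cycle_starts holds the index of the first segment of each cycle.
    """
    bounds = list(cycle_starts) + [len(bitrates)]
    reordered: List[float] = []
    for start, end in zip(bounds[:-1], bounds[1:]):
        reordered.extend(sorted(bitrates[start:end]))
    return reordered


# Hypothetical non-ascending strategy over a single cycle (no stall).
original = [2.5, 1.0, 2.5, 0.75, 1.0]
ascending = reorder_ascending_per_cycle(original, [0])
```

In this example the same multiset of bit-rates is kept, so the average quality is unchanged, while the number of switches drops from four to two (three distinct qualities, hence two switches) and the session starts with the lowest selected quality.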

B. Algorithm for Optimal Solution

In this section, we describe the main steps to follow to build an optimal solution with at most $p$ stalls during the streaming session. The key idea of this algorithm is stall enforcement: as we assume that the future throughput is known, we are able to enforce stalls at any moment of the streaming session. Once we locate the stalls' positions (i.e., at which segment each stall should happen), we divide the session into multiple BaW-BaP cycles and then look for the optimal ascending bit-rate strategy over each cycle. The optimal number of stalls is obtained through an exhaustive search; we start by computing the optimal strategy with zero stalls, then with one stall, and so on up to $p$ stalls. The stalls' distribution is also obtained through an exhaustive search. In what follows, we describe the main steps to build an optimal ascending bit-rate strategy over one BaW-BaP cycle:

i) Find all the possible ascending bit-rate combinations of the BaW-phase that allow building an ascending bit-rate strategy over the whole BaW-BaP cycle (steps A and B in Fig. 3).
ii) For each BaW-phase combination, find all the possible ascending strategies that satisfy the constraints of (10) (steps 1, 2 and 3 in Fig. 3).
iii) For each strategy, compute the QoE metrics, then apply the vector of weights $W$ to find the best solution.

To find all the possible ascending strategies, we use the tree of choice described in [14, Section IV-3].

Figure 2. Impact of bit-rate switching on the buffer state evolution.

Figure 3. Steps for building an increasing bit-rate strategy.
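Steps i) to iii) above can be sketched for a single stall-free cycle as follows. This Python example is a simplified stand-in, not the authors' implementation: it replaces the tree of choice of [14] with a brute-force enumeration of non-decreasing sequences, and it assumes a hypothetical segment-level download model (throughput given per one-second slot in Mbit/s, segments fetched back to back, playback starting once `prefetch` segments are buffered). Only $\phi_1$, $\phi_2$ and $\phi_3$ are scored since no stall is allowed.

```python
from itertools import combinations_with_replacement
from typing import List, Optional, Sequence, Tuple


def download_times(
    bitrates: Sequence[float], seg_dur: float, throughput: Sequence[float]
) -> Optional[List[float]]:
    """Completion time of each segment, or None if the horizon is too short."""
    times: List[float] = []
    t = 0.0
    for b in bitrates:
        remaining = b * seg_dur                     # segment size in Mbit
        while remaining > 0:
            slot = int(t)
            if slot >= len(throughput):
                return None                         # throughput horizon exhausted
            capacity = throughput[slot] * (slot + 1 - t)
            if throughput[slot] > 0 and capacity >= remaining:
                t += remaining / throughput[slot]   # segment finishes in this slot
                remaining = 0.0
            else:
                remaining -= capacity               # use up the slot and move on
                t = float(slot + 1)
        times.append(t)
    return times


def is_stall_free(times: Sequence[float], seg_dur: float, prefetch: int) -> bool:
    """Each segment must be fully downloaded before its playback instant."""
    start = times[prefetch - 1]                     # end of the BaW-phase
    return all(times[s] <= start + s * seg_dur for s in range(prefetch, len(times)))


def best_ascending_strategy(
    levels: Sequence[float], num_segments: int, seg_dur: float,
    prefetch: int, throughput: Sequence[float], weights: Sequence[float],
) -> Tuple[Optional[Tuple[float, ...]], float]:
    """Exhaustively score every feasible non-decreasing bit-rate sequence."""
    best, best_q = None, float("-inf")
    video_length = num_segments * seg_dur
    for strategy in combinations_with_replacement(sorted(levels), num_segments):
        times = download_times(strategy, seg_dur, throughput)
        if times is None or not is_stall_free(times, seg_dur, prefetch):
            continue
        phi1 = sum(strategy) / num_segments
        phi2 = times[prefetch - 1] / video_length
        phi3 = sum(
            1 for s in range(1, num_segments) if strategy[s] != strategy[s - 1]
        ) / (num_segments - 1)
        q = weights[0] * phi1 + weights[1] * phi2 + weights[2] * phi3
        if q > best_q:
            best, best_q = strategy, q
    return best, best_q
```

The enumeration grows combinatorially with the number of segments and bit-rate levels, which is one way to see why a faster heuristic is attractive for longer sessions.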

C. Heuristic for a Sub-optimal Solution

The key idea of our heuristic is twofold:

1) The way we find the ascending strategy over a BaW-BaP cycle differs from the optimal algorithm. Once we fix the bit-rate combination of the BaW-phase, we progressively increase the bit-rates of the BaP-phase, starting from the end of the BaW-BaP cycle, until reaching the point (segment) at which a stall would happen if we kept increasing the quality ([...]th segment in step A-2 and [...]th segment in step A-3 of Fig. 3). Given that the number of segments of the BaW-phase is small in general, it does not take much time to find all the possible ascending BaW-phase combinations, which makes our heuristic fast (see Algorithm 1).

2) Rather than doing an exhaustive search on the number of stalls, we proceed as follows: we start by finding the optimal strategy with zero stalls; then, we check whether the global QoE increases when one stall is enforced (we try all the possible positions). If it does, we try to enforce a second stall; if not, we stop and return the latest strategy. We keep increasing the number of stalls in this way until reaching the maximum $p$ or until the QoE function decreases. See Algorithm 2, where $K_i$, $i \le p$, denotes the position of the $i$-th stall.

Algorithm 1

MAESTRO: MAximizing qoE with aScending biTRate strategy over One-cycle

Input: $\{b_l\}_{l \le L}$, $c$, $S$, $W$
  M = [ ], b(s) = b_1 ∀ s ∈ 1, ..., x    ▷ Initialization
  for l1 = 1 : L do    ▷ For segments of BaW-state
    {b(s)_Previous}_{1≤s≤S} = {b(s)}_{1≤s≤S}
    b(s) = b_{l1} ∀ s ∈ x+1, ..., S
    check if it is possible to stream {b(s)}_{1≤s≤S} without stalls
    if no stall happens then
      s = x    ▷ Start from end to back
      while s ≥ 1 and no stall happens do
        b(s) = b_{l1}
        s = s − 1
        check if it is possible to stream {b(s)}_{1≤s≤S} without stalls
      end while
      if a stall happens then
        b(s) = b(s)_Previous
      end if
      I_{l1,l1} = {b(s)}_{1≤s≤S}
      Compute Φ_{l1,l1} = (φ_1, φ_2, φ_3)
      for l2 = l1 + 1 : L do    ▷ For segments of BaP-state
        {b(s)_Previous}_{1≤s≤S} = {b(s)}_{1≤s≤S}
        s = S    ▷ Start from end to back
        while s > x and no stall happens do
          b(s) = b_{l2}
          s = s − 1
          check whether it is possible to stream {b(s)}_{1≤s≤S} without stalls
        end while
        if a stall happens then
          b(s) = b(s)_Previous
        end if
        I_{l1,l2} = {b(s)}_{1≤s≤S}
        Compute Φ_{l1,l2} = (φ_1, φ_2, φ_3)
        M = [M ; Φ_{l1,l2}]
      end for
    end if
  end for
  return Φ* = Argmax_{M(i,:), i≥1} (M [w_1, w_2, w_3]^T)

Algorithm 2 CASTLE: asCending bitrAte STrategy over muLti-cycle sEssion

Input: $\{b_s\}_{s \le L}$, $c$, $x$, $S$, $p$, $W$
  Bound_Inf = x, Bound_Sup = S, i = 1
  Previous_QoE = QoE without stalls
  Previous_Φ = Φ without stalls
  for i ≤ p do
    for K_i ∈ {x+1, ..., S−x} do
      if there are some stalls already positioned then
        Bound_Inf = max{K_j ; K_j < K_i}_{j≤i} or x
        Bound_Sup = min{K_j ; K_j > K_i}_{j≤i} or S
      end if
      BaW-BaP_PreStall = {Bound_Inf .. K_i − 1}
      BaW-BaP_PostStall = {K_i .. Bound_Sup}
      MAESTRO([w_1, w_2, w_3]^T, BaW-BaP_PreStall)
      MAESTRO([w_1, w_2, w_3]^T, BaW-BaP_PostStall)
      compute Φ_{K_i} = (φ_1, φ_2, φ_3, i, φ_5)
      M = [M ; Φ_{K_i}]
    end for
    if max{{M W^T}_{j≥1}} > Previous_QoE then
      K_i = argmax_{j≥1} {{M W^T}_{j≥1}}
      Previous_QoE = Resulting QoE
      Previous_Φ = Φ_{K_i}
      i = i + 1
    else    ▷ No need for stalls to increase the QoE
      return Φ* = Previous_Φ
    end if
  end for
  return Φ* = Previous_Φ

IV. MULTI-USER QOE OPTIMIZATION PROBLEM

A. Problem formulation

In this section, we extend the QoE optimization problem to the multi-user case. We propose to find the vector of weights $W^{*}$ that maximizes the QoE among all users. The main objective is to maximize the users' feedback on the video delivery using a synthetic QoE dataset. The QoE problem of the multi-user case can be mathematically expressed as
$$W^{*} \in \operatorname*{argmax}_{W} \Big\{ \sum_{u=1}^{U} E_{r_u}\{ F_{r_u}(W) \} \Big\} \qquad (11)$$
where $r_u$ is the throughput of user $u$ and $F_{r_u}(W)$ is that user's feedback on the quality received after the QoE optimization (10) using vector $W$.

B. Practical solution: Closed-loop-based framework with users' feedback

1) Framework design: The multi-user QoE optimization problem requires solving problem (10) for each user $u \in \{1, \dots, U\}$, knowing the exact value of the vector $W^{*}$ that meets all the users' preferences. The challenge is then to combine single-user QoE optimization with a QoE training mechanism in a closed-loop manner to progressively learn the value of $W^{*}$. To do so, we develop two sub-frameworks and make them interact within a closed-loop based framework: one is for QoE optimization and the other is for QoE training (see Fig. 4 and Fig. 5).

Figure 4. Closed-loop based framework for QoE-optimization.

Figure 5. Framework interaction with video streaming entities.

2) QoE training tool:

To compute $W^{*}$, we use a simple neural network [17], where the training samples are pairs of QoE metrics and user feedback. We define the training dataset as $\{(\Phi^{*}_{r_u}, F_{r_u})\}_{1 \le u \le U}$, where $\Phi^{*}_{r_u}$ is the vector of QoE metrics delivered by (10) under throughput $r_u$ and vector $W$, and $F_{r_u}$ is the corresponding feedback. We define the activation function of the neural network as a linear function $h_W(\Phi) = W^{\top} \Phi$, where $\Phi$ is the input vector and $W$ is the vector of weights to learn (see Fig. 6).

Figure 6. Architecture of the QoE trainer.
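The trainer described in this subsection (a single linear unit $h_W(\Phi) = W^{\top}\Phi$ fitted to feedback by gradient descent on the half squared error) can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the authors' Matlab code: it uses a fixed, hypothetical step size `alpha` instead of adapting it within $[\alpha_{min}, \alpha_{max}]$, and the function names and default values are made up for the example.

```python
from typing import List, Optional, Sequence, Tuple

Sample = Tuple[Sequence[float], float]   # (QoE metric vector, user feedback)


def predict(W: Sequence[float], phi: Sequence[float]) -> float:
    """Linear activation h_W(phi) = W^T phi."""
    return sum(w * p for w, p in zip(W, phi))


def batch_loss(W: Sequence[float], samples: Sequence[Sample]) -> float:
    """Average half squared error over the mini-batch."""
    return sum(0.5 * (predict(W, phi) - f) ** 2 for phi, f in samples) / len(samples)


def gradient(W: Sequence[float], samples: Sequence[Sample]) -> List[float]:
    """Partial derivatives: (1/m) * sum_u phi_{u,k} * (W^T phi_u - F_u)."""
    m = len(samples)
    grad = [0.0] * len(W)
    for phi, f in samples:
        err = predict(W, phi) - f
        for k, p in enumerate(phi):
            grad[k] += p * err / m
    return grad


def train(
    samples: Sequence[Sample],
    alpha: float = 0.1,          # fixed step size (assumption of this sketch)
    eps: float = 1e-6,           # target loss
    max_steps: int = 1000,       # cap on the number of updates
    W: Optional[Sequence[float]] = None,
) -> List[float]:
    """Gradient descent: stop when the loss is at most eps or after max_steps updates."""
    weights = list(W) if W is not None else [0.0] * len(samples[0][0])
    for _ in range(max_steps):
        if batch_loss(weights, samples) <= eps:
            break
        grad = gradient(weights, samples)
        weights = [w - alpha * g for w, g in zip(weights, grad)]
    return weights
```

A call such as `train(samples, alpha=0.5)` returns the learned weight vector; in the closed loop it would then be handed back to the QoE optimization sub-framework for the next round of deliveries.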

We make use of a mini-batch learning algorithm based on gradient descent. The goal of the gradient descent is to minimize the average error between $F_{r_u}$ and the network output $h_W(\Phi^{*}_{r_u})$, $u \in \{1, \dots, U\}$.

Let $Loss(W, \Phi^{*}_{r_u}, F_{r_u})$ be the half squared error corresponding to the $u$-th training sample and $Loss(W, m)$ be the error averaged over $m$ training samples, namely
$$Loss(W, \Phi^{*}_{r_u}, F_{r_u}) = \frac{1}{2} \left| h_W(\Phi^{*}_{r_u}) - F_{r_u} \right|^{2} \qquad (12)$$
$$Loss(W, m) = \frac{1}{m} \sum_{u=1}^{m} Loss(W, \Phi^{*}_{r_u}, F_{r_u}) \qquad (13)$$

To reduce the average loss, the gradient descent updates the vector of weights $W$ so that it moves in the direction opposite to the gradient vector $\nabla Loss(W, m)$. The algorithm stops when a predefined minimum loss $\epsilon$ is reached or when the number of update steps exceeds a given threshold $T_{rs}$ (see Algorithm 3).

The partial derivatives of $Loss(W, m)$ with respect to the weights $\omega_k$, $k \le 5$, are given by
$$\frac{\partial Loss(W, m)}{\partial \omega_k} = \frac{\partial}{\partial \omega_k} \frac{1}{m} \sum_{u=1}^{m} Loss(W, \Phi^{*}_{r_u}, F_{r_u}) = \frac{1}{m} \sum_{u=1}^{m} \Phi^{*}_{r_u, k} \left( W^{\top} \Phi^{*}_{r_u} - F_{r_u} \right) \qquad (14)$$

Algorithm 3 The mini-batch gradient descent

Input: $\{(\Phi^{*}_{r_1}, F_{r_1}), \dots, (\Phi^{*}_{r_m}, F_{r_m})\}$, $\epsilon$, $\mu$, $T_{rs}$, $[\alpha_{min}, \alpha_{max}]$
  GoodConvergence = 0; SlowConvergence = 0; Divergence = 0
  Set α in [α_min, α_max]; Set W very small
  repeat
    repeat
      W = W − α ∇Loss(W, m)
    until Loss(W, m) ≤ ε or T_rs iterations are done
    if Loss(W, m) ≤ ε then
      GoodConvergence = 1
    else
      if Loss(W, m) is decreasing then
        SlowConvergence = 1
        increase(α_min)
        Set α in [α_min, α_max]
      else
        Divergence = 1
        decrease(α_max)
        Set α in [α_min, α_max]
      end if
    end if
  until GoodConvergence or (α_max − α_min) ≤ μ
  return W* = W

V. NUMERICAL RESULTS

A. Simulation environment

We evaluate the performance of the proposed framework through extensive simulations using NS3 and Matlab. NS3 was used to generate standard-compliant correlated throughputs. To get many throughput samples, we performed extensive simulations of an LTE network, varying the mobility of the users each time. All NS3 parameter settings are given in Table I. The QoE optimization sub-framework and the QoE trainer were both developed using Matlab.

As in the real world, we consider users' feedback as scores rated from 1 to 5. When a quality $\Phi^{*}_{r}$ is delivered to a user, we look through the predefined synthetic QoE dataset to find the score the user may give. In the dataset, we put all the possible values of vector $\Phi^{*}_{r}$ in a specific priority order, i.e., $|w_{stall}| \gg |w_{rebuffering}| \gg |w_{average\text{-}quality}| \gg |w_{startup}| \gg |w_{switching}|$. These vectors were then grouped into classes. To each class we associated a MOS and a specific distribution of scores. When $\Phi^{*}_{r}$ is delivered, we determine the class to which it belongs. Then, according to that class, we randomly generate a score based on the distribution of scores in the dataset. Note that the throughput samples used by the QoE optimization sub-framework were randomly selected (according to a uniform distribution) among [...] throughput samples generated with NS3. All Matlab parameter settings are listed in Table II.

Table I. NS3 simulation setting parameters.

  Parameter                 Value
  Number of macro cells     [...]
  Number of UEs per cell    [...]
  eNb Tx power              46 dBm
  eNb noise figure          [...]
  UE noise figure           [...]
  Pathloss model            COST 231
  MAC scheduler             Proportional fair 50 RBs
  Fading model              Pedestrian
  Transmission model        MIMO Transmit diversity
  Mobility model            RandomWalk2dMobilityModel
  Velocity of users         Uniform [5,16] m/s
  EPS bearer                NGBR-VIDEO-TCP-DEFAULT
  Simulation length         70 s

Table II. Matlab simulation setting parameters.

  Parameter                       Value
  Window size                     70 s
  Throughput time slot            [...]
  Video length                    30 s
  Segment length                  [...]
  Video frame rate                30 fps
  Playback cache                  [...]
  Bit-rate levels (Mbps)          [0.4 0.75 1 2.5 4.5]
  Maximum number of stalls (p)    [...]

B. Performance results

The performance evaluation of the closed-loop based framework allows us to show that (i) the learning ultimately converges to a steady state, in which the learning output is a quasi-constant vector $W^{*}$, and that (ii), more importantly, this vector $W^{*}$ achieves the highest QoE compared to the other vectors computed throughout the learning process.

In Fig. 9, we show the evolution of the mean square variation of vector $W$ during the learning process for different values of the mini-batch size. Results show that in all cases the variation tends to zero, although the decrease is slow in some cases (5 and 50 scores). A fast convergence is, however, noticed in the case of 10 scores. The difference in convergence time is due to the random character of the throughput selection and the score generation. In a second step, we compared the final outputs $W^{*}$. We noticed that they were not exactly the same. Hence, we computed the MOS when each of the previously updated values of $W$ was applied with the QoE optimization sub-framework under [...] randomly selected throughputs. Fig. 10 shows that for the four mini-batch sizes, the MOS experiences some fluctuations with the first values of $W$. Then, as $W$ tends to the values obtained at the steady state, the MOS converges to its highest value (around 4.8 in the four cases). These results offer hope that the proposed closed-loop based framework can be designed around QoE optimization for video adaptation and delivery in real-world environments.

Figure 7. Synthetic dataset for score generation.

Figure 8. Synthetic MOS as a function of the QoE class arrangement.

VI. CONCLUSION

In this paper, we have addressed a QoE optimization problem with machine learning to optimize the quality of the delivered video by fitting the real profiles of the users. We have proposed a closed-loop framework based on the users' feedback to learn their corresponding QoE function and to carry out their QoE optimization. By using a synthetic QoE dataset, we have shown the efficiency of the proposed closed-loop system. Indeed, the QoE function learned at the steady state ensures high-quality delivery for the majority of users. These promising results allow us to gain insight into how the QoE optimization problem can be handled in a heterogeneous population. As a future step, real scores on real video streaming will be collected in order to study the robustness of the proposed solution.

Figure 9. The mean square variation of vector W during the learning process (one panel per mini-batch size: 5, 10, 50 and 100 scores; x-axis: number of updates; y-axis: mean squared variation of W).

Figure 10. The MOS of the QoE-optimization sub-framework using the updated values of vector W (one panel per mini-batch size: 5, 10, 50 and 100 scores; x-axis: W in the learning order; y-axis: MOS).

ACKNOWLEDGEMENT

This research is partially supported by NSF grants CNS-1720230, CNS-1544782, and SES-1541164.

REFERENCES

[1] Cisco Visual Networking Index: Forecast and Methodology, 2015-2020.
[2] [...] Network Protocols & Algorithms, 2016.
[3] M. S. Mushtaq, B. Augustin, and A. Mellouk, "Empirical study based on machine learning approach to assess the QoS/QoE correlation," in Networks and Optical Communications (NOC), 2012 17th European Conference on, 2012.
[4] J. D. Vriendt, D. D. Vleeschauwer, and D. C. Robinson, "QoE model for video delivered over an LTE network using HTTP adaptive streaming," Bell Labs Technical Journal, 2014.
[5] A. Balachandran, V. Sekar, A. Akella, S. Seshan, I. Stoica, and H. Zhang, "A quest for an internet video quality-of-experience metric," in Proceedings of the 11th ACM Workshop on Hot Topics in Networks, ACM, 2012.
[6] L. Amour, S. Souihi, S. Hoceini, and A. Mellouk, "A Hierarchical Classification Model of QoE Influence Factors," [...], pp. 225–238, 2015.
[7] F. Dobrian, V. Sekar, A. Awan, I. Stoica, D. Joseph, A. Ganjam, J. Zhan, and H. Zhang, "Understanding the impact of video quality on user engagement," ACM SIGCOMM Computer Communication Review, vol. 41, no. 4, pp. 362–373, 2011.
[8] Youtube: Measure video ad performance. https://support.google.com/youtube/answer/2375431.
[9] A. Balachandran, V. Sekar, A. Akella, S. Seshan, I. Stoica, and H. Zhang, "Developing a predictive model of quality of experience for internet video," SIGCOMM Comput. Commun. Rev., 2013.
[10] A. Testolin, M. Zanforlin, M. D. F. D. Grazia, D. Munaretto, A. Zanella, M. Zorzi, and M. Zorzi, "A machine learning approach to QoE-based video admission control and resource allocation in wireless systems," in Ad Hoc Networking Workshop (MED-HOC-NET), 2014 13th Annual Mediterranean, 2014.
[11] Christos G. Bampis and Alan C. Bovik, "Learning to predict streaming video QoE: Distortions, rebuffering and memory," 2017. Available at https://arxiv.org/abs/1703.00633.
[12] U. Shevade, Y.-C. Chen, L. Qiu, Y. Zhang, V. Chandar, M. K. Han, H. H. Song, and Y. Seung, "Enabling high-bandwidth vehicular content distribution," in ACM CoNEXT, Philadelphia, USA, 2010.
[13] Z. Lu and G. de Veciana, "Optimizing stored video delivery for mobile networks: The value of knowing the future," in INFOCOM, 2013 Proceedings IEEE, pp. 2706–2714, April 2013.
[14] I. Triki, R. El-Azouzi, and M. Haddad, "NEWCAST: Anticipating Resource Management and QoE Provisioning for Mobile Video Streaming," in [...], June 2016.
[15] I. Triki, R. E. Azouzi, and M. Haddad, "Anticipating Resource Management and QoE for Mobile Video Streaming under Imperfect Prediction," in IEEE International Symposium on Multimedia, ISM, San Jose, CA, USA, December 11-13, 2016, pp. 93–98, 2016.
[16] X. Yin, A. Jindal, V. Sekar, and B. Sinopoli, "A Control-Theoretic Approach for Dynamic Adaptive Video Streaming over HTTP," SIGCOMM Comput. Commun. Rev., pp. 325–338, 2015.
[17] G. D. Magoulas and M. N. Vrahatis, "Adaptive algorithms for neural network supervised learning: A deterministic optimization approach,"