A Study on Impacts of Multiple Factors on Video Qualify of Experience

Abstract

HTTP Adaptive Streaming (HAS) has become a cost-effective means for multimedia delivery. However, how the quality of experience (QoE) is jointly affected by 1) varying perceptual quality and 2) interruptions is not well understood. In this paper, we present the first attempt to quantify the relative impacts of these factors on the QoE of streaming sessions. To this end, we first model the impacts of the factors using histograms, which represent the frequency distributions of the individual factors in a session. Using a large dataset, we then provide various insights into the relative impacts of these factors, which serve as suggestions to improve the QoE of streaming sessions.


Huyen T. T. Tran, Nam Pham Ngoc, and Truong Cong Thang
The University of Aizu, Aizuwakamatsu, Japan; VinUniversity, Vietnam


Introduction

HTTP Adaptive Streaming (HAS) has become a popular solution for multimedia delivery. In HAS, a video is encoded into multiple versions with different bitrates (and so different quality values). Each version is further divided into short segments [1]. Based on network conditions, suitable versions of individual segments are selected and delivered to clients so that the highest possible quality of experience (QoE) can be provided to users. To select versions effectively, a main challenge is to quantify the impacts of factors on the QoE of streaming sessions.

Previous studies have investigated, both qualitatively and quantitatively, different factors affecting the QoE of HAS sessions [2–4]. In general, there are three key factors, namely initial delay, varying perceptual quality, and interruptions, as shown in Fig. 1. The initial delay refers to the waiting time before watching a video [5]. Varying perceptual quality refers to quality changes of segments in a session as a consequence of network bandwidth fluctuations. This factor can be further divided into two sub-factors. The first, called quality levels, refers to the contributions of high and low segment quality levels to the QoE. The second, called quality variations, refers to the impacts of segment quality switches. Interruptions refer to instances of rebuffering while watching a video [5].

Figure 1: A taxonomy of key factors

In the literature, the impact of varying perceptual quality has been modeled using statistics such as the number of switches [6, 7], the average [4, 6, 8, 9], the median [10], the minimum [10], and the standard deviation of segment quality values [6]. As for interruptions, their impact has been modeled using statistics such as the number of interruptions [11, 12], the average [11], the maximum [11], and the sum [12, 13] of interruption durations.

Unlike the two factors above, the impact of the initial delay was mostly found to be small [14–16]. In addition, as the initial delay appears only once at the beginning of a session, it is simple to model the impact of this factor individually using a function of the initial delay duration. This could be a linear function [12], an exponential function [13], or a logarithmic function [5, 17]. Thus, in this study, we mainly focus on the two more important factors of varying perceptual quality and interruptions.

Though previous studies [5, 11–13, 18] have revealed some general behaviors of each individual factor, insights into the relative impacts of different events (i.e., switching and interruptions) during a session are very limited. Understanding such relative impacts can help make more effective adaptation decisions. For example, instead of abruptly switching down to a very low quality value, maintaining a moderate quality value with a very short interruption could bring a higher QoE. However, to the best of our knowledge, only two studies [3, 19] give some findings on this issue. In particular, the authors in [19] show that a down-switch can result in an impact comparable to that of an interruption. Meanwhile, an extensive study with YouTube users in [3] indicates that an interruption has three times the impact of a quality switch. Neither study makes clear how different degrees of switches compare to different interruption durations, which causes confusion between the two conclusions.

In this paper, our aim is to quantitatively investigate the relative impacts of different events of quality switching and interruption. In particular, we focus on answering the four questions below.

• How different are the impacts of different segment quality levels?
• What are the impacts of quality switching types? Do switches with higher switching amplitudes always cause more negative impacts? Does the starting quality of switches have any influence on the QoE?
• How different are the effects of different interruption durations?
• What are the relative impacts between quality switching types and interruption types? Do interruptions always result in more negative effects than quality switches?

For this purpose, we first extend our histogram-based QoE model presented in [17]. In particular, switches are classified into different types based not only on their switching amplitudes (i.e., differences of segment quality before and after switching) but also on their starting quality values (i.e., segment quality values before switching). Then, by using two more datasets, the reliability of the model is confirmed. Finally, based on an analysis of the model parameters, the impact of each switching or interruption type is quantified. Also, various insights into their relative impacts are provided, which help answer the above questions.

The remainder of the paper is organized as follows. Section 2 presents the related work. The quality model, which aims at quantifying the relative impacts of different switching and interrupting events, is described in Section 3. The performance of the proposed model is analyzed in Section 4. Section 5 draws a set of insights into the impacts of the factors on the QoE. Finally, Section 6 concludes the paper.

Related work

In this section, we first present related work on the impact of each single factor. Then, relative impacts of factors found in previous studies are described.

As mentioned above, the factor of varying perceptual quality can be divided into the two sub-factors of quality levels and quality variations. The contributions of quality levels were investigated in many existing studies [4, 6, 8, 9]. It was found that, given a quality level, its contribution depends on its total presence time during a session [12, 20]. To model these contributions, statistics of segment quality values such as the average [4, 6, 8, 9], the median [10], and the weighted sum [12] can be used.

There have been some findings on the impacts of quality variations presented in [3, 20–22]. The experiment results in [3, 22] show that users expect the number of quality switches (or quality switching frequency) to be as low as possible. In particular, the finding in [3] is that users prefer constant quality to varying quality; the impact of frequent quality switching may even be more than four times higher than that with no quality change. Meanwhile, the impact of the number of quality switches is found to be negligible in [20, 21]. In particular, no significant difference was observed between high and low numbers of quality switches. The disagreement in the conclusions may stem from the fact that counting quality switches treats all switch types equally, although they may cause significantly different impacts. In other words, the conclusions are specific to the corresponding experiment results in the original papers and are not general in practice.

Some in-depth investigations of the impact of quality variations were conducted by classifying quality switches [12, 19–21]. It is found that the impacts of up-switches are much smaller than those of down-switches [12]. In addition, abrupt up-switches may not be worse than smooth up-switches [19, 21]. Meanwhile, down-switches with larger switching amplitudes cause more negative impacts [20, 21]. Similar conclusions are also given in [23], where the authors quantify the impacts of switching types with different switching amplitudes. However, these findings are limited because they cannot differentiate switches having different starting quality values (e.g., a switch from 5 MOS to 3 MOS is in fact not the same as a switch from 3 MOS to 1 MOS). So far, there has been no study on the impact of the starting quality values of switches.

In most existing studies, the impact of quality variations is modeled using statistics such as the number of switches [6, 7], the minimum [10], and the standard deviation of segment quality values [6]. In our previous study [17], it is found that histograms of switching amplitudes are very effective for modeling the impact of this sub-factor.

With regard to interruptions, the authors in [3] showed that users prefer a single interruption to multiple interruptions. The impact of a single interruption is modeled as an exponential function of its duration in [5]. To model the impacts of multiple interruptions, several previous studies used the number of interruptions [11, 12], the average [11], the maximum [11], and the sum [12, 13] of interruption durations.

Recently, many QoE models have been proposed for HAS. However, only a few are multi-factor models [12, 13]. The studies in [2, 11, 24, 25] modeled the contributions of quality levels and interruptions. However, the impact of quality variations was not considered. In [13], the authors proposed a model taking into account the impacts of quality variations and interruptions. Yet, this model does not include the impacts of quality levels.

To the best of our knowledge, the authors in [12] proposed the first QoE model taking into account all three (sub-)factors of quality levels, quality variations, and interruptions. This model is built in two steps: 1) separately modeling and then 2) combining the impacts of the factors. In particular, the impact of a quality level is modeled by the total presence time of that quality level. For the impacts of quality variations, the authors used the mean square of down-switching amplitudes. The impact of interruptions is modeled using the number of interruptions and the sum of interruption durations. In the latest stage of the ITU-T P.1203 standardization [26], a multi-factor model is recommended. In this model, the impacts of quality levels, quality variations, and interruptions are modeled using various statistics such as the difference between the maximum and minimum segment quality values, the weighted sum of segment quality values, and the number of interruptions.

Although the factors were modeled in the above studies, the relative impacts of these factors have not been quantified. In the literature, very few studies investigate this issue. In [19], the finding is that a quality switch can cause an impact comparable to that of an interruption. However, the switching and interruption types considered in that study are limited. In particular, only two specific pairs consisting of two switching types and two interruption types were compared. A study in [3] revealed that an interruption has three times the impact of a switch. However, the switching and interruption types are not clearly defined.

In this study, we attempt for the first time to fully quantify the impacts of different switching and interruption types. Based on the obtained results, a set of insights into the relative impacts of the factors is provided.

Proposed QoE model

In this section, we first present the two main components of the proposed model. The first, denoted $Q_{PQ}$, represents the impact of varying perceptual quality. The second, denoted $D_{IR}$, represents the impact of interruptions. Then, a combination of these components to predict the QoE is given.

To model the impact of varying perceptual quality, we utilize two histograms of two sub-factors, i.e., quality levels and quality variations.

In particular, each segment is represented by a quality value (i.e., MOS), which can be obtained by subjective tests [6, 17, 20] or estimated from encoding parameters [10, 18]. The range of segment quality values is split into $N$ intervals $\{I^Q_n \mid 1 \le n \le N\}$, which are given by

$$I^Q_n = [\,n - \vartheta^L_n,\; n + \vartheta^U_n\,), \tag{1}$$

where $\vartheta^U_n$ and $\vartheta^L_n$ are parameters that define the intervals' widths. Each interval $I^Q_n$ corresponds to a segment quality bin $B^Q_n$. If a segment quality value is in interval $I^Q_n$, it belongs to bin $B^Q_n$.

In the proposed model, each quality switch is represented by a starting quality value and a switching amplitude. To define switching amplitudes, we use the concept of "quality gradient", which is given by

$$\nabla Q = \partial Q / \partial t, \tag{2}$$

where $\partial Q$ is the change of segment quality values in time interval $\partial t$.

Currently, we use the quality value of the segment just before switching to represent the starting quality value, and the quality change between the two segments right before and after switching to represent the instantaneous gradient value at each switch. A positive (negative) gradient value indicates an up-switch (down-switch). The range of gradient values is split into $(2 \times M + 1)$ intervals $\{I^V_j \mid -M \le j \le M\}$, which are defined by

$$I^V_j = [\,j - \theta^L_j,\; j + \theta^U_j\,), \tag{3}$$

where the parameters $\theta^U_j$ and $\theta^L_j$ define the intervals' widths. Each quality switching bin $B^V_{i,j}$ ($1 \le i \le N$, $-M \le j \le M$) is defined by two intervals $I^Q_i$ and $I^V_j$. A switch belongs to bin $B^V_{i,j}$ if its starting quality value is in interval $I^Q_i$ and its switching amplitude is in interval $I^V_j$.

Let $F^Q_n$ denote the normalized frequency of segment quality values in bin $B^Q_n$ ($1 \le n \le N$), and let $F^V_{i,j}$ denote the normalized frequency of switches in bin $B^V_{i,j}$ ($1 \le i \le N$, $-M \le j \le M$). Note that $F^Q_n$ is normalized by the number of segments, whereas $F^V_{i,j}$ is normalized by the total number of switches and interruptions. The perceptual quality $Q_{PQ}$ is modeled by

$$Q_{PQ} = \sum_{n=1}^{N} \alpha_n F^Q_n - \sum_{i=1}^{N} \sum_{j=-M}^{M} \beta_{i,j} F^V_{i,j}, \tag{4}$$

where $\alpha_n$ and $\beta_{i,j}$ are respectively the weights of bin $B^Q_n$ and bin $B^V_{i,j}$.

In this study, we use the Absolute Category Rating method with a 5-grade scale, which is widely used for quality assessments of streaming sessions in HAS [19, 21, 26, 27]. So we currently split the range of segment quality values into $N = 5$ intervals (with $M = 4$) and set $\vartheta^U_n = \vartheta^L_n = \theta^L_j = \theta^U_j = 0.5$. Intervals $\{I^V_j \mid j > 0\}$ contain up-switches, interval $I^V_0$ represents quality maintaining, and intervals $\{I^V_j \mid j < 0\}$ include down-switches. As noted in the previous study of [23], the impacts of down-switches are significant while the impacts of non-negative switches (including up-switches and quality maintaining) are negligible. So, we simplify the proposed model by grouping all the bins of non-negative switches into one bin (denoted by $B_{um}$). The normalized frequency $F_{um}$ of this bin is given by

$$F_{um} = \sum_{i=1}^{N} \sum_{j=0}^{M} F^V_{i,j}. \tag{5}$$

Then, the simplified perceptual quality model is given by

$$Q_{PQ} = \sum_{n=1}^{N} \alpha_n F^Q_n - \sum_{i=1}^{N} \sum_{j=-M}^{-1} \beta_{i,j} F^V_{i,j} - \beta_{um} F_{um}, \tag{6}$$

where $\beta_{um}$ is the weight of bin $B_{um}$. With the simplified model, the number of model parameters (i.e., weights) can be reduced by approximately half.

Figure 2: Distribution of interruptions: (a) left, the numbers of interruptions per session; (b) right, interruption durations.

Table 1: Intervals of Interruption Durations.

Interval | $I^I_1$ | $I^I_2$ | $I^I_3$ | $I^I_4$ | $I^I_5$ | $I^I_6$
Range (s) | ( . , . ] | ( . , . ] | ( . , . ] | ( . , . ] | ( . , . ] | ( . , +∞)

Table 2: Features of Source Videos

Video | Frame rate (fps) | Type | Content | Motion activity | Spatial complexity
Video | 24 | Animation | Slow movements of characters | Low | Complex
Video | 24 | Animation | A fight between two characters | High | Simple
Video | 24 | News | A reporter in a weather forecast | Medium | Complex
Video | 24 | Sport | A soccer match | High | Complex
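The perceptual-quality component of Eqs. (1) to (6) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the interval half-width is taken as 0.5, every transition between consecutive segments is counted as one switch event (amplitude bin 0 meaning that quality is maintained), and the weights `alpha`, `beta`, and `beta_um` are made-up values rather than the fitted parameters. The names `bin_index` and `perceptual_quality` are hypothetical.

```python
import math
from collections import Counter

HALF_WIDTH = 0.5  # assumed half-width of every interval in Eqs. (1) and (3)


def bin_index(value):
    """Return k such that value lies in [k - HALF_WIDTH, k + HALF_WIDTH)."""
    return math.floor(value + HALF_WIDTH)


def perceptual_quality(segment_mos, alpha, beta, beta_um, n_interruptions=0):
    """Simplified perceptual quality Q_PQ of Eq. (6) for one session.

    segment_mos: per-segment quality values on the 1..5 MOS scale.
    alpha: dict {n: weight} for the quality bins n = 1..5.
    beta: dict {(i, j): weight} for down-switch bins (start bin i, amplitude j < 0).
    beta_um: weight of the merged non-negative bin B_um.
    n_interruptions: interruptions in the session; they enter the normalizer
        of the switch histogram, as stated in the text.
    """
    # Histogram of quality levels, normalized by the number of segments.
    level_counts = Counter(bin_index(q) for q in segment_mos)
    q_pq = sum(alpha[n] * c / len(segment_mos) for n, c in level_counts.items())

    # Assumption: each pair of consecutive segments is one switch event.
    transitions = list(zip(segment_mos, segment_mos[1:]))
    n_events = len(transitions) + n_interruptions
    if n_events == 0:
        return q_pq

    down_counts = Counter()
    non_negative = 0
    for start, end in transitions:
        j = bin_index(end - start)
        if j < 0:
            down_counts[(bin_index(start), j)] += 1
        else:
            non_negative += 1

    penalty = sum(beta[key] * c / n_events for key, c in down_counts.items())
    penalty += beta_um * non_negative / n_events
    return q_pq - penalty


# Illustrative weights only; these are not the fitted values of the paper.
alpha = {n: float(n) for n in range(1, 6)}
beta = {(5, -2): 2.0}
q = perceptual_quality([5, 5, 3, 3], alpha, beta, beta_um=0.0)
print(round(q, 3))
```

In this toy session, half of the segments fall in quality bin 5 and half in bin 3, and one of the three transitions is a down-switch of amplitude -2 starting from bin 5, so only that single switching bin contributes a penalty.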

To investigate the impact of interruptions, a histogram of this factor is defined as follows. Each interruption is represented by its duration. The range of interruption durations is divided into $L$ intervals $\{I^I_l \mid 1 \le l \le L\}$ corresponding to $L$ interruption bins $\{B^I_l \mid 1 \le l \le L\}$. Let $F^I_l$ be the normalized frequency of interruptions in bin $B^I_l$. Note that $F^I_l$ is normalized by the total number of switches and interruptions. The impact of interruptions $D_{IR}$ is modeled by

$$D_{IR} = \sum_{l=1}^{L} \gamma_l F^I_l, \tag{7}$$

where $\gamma_l$ is the weight of bin $B^I_l$.

To define the representative intervals $I^I_l$, a series of video streaming experiments was performed on a real testbed. In these experiments, we recorded a total of 120 streaming sessions with at least one interruption per session. The distributions of the number of interruptions and of interruption durations are shown in Fig. 2. We can see that the number of interruptions per session is typically from 1 to 6. About 40% of the sessions contain only one interruption. Besides, about 13% of the observed interruptions have durations shorter than 0.25 seconds, and about 10% are longer than 3 seconds. Based on these observations, we split the range of interruption durations into $L = 6$ intervals (Table 1), where the last interval $I^I_6$ includes interruption durations longer than 3 seconds.

Finally, the proposed model, which integrates the above components to predict the QoE of streaming sessions, is given by

$$QoE_{pred} = \max(Q_{PQ} - D_{IR},\ [\ldots]). \tag{8}$$

This model has a total of 22 model parameters, which enable the comparisons of impacts between switching and interruption types presented in Section 5.

To evaluate the prediction performance of the proposed model, we combine our two databases presented in [17, 28]. The combined database consists of 288 sessions, of which 168 contain only a single factor (i.e., either varying perceptual quality or interruption) and 120 include both factors. From this database, we choose 50 pairs of training and test sets. In particular, for each pair, the test set consists of 90 sessions randomly selected from the multi-factor sessions. The corresponding training set consists of the 198 remaining sessions (i.e., 168 single-factor sessions and 30 remaining multi-factor sessions). The training set is used to obtain the model parameters by least squares fitting. The test set is used to evaluate prediction performance. The results presented in the following are the average values over the 50 test sets.

Table 3 shows the performance of the proposed model in terms of Pearson Correlation Coefficient (PCC) and Root Mean Square Error (RMSE).

Table 3: Performance of the Proposed QoE Model

Model Training set Test set

PCC RMSE PCC RMSE

Proposed
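The interruption component of Eq. (7) and the final combination of Eq. (8) can be sketched as follows. Several things here are assumptions rather than facts from the text: the interval edges in `EDGES` are hypothetical (only 0.25 s, 2 s, and 3 s are mentioned in the prose), the lower bound `floor=1.0` in `predict_qoe` is assumed to be the lowest grade of the 5-grade scale, and the weights in `gamma` are illustrative values, not the fitted ones.

```python
from bisect import bisect_left
from collections import Counter

# Hypothetical upper edges (seconds) of the first five right-closed intervals;
# the sixth interval is open-ended and holds durations longer than 3 s.
EDGES = (0.25, 0.5, 1.0, 2.0, 3.0)


def interruption_bin(duration):
    """Return the 1-based index l of the interval (lo, hi] containing duration."""
    return bisect_left(EDGES, duration) + 1


def interruption_impact(durations, gamma, n_switches=0):
    """Impact of interruptions D_IR as in Eq. (7).

    durations: interruption durations (seconds) observed in one session.
    gamma: dict {l: weight} for the interruption bins l = 1..6.
    n_switches: number of switch events; together with the interruptions
        they form the normalizer of the histogram.
    """
    n_events = len(durations) + n_switches
    if n_events == 0:
        return 0.0
    counts = Counter(interruption_bin(d) for d in durations)
    return sum(gamma[l] * c / n_events for l, c in counts.items())


def predict_qoe(q_pq, d_ir, floor=1.0):
    """Combine both components as in Eq. (8); the floor value is an assumption."""
    return max(q_pq - d_ir, floor)


# Illustrative weights only; these are not the fitted values of the paper.
gamma = {1: 0.0, 2: 1.0, 3: 2.0, 4: 4.0, 5: 8.0, 6: 16.0}
d_ir = interruption_impact([0.2, 3.5], gamma, n_switches=2)
print(d_ir, predict_qoe(4.5, d_ir), predict_qoe(4.5, 0.5))
```

Normalizing by switches and interruptions together puts both histograms on a common event count, which is what later allows the weights of switching bins and interruption bins to be compared directly.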

Similar to [12], Tables 6, 7, and 8 show the values of the model parameters $\{\alpha_n, \beta_{i,j}, \gamma_l\}$ corresponding to the test set that achieves the highest PCC value (i.e., 0.96) among the 50 test sets. It is interesting to see that the weights $\alpha_n$ and $\gamma_l$ increase with their indexes $n$ and $l$, respectively. The weight $\beta_{um}$ of the non-negative switching bin $B_{um}$ is equal to zero. For the down-switching bins $\{B^V_{i,j} \mid j < 0\}$, the weights $\{\beta_{i,j}\}$ increase when the switching amplitudes get larger or the starting quality values become lower. More detailed discussions of these weights are given in Section 5.

In this part, a comparison of prediction performance between the proposed model and four existing models is conducted over two databases. The first is our database, where prediction performances are the average values over the 50 test sets mentioned above. The second is an open database (called VL04) of the ITU-T P.1203 standardization procedure (P.NATS) [29, 30]. This database consists of sixty 1-minute-long sessions generated from three different videos. The performances of the models are calculated over all sixty sessions.

A description of the proposed model and the four existing models, denoted by Guo’s [10], Vriendt’s [6], Liu’s [12], and P.1203 [26, 30–32], is presented in Table 4. It can be seen that these models use various statistics to model the impacts of the factors. The models Guo’s and Vriendt’s only take into account the impacts of varying perceptual quality. Meanwhile, the remaining models include the impacts of both varying perceptual quality and interruptions.

According to Recommendations ITU-T P.1401 [33] and ITU-T P.1203 [26], we compensated for differences in test conditions between models. In particular, each reference model is re-implemented using the parameters stated in the original study. Then, a first-order linear regression is performed to adjust the predicted QoE values. Finally, after the adjustment, the prediction performance is calculated. The coefficients (i.e., slopes and intercepts) of the regression and the performance corresponding to each database are presented in Table 5.

It can be seen that, on our database, the proposed model achieves the best prediction performance (i.e., the highest PCC value and the lowest RMSE value). Specifically, the PCC and RMSE values of the proposed model are respectively 0.95 and 0.30 MOS. It is clear that the models Guo’s and Vriendt’s fail to predict the QoE values of multi-factor sessions since they do not include the impacts of interruptions. Thus, their PCC values are low and their RMSE values are high (i.e., ≥ 0.58 MOS). For the models Liu’s and P.1203, the performances are significantly lower than that of the proposed model. In particular, the PCC and RMSE values are respectively 0.78 and 0.56 MOS for the model Liu’s, and 0.85 and 0.42 MOS for the model P.1203. A possible explanation for this result is that these models use statistics such as the number of switches and the number of interruptions to model the impacts of varying perceptual quality and interruptions. However, these statistics cannot fully reflect the switches and interruptions occurring in a session, as they cannot distinguish different switching types and interruption types. Meanwhile, thanks to the use of histograms, the proposed model can differentiate switching and interruption types, and so can model the impacts of the factors more effectively.
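The adjustment and scoring steps described above (first-order linear regression of subjective scores on predicted scores, followed by PCC and RMSE) can be sketched in plain Python. The helper names `linear_adjust`, `pcc`, and `rmse` are hypothetical, and the input scores are toy numbers, not data from either database.

```python
import math


def _centered_sums(x, y):
    """Return the means of x and y and the centered sums Sxx, Syy, Sxy."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return mx, my, sxx, syy, sxy


def pcc(x, y):
    """Pearson correlation coefficient between two score lists."""
    _, _, sxx, syy, sxy = _centered_sums(x, y)
    return sxy / math.sqrt(sxx * syy)


def rmse(x, y):
    """Root mean square error between two score lists."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)) / len(x))


def linear_adjust(predicted, subjective):
    """Least-squares line mapping predicted scores onto subjective scores.

    Returns (slope, intercept, adjusted predictions).
    """
    mx, my, sxx, _, sxy = _centered_sums(predicted, subjective)
    slope = sxy / sxx
    intercept = my - slope * mx
    return slope, intercept, [slope * p + intercept for p in predicted]


# Toy scores for illustration only.
predicted = [1.0, 2.0, 3.0]
subjective = [1.0, 3.0, 2.0]
slope, intercept, adjusted = linear_adjust(predicted, subjective)
print(slope, intercept, pcc(adjusted, subjective), rmse(adjusted, subjective))
```

Because the adjustment is a linear map with positive slope in this example, it leaves the PCC unchanged and only affects the RMSE, which is why slope and intercept are reported alongside both metrics.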

In this subsection, we quantitatively analyze the impacts of the factors by discussing the weights of the bins in the histograms.

Table 4: Description of the proposed and existing models

Statistics used to represent the impacts of factors:

Model | Varying perceptual quality | Interruptions
Guo’s [10] | Median segment quality value; minimum segment quality value | -
Vriendt’s [6] | Number of switches; average and standard deviation of segment quality values | -
Liu’s [12] | Weighted sum of segment quality values; average of the squares of down-switching amplitudes | Sum of interruption durations; number of interruptions
P.1203 [26] | Number of switches; number of quality direction changes; longest switching duration; first and fifth percentile of segment quality values; average of segment quality values in each interval; weighted sum of segment quality values; difference between the maximum and minimum of segment quality values | Number of interruptions; weighted sum of interruption durations; average time distance between interruptions; sum of interruption durations; time distance between the last interruption and the end of session
Proposed | Histogram of segment quality values; histogram of switching amplitudes | Histogram of interruptions

Table 5: Adjustment Coefficients and Performances of the Proposed Model and Existing Models

Model | Our database: Slope | Intercept | PCC | RMSE | VL04: Slope | Intercept | PCC | RMSE
Guo’s | | | | | | | |
Vriendt’s | | | | | | | |
Liu’s | | | | | | | |
P.1203 | | | | | | | |
Proposed | - | - | 0.95 | 0.30 | 0.79 | 0.82 | 0.90 | 0.39

Table 6: Parameters of the Segment Quality Component (columns: $\alpha_1$, $\alpha_2$, $\alpha_3$, $\alpha_4$, $\alpha_5$)

Table 7: Parameters of the Quality Switching Component $\beta_{i,j}$ (rows: starting quality value $i$; columns: quality switching amplitude $j$, i.e., Non-neg. ($\beta_{um}$), -1, -2, -3, -4)

Table 8: Parameters of the Interruption Component (columns: $\gamma_1$, $\gamma_2$, $\gamma_3$, $\gamma_4$, $\gamma_5$, $\gamma_6$)

We first consider the weights $\alpha_n$ of the segment quality bins $B^Q_n$ shown in Table 6. It can be seen that these weights increase with their indexes $n$. This implies that a higher quality level makes a bigger contribution to the QoE of sessions. In other words, the contribution of the highest quality level is the biggest, which is similar to the finding in [20]. Note that only two quality levels are considered in [20]. It is interesting to see that most of these weights are similar to the midpoints of the corresponding intervals $I^Q_n$, except for the weight $\alpha_5$. The reason may be that it is, in fact, difficult to achieve 5 MOS even at perfect quality [5, 34].

Next, we quantitatively investigate the impacts of quality variations. Table 7 reports the weights $\beta_{i,j}$ of the quality switching bins $B^V_{i,j}$. It is found that the impacts of switches depend not only on switching amplitudes but also on starting quality values. With the same starting quality value, the larger the switching amplitude is, the more serious the impact becomes. Also, given a switching amplitude, the lower the starting quality value is, the more negative the impact is. This means that, for the same switching amplitude, a down-switch in low quality ranges is more severe than in high quality ranges. For example, as $\beta_{3,-2} > \beta_{4,-2}$ and $\beta_{4,-2} > \beta_{5,-2}$, a switch from 3 MOS to 1 MOS has a more negative impact than one from 4 MOS to 2 MOS, which is in turn worse than one from 5 MOS to 3 MOS.

Although the switching amplitude is smaller, β , − is higher than β , − and β , − . This reveals that switches having larger switching amplitudes may not cause more negative impacts. From Table 7, it is also observed that the weights of β , − , β , − , β , − , and β , − (in increasing order) are very large (up to 24.76). So it is recommended to avoid switching to very low quality levels (i.e., around 1 ∼ [...]

[...] $\beta_{um}$ turns out to be zero. This reconfirms the finding in [23] that the impacts of up-switches and quality maintaining are negligible. Therefore, it is unnecessary to classify up-switches (as is done for down-switches). This finding is in line with those obtained in [19, 21] that abrupt up-switches may not be worse than smooth up-switches. Users even prefer switching up to, and then maintaining, a high quality level over gradual switching. In addition, this result implies that switching to higher quality levels is preferred to remaining at low quality levels, which is in agreement with [21].

In contrast, a study in [3] shows that frequent quality increases can lead to significantly larger impacts compared to no quality change. This observation can be obtained when comparing a session having frequent up-switches in low quality ranges with a session having quality fixed at higher levels. However, this conclusion may not be correct for up-switching in high quality ranges and quality remaining at low levels. Therefore, to avoid confusion, findings should be specific to switching types.

Figure 3: An example of four sessions with different quality variations.

To better understand some statistics of segment quality values, Fig. 3 shows some cases of sessions with different quality variations. All the cases have the same statistics, including the average, the median, the maximum, the minimum, and the standard deviation of segment quality values. However, it can be seen that the quality variations are very different in these cases. So these statistics are not able to fully reflect the quality variations occurring in a session. Although the number of switches in the case [...]

[...] the weights $\gamma_l$ of the interruption bins $B^I_l$ shown in Table 8. We can see that the larger the index $l$ is, the higher the weight $\gamma_l$ becomes. This means that the impact of interruptions increases with their durations. In particular, the weight $\gamma_1$ is zero, implying that users are generally tolerant of interruptions having durations less than or equal to 0.25 seconds. For interruptions with longer durations, the impacts are much more negative.

In a comparison between the weights $\beta_{i,j}$ and $\gamma_l$, it can also be observed that the weights $\gamma_5$ and $\gamma_6$ are very large (45.58 and 50.65), even larger than the weights $\beta_{i,j}$ of any quality switching bin. This indicates that an interruption with a duration longer than 2 seconds is more annoying to users than any down-switch.
Therefore, avoiding such interruptions should be of the highest priority, possibly at the cost of abrupt switches.

Meanwhile, the weight γ is close to the weight β , − (i.e., 8.42 and 7.89). This result is in line with the finding of [19] that a down-switch may cause an impact comparable to that of an interruption. Moreover, it can be seen that the weight γ is considerably lower than the weights β , − , β , − , and β , − . This indicates that a down-switch can even result in more serious impacts than an interruption. In other words, an interruption does not necessarily result in more negative impacts than a down-switch.

In [3], the finding is that an interruption has three times the impact of a switch. However, the types of interruptions and switches are not clearly mentioned. It can be seen that the weight γ (i.e., 24.16) is approximately three times the weight β , − (i.e., 7.89). However, for different weight pairs, the ratio changes considerably. This again shows that findings should be specific to switching and interruption types.

Similar to switching types, different interruption types have different weights. Therefore, the number of interruptions, the average, the maximum, and the sum of interruption durations are also not able to fully reflect the interruptions occurring in a session.

Finally, it can be seen that the above findings are enabled by the use of histograms in the proposed model. In this way, it is possible to provide a set of insights into the relative impacts of the different factors. In other words, the use of histograms is more flexible and comprehensive than the use of statistics such as the average, the median, the minimum, and the standard deviation.

Based on the above results and discussions, some remarks on the impacts of the factors can be summarized as follows.

• First, it is found that a higher quality level makes a bigger contribution to the QoE of sessions.
• Second, the effects of switches depend not only on switching amplitudes but also on starting quality values. So switches having larger amplitudes do not necessarily cause more negative impacts.
• Third, it is suggested that switching to a very low quality value (i.e., around 1 ∼ [...]

Conclusion

In this paper, we have first proposed a QoE model taking into account the impacts of varying perceptual quality and interruptions. Then, by using the two databases, the experiment results have shown that the proposed model has high prediction performance and outperforms the four existing models. Finally, by discussing the model parameters, a set of insights into the impacts of the factors has been provided, specific to each switching and interruption type. We hope that the findings in this paper will help researchers better understand the factors affecting the QoE of streaming sessions, and will provide suggestions for improving adaptation strategies in HTTP Adaptive Streaming.

For future work, we will extend the proposed model to take into account the impact of the initial delay. Besides, we will seek to apply the proposed model in the evaluation and development of adaptation strategies, so that they can utilize network resources to provide the best quality of experience.

References

[1] T. C. Thang, Q. D. Ho, J. W. Kang, and A. T. Pham, “Adaptive Streaming of Audiovisual Content using MPEG DASH,” IEEE Transactions on Consumer Electronics, vol. 58, no. 1, pp. 78–85, Feb. 2012.
[2] J. Xue, D.-Q. Zhang, H. Yu, and C. W. Chen, “Assessing quality of experience for adaptive HTTP video streaming,” in [...], Chengdu, China, Jul. 2014, pp. 1–6.
[3] H. Nam, K.-H. Kim, and H. Schulzrinne, “QoE matters more than QoS: Why people stop watching cat videos,” in [...], San Francisco, CA, USA, Apr. 2016, pp. 1–9.
[4] D. Z. Rodríguez, Z. Wang, R. L. Rosa, and G. Bressan, “The impact of video-quality-level switching on user quality of experience in dynamic adaptive streaming over HTTP,” EURASIP Journal on Wireless Communications and Networking, vol. 2014, no. 1, p. 216, 2014.
[5] T. Hoßfeld, S. Egger, R. Schatz, M. Fiedler, K. Masuch, and C. Lorentzen, “Initial delay vs. interruptions: Between the devil and the deep blue sea,” in [...], Yarra Valley, Australia, Jul. 2012, pp. 1–6.
[6] J. D. Vriendt, D. D. Vleeschauwer, and D. Robinson, “Model for estimating QoE of video delivered using HTTP adaptive streaming,” in [...], Ghent, Belgium, May 2013, pp. 1288–1293.
[7] Y. Shen, Y. Liu, H. Yang, and D. Yang, “Quality of Experience Study on Dynamic Adaptive Streaming based on HTTP,” IEICE Transactions on Communications, vol. 98, no. 1, pp. 62–70, 2015.
[8] X. Yin, A. Jindal, V. Sekar, and B. Sinopoli, “A control-theoretic approach for dynamic adaptive video streaming over HTTP,” ACM SIGCOMM Computer Communications Review, vol. 45, no. 4, pp. 325–338, 2015.
[9] A. Bentaleb, A. C. Begen, and R. Zimmermann, “SDNDASH: Improving QoE of HTTP adaptive streaming using software defined networking,” in Proceedings 2016 ACM Multimedia Conference, Amsterdam, The Netherlands, Oct. 2016, pp. 1296–1305.
[10] Z. Guo, Y. Wang, and X. Zhu, “Assessing the visual effect of non-periodic temporal variation of quantization stepsize in compressed video,” in [...], Quebec City, Canada, Sept. 2015, pp. 3121–3125.
[11] K. D. Singh, Y. Hadjadj-Aoul, and G. Rubino, “Quality of experience estimation for adaptive HTTP/TCP video streaming using H.264/AVC,” in [...], Las Vegas, USA, Jan. 2012, pp. 127–131.
[12] Y. Liu, S. Dey, F. Ulupinar, M. Luby, and Y. Mao, “Deriving and validating user experience model for DASH video streaming,”

IEEE Transactions onBroadcasting , vol. 61, no. 4, pp. 651–665, 2015.[13] D. Z. Rodríguez, R. L. Rosa, E. C. Alfaia, J. I.Abrahão, and G. Bressan, “Video quality metricfor streaming service using DASH standard,”

IEEETransactions on Broadcasting , vol. 62, no. 3, pp.628–639, 2016.[14] T. Hoßfeld, R. Schatz, E. Biersack, and L. Plisson-neau, “Internet Video Delivery in YouTube: FromTraﬃc Measurements to Quality of Experience,”

Data Traﬃc Monitoring and Analysis , vol. 7754, pp.264–301, 2013.[15] M. Seufert, S. Egger, M. Slanina, T. Zinner,T. Hoßfeld, and P. Tran-Gia, “A survey on qualityof experience of HTTP adaptive streaming,”

IEEECommunications Surveys & Tutorials , vol. 17, no. 1,pp. 469–492, 2015.[16] N. Barman and M. G. Martini, “Qoe modeling forhttp adaptive video streamingâĂŞa survey and open10hallenges,”

IEEE Access , vol. 7, pp. 30 831–30 859,2019.[17] H. T. T. Tran, N. P. Ngoc, A. T. Pham, and T. C.Thang, “A Multi-Factor QoE Model for AdaptiveStreaming over Mobile Networks,” in , WashingtonDC, USA, Dec. 2016, pp. 1–6.[18] H. T. Tran, T. Vu, N. P. Ngoc, and T. C. Thang,“A novel quality model for HTTP Adaptive Stream-ing,” in , Ha Long,Vietnam, Jul. 2016, pp. 423–428.[19] S. Egger, B. Gardlo, M. Seufert, and R. Schatz, “Theimpact of adaptation strategies on perceived qualityof HTTP adaptive streaming,” in

Proceedings of the2014 Workshop on Design, Quality and Deploymentof Adaptive Video Streaming , Sydney, Australia, Dec.2014, pp. 31–36.[20] T. Hoßfeld, M. Seufert, C. Sieber, and T. Zinner,“Assessing eﬀect sizes of inﬂuence factors towards aQoE model for HTTP adaptive streaming,” in , Singapore, Sept. 2014,pp. 111–116.[21] S. Tavakoli, S. Egger, M. Seufert, R. Schatz,K. Brunnström, and N. García, “Perceptual qual-ity of HTTP adaptive streaming strategies: Cross-experimental analysis of multi-laboratory and crowd-sourced subjective studies,”

IEEE Journal on Se-lected Areas in Communications , vol. 34, no. 8, pp.2141–2153, 2016.[22] P. Ni, R. Eg, A. Eichhorn, C. Griwodz, andP. Halvorsen, “Flicker eﬀects in adaptive videostreaming to handheld devices,” in

Proceedings ofthe 19th ACM international conference on Multi-media , Scottsdale, Arizona, USA, Nov. 2011, pp.463–472.[23] H. T. Tran, N. P. Ngoc, Y. J. Jung, A. T. Pham, andT. C. Thang, “A Histogram-Based Quality Model forHTTP Adaptive Streaming,”

IEICE Transactions onFundamentals of Electronics, Communications and Computer Sciences , vol. 100, no. 2, pp. 555–564,2017.[24] Z. Duanmu, K. Zeng, K. Ma, A. Rehman, andZ. Wang, “A Quality-of-Experience Index forStreaming Video,”

IEEE Journal of Selected Top-ics in Signal Processing , vol. 11, no. 1, pp. 154–166,Feb 2017.[25] X. Liu, F. Dobrian, H. Milner, J. Jiang, V. Sekar,I. Stoica, and H. Zhang, “A Case for a Coordi-nated Internet Video Control Plane,” in

Proceedingsof the ACM SIGCOMM 2012 Conference on Appli-cations, Technologies, Architectures, and Protocolsfor Computer Communication , ser. SIGCOMM ’12,New York, NY, USA, 2012, pp. 359–370.[26] Recommendation ITU-T P.1203.3, “Parametricbitstream-based quality assessment of progressivedownload and adaptive audiovisual streaming ser-vices over reliable transport-Quality integrationmodule,”

International Telecommunication Union ,2017.[27] W. Robitza, M.-N. Garcia, and A. Raake, “A modularHTTP adaptive streaming QoE model-Candidate forITU-T P. 1203 (“P. NATS”),” in , Erfurt, Germany, Jul. 2017, pp. 1–6.[28] H. T. T. Tran, D. V. Nguyen, D. D. Nguyen,N. P. Ngoc, and T. C. Thang, “An LSTM-basedApproach for Overall Quality Prediction in HTTPAdaptive Streaming,” in

IEEE Conference on Com-puter Communications Workshops (INFOCOM WK-SHPS) , Paris, Apr. 2019.[29] “P.1203 Open Dataset,” https://github.com/itu-p1203/open-dataset, accessed 2018-07-01.[30] W. Robitza, S. Göring, A. Raake, D. Lindegren,G. Heikkilä, J. Gustafsson, P. List, B. Feiten,U. Wüstenhagen, M.-N. Garcia, K. Yamagishi, andS. Broom, “HTTP Adaptive Streaming QoE Estima-tion with ITU-T Rec. P.1203 - Open Databases andSoftware,” in

Proceedings of the 9th ACM Multime-dia Systems Conference , Amsterdam, Netherlands,Jun. 2018, pp. 466–471.1131] A. Raake, M.-N. Garcia, W. Robitza, P. List,S. Göring, and B. Feiten, “A bitstream-based,scalable video-quality model for HTTP adaptivestreaming: ITU-T P.1203.1,” in

Ninth InternationalConference on Quality of Multimedia Experience(QoMEX) , Erfurt, Germany, May 2017, pp. 1–6.[32] “ITU-T Rec. P.1203 Standalone Implementation,”https://github.com/itu-p1203/itu-p1203/, accessed2018-07-01.[33] Recommendation ITU-T P.1401, “Methods, metricsand procedures for statistical evaluation, qualiﬁca-tion and comparison of objective quality predictionmodels ,”

International Telecommunication Union ,2012.[34] T. Tominaga, T. Hayashi, J. Okamoto, and A. Taka-hashi, “Performance comparisons of subjective qual-ity assessment methods for mobile video,” in2010Second international workshop on Quality of mul-timedia experience (QoMEX)