A User-experience Driven SSIM-Aware Adaptation Approach for DASH Video Streaming
Mustafa Othman
L2TI, Galilee Institute
University Paris 13, Villetaneuse, France
Email: [email protected]
Ken Chen
L2TI, Galilee Institute
University Paris 13, Villetaneuse, France
Email: [email protected]
Anissa Mokraoui
L2TI, Galilee Institute
University Paris 13, Villetaneuse, France
Email: [email protected]
Abstract—Dynamic Adaptive Streaming over HTTP (DASH) is a widely used video streaming technique. One key component is the adaptation mechanism, which resides at the client side. This mechanism greatly impacts the overall Quality of Experience (QoE) of video streaming. In this paper, we propose a new adaptation algorithm for DASH, namely SSIM Based Adaptation (SBA). This mechanism is user-experience driven: it uses the Structural Similarity Index Measurement (SSIM) as the main indicator of perceptual video quality. Moreover, the adaptation is based on a joint consideration of the SSIM indicator and the physical resources (buffer occupancy, bandwidth) in order to minimize buffer starvation (rebuffering) and video quality instability, as well as to maximize the overall video quality (through SSIM). To evaluate the performance of our proposal, we carried out trace-driven emulation with real traffic traces captured in a real mobile network. Comparisons with representative algorithms (BBA, FESTIVE, OSMF) through major QoE metrics show that SBA achieves an efficient adaptation, minimizing both rebuffering and instability while maintaining the displayed video at a high bitrate level.
Index Terms—Video Streaming; DASH; ABR; QoE; SSIM; Mobile Networks.
I. INTRODUCTION
Recent studies predicted that mobile traffic would account for up to 20% of total Internet traffic by 2021 [1]. Moreover, video traffic is expected to reach 80% of all Internet traffic by 2021. The majority of video streaming on the Internet today uses MPEG's Dynamic Adaptive Streaming over HTTP (DASH) standard, which aims to deliver video with high Quality of Experience (QoE) [2]–[4]. The principle of DASH consists in dividing the entire video into segments, called chunks, which are sent separately over HTTP. Each chunk has several versions, each one encoded at a specific bitrate. Chunks are fetched by clients through HTTP/TCP. This approach has made DASH popular [5] [6] since a) it can be built on top of the omnipresent HTTP and b) the client can easily choose, for each chunk, the bitrate most suitable to the current network conditions, using an Adaptive Bitrate (ABR) algorithm [7]–[11], in order to maximize the user's QoE. Figure 1 depicts the DASH streaming process. Today, major content providers (including Netflix and YouTube) use DASH.

Among the main challenges related to the DASH scheme [4] [12] [13], there are in particular:
1) The rebuffering: this term refers to the freezing of video playback when the buffer is empty and the player is waiting for the next video chunk. The duration and frequency of rebuffering during a video streaming session are among the most important metrics affecting the user's QoE.
2) The instability: this is considered another important QoE metric. When the bitrate changes from one chunk to the next, the perceived video quality inevitably varies. If the quality changes between consecutive segments are too abrupt and/or too frequent, the user's QoE degrades.

Fig. 1. DASH Streaming Flow Process.

It is a real challenge to minimize rebuffering and instability while at the same time maximizing the overall quality of the received video. Indeed, the surest way to obtain the best stability and avoid rebuffering is to always send the video at the lowest bitrate level; however, this yields the worst video quality. The selection of the level (in terms of bitrate) of the next chunk is therefore a matter of balance between QoE metrics such as rebuffering, instability and video quality.

Video quality can be assessed through several objective metrics, among which the Structural Similarity Index Measurement (SSIM) and the Peak Signal to Noise Ratio (PSNR). The PSNR suffers from its inconsistency with video quality as perceived by the human eye. The SSIM is considered better able to capture the difference between the original and the encoded images, and it provides a measurement closer to what a human being visually notices as defects [11] [14] [15].

This paper proposes a new adaptation algorithm for DASH, namely SSIM Based Adaptation (SBA), which uses the SSIM indicator to select the quality of the next video chunk with the objective of optimizing the QoE by maximizing the video quality and minimizing rebuffering and instability.
To achieve such a balance, the main idea is to a) increase the bitrate level only when the SSIM indicates a significant improvement in video quality (thus downloading more video content at almost the same user-perceived video quality), and b) decrease the bitrate level only when there is a real risk of rebuffering (thus minimizing instability). Our proposal consists in adding the SSIM values of each level of the video chunks to the Media Presentation Description (MPD), which is the standard way in DASH to provide clients with useful information for video adaptation. Our main contribution is to exploit this additional information (the SSIM value), in combination with the classical indicators, in order to achieve a better adaptation.

Figure 2 illustrates our basic idea. In this figure, each point corresponds to one of the bitrate levels. It can be observed that for chunk number 27, all the levels offer nearly the same SSIM value, whereas for chunk number 140, the SSIM gain at higher levels (higher bitrates) is clearly noticeable. This leads to the idea of including the SSIM value among the criteria for level selection: it is not efficient to select a higher bitrate level when a lower level offers very comparable video quality.

Fig. 2. SSIM values at different resolutions for chunks number 27 and 140.
The rest of this paper is organized as follows. Section II reviews the state of the art on bitrate adaptation algorithms and video quality metrics. Section III presents our algorithm. Section IV compares the results of our algorithm with a selection of relevant algorithms from the literature. Section V concludes the paper.

II. RELATED WORK
Bitrate adaptation algorithms are generally classified into three categories:
1) Buffer-based: the buffer occupancy is used as the main indicator for selecting the bitrate (level) of the next video chunk to be downloaded [7] [8].
2) Rate-based: the bitrate (level) of the next video chunk to be downloaded is chosen to make the best use of the (estimated) future bandwidth [9] [10].
3) Mixture: a mix of the two previous categories [11].

Huang et al. [7] proposed the Buffer Based Adaptation (BBA) method, which aims to: a) avoid unnecessary rebuffering and b) maximize the overall video quality. They used the buffer occupancy, instead of the estimated bandwidth, as the control signal to select the level (bitrate) of the next video chunk. They dynamically calculate a couple of
Utility Function as the quality metric. The rationale can be summarized as follows. First, the relationship between bitrate and perceptual quality is not linear: as the bitrate increases, the gain in video quality gradually saturates. Second, an equal division of network bandwidth among video streams of different resolutions (i.e., a vertical line representing a certain bitrate) results in unfair video quality levels as perceived by end users. In [15], the authors used the SSIM to measure the quality of video transmission through their system (Compressive Distortion Minimizing Rate Control, C-DMRC). The latter uses a distributed cross-layer control algorithm that aims to maximize the received video quality over a multi-hop wireless network with lossy links.

III. PROPOSED ADAPTATION ALGORITHM
This section presents our new adaptation algorithm, referred to subsequently as SSIM Based Adaptation (SBA).
A. Rationale
Our algorithm combines network-level control (buffer-based and rate-based) with the SSIM video quality metric. The key point consists in using the SSIM indicator to determine the level (bitrate, and hence video quality) of the next chunk to be fetched. A bitrate upgrade takes place not only because it is allowed by the network, but also because it would provide a real gain in user-perceived quality. The algorithm tries to achieve the following goals:
• By estimating the available bandwidth, we always try to get the best achievable quality.
• We upgrade to a higher bitrate level only when there is a real gain in quality. Thus, we try to maximize the amount of video content downloaded (and so minimize rebuffering) at (almost) the same video quality.
• We use the buffer occupancy as a rebuffering alert signal and, similar to TCP's behaviour, decrease the bitrate only when there is a real risk. In this way, we minimize instability.

The outline of the proposed algorithm is summarized as follows.
• At the beginning of the streaming, as the network situation is not yet known, the algorithm starts at the lowest bitrate level.
• As the streaming goes on, the algorithm gets a better estimate of the available bandwidth (denoted subsequently as EBW).
• The algorithm upgrades to a higher bitrate level only when this is allowed by EBW and, in addition, this level provides a real gain in video quality according to the SSIM indicator.
• When the current EBW is lower than the currently chosen video bitrate level, the algorithm does not decrease the video to a lower bitrate. In this way, the algorithm aims to maximize video quality and to minimize the instability due to bitrate level changes. Of course, if this situation persists, there is a risk of buffer starvation.
• The algorithm gives priority to avoiding rebuffering: a critical zone is defined in the buffer. Each time the buffer occupancy falls into this zone, the algorithm decreases the bitrate level to the lowest one.
B. Notation and Conditions
The video is divided into $K$ chunks (video segments), each of equal duration $T$ seconds. The video is encoded at $R$ bitrate levels, denoted as $\mathcal{R} = \{r_j\}_{j=1,\dots,R}$, i.e., each chunk has $R$ encoded versions at bitrates $r_1$ to $r_R$ respectively. By convention, the bitrate levels are ordered as follows: $r_1 < r_2 < \dots < r_R$.

For each chunk and at each bitrate level, the corresponding SSIM metric is computed as follows. The SSIM of each image of the chunk is first computed; the SSIM of the chunk is then deduced as the mean value of the SSIM of the different images of this chunk. The SSIM matrix of the video stream is then built: its element $Q(i, j)$ is the SSIM of the $i$-th chunk at bitrate level $r_j$. From the implementation viewpoint, this is compatible with the generic DASH framework. Indeed, this SSIM matrix can be pre-computed and stored in the server. It is then sufficient to incorporate it into the MPD (Media Presentation Description) so that the client can get it.

At the client side, the algorithm estimates the available bandwidth with the following process:
• Each time the client sends the fetch order of a chunk, say chunk $i$, this time is memorized as $s_i$. Upon the complete download of this chunk at time $t_i$, with an actual volume of $Vb_i$ bits, the bandwidth actually consumed by the download of chunk $i$ is computed as:
$$CBW(i) = \frac{Vb_i}{t_i - s_i}. \tag{1}$$
• The estimated bandwidth for the fetch of the $l$-th chunk ($l > 1$), denoted by $EBW(l)$, is then computed as the mean of the actually consumed bandwidth over the already downloaded chunks:
$$EBW(l) = \frac{\sum_{i=1}^{l-1} CBW(i)}{l-1}. \tag{2}$$

For the first chunk, according to the proposed algorithm (as will be explained later), the lowest bitrate version ($r_1$) is used, i.e., $EBW(1) = r_1$.

The algorithm also keeps track of the SSIM difference between adjacent chunks actually displayed. Let $d_i$ be the level at which chunk $i$ is displayed.
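The client-side bookkeeping just described (chunk SSIM as the mean of its frame SSIMs, then Eq. (1) and Eq. (2)) can be sketched in Python, the language the authors used for their simulator. This is a minimal, non-authoritative sketch: the names `chunk_ssim`, `build_ssim_matrix` and `BandwidthEstimator`, the nested-list layout of the per-frame SSIMs, and the use of bits and seconds as units are our own assumptions, not details given in the paper.

```python
from statistics import fmean


def chunk_ssim(frame_ssims: list[float]) -> float:
    """SSIM of a chunk: mean of the SSIM values of its images (frames)."""
    return fmean(frame_ssims)


def build_ssim_matrix(frame_ssims: list[list[list[float]]]) -> list[list[float]]:
    """Build Q, where Q[i][j] is the SSIM of chunk i at level j.

    Hypothetical layout: frame_ssims[i][j] lists the per-frame SSIMs of chunk i
    encoded at level j (0-based indices here, 1-based in the paper).
    """
    return [[chunk_ssim(level) for level in chunk] for chunk in frame_ssims]


class BandwidthEstimator:
    """Running estimate of the available bandwidth (hypothetical helper).

    Rates are assumed to be in bits per second, volumes in bits, times in seconds.
    """

    def __init__(self, lowest_bitrate: float) -> None:
        self.lowest_bitrate = lowest_bitrate  # r_1, used as EBW(1)
        self.consumed: list[float] = []       # CBW(1), ..., CBW(l-1)

    def record(self, volume_bits: float, s_i: float, t_i: float) -> float:
        """Eq. (1): CBW(i) = Vb_i / (t_i - s_i) for a completed download."""
        if t_i <= s_i:
            raise ValueError("completion time must follow the fetch time")
        cbw = volume_bits / (t_i - s_i)
        self.consumed.append(cbw)
        return cbw

    def ebw(self) -> float:
        """Eq. (2): mean of the consumed bandwidths; EBW(1) = r_1 before any download."""
        if not self.consumed:
            return self.lowest_bitrate
        return fmean(self.consumed)


if __name__ == "__main__":
    est = BandwidthEstimator(lowest_bitrate=300_000)  # hypothetical r_1
    print(est.ebw())                                   # 300000
    est.record(2_000_000, s_i=0.0, t_i=1.0)            # CBW(1) = 2 Mbit/s
    est.record(3_000_000, s_i=1.0, t_i=4.0)            # CBW(2) = 1 Mbit/s
    print(est.ebw())                                   # 1500000.0
```

Because Eq. (2) weights every past chunk equally, the estimate reacts slowly to a sudden bandwidth drop; in SBA, protection against such drops comes from the critical buffer zone rather than from the estimator itself.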
For chunk $l$ ($l > 1$), $\Delta(l) = Q(l, d_l) - Q(l-1, d_{l-1})$ is defined to measure the variation in SSIM relative to the previous chunk ($l-1$) when the $l$-th chunk is displayed. The mean SSIM variation up to chunk $l$ (for $l > 1$), denoted by $\alpha(l)$, is then computed as:
$$\alpha(l) = \frac{\sum_{k=2}^{l} \Delta(k)}{l-1}. \tag{3}$$

Following the convention adopted by the scientific community, the buffer capacity is given in seconds. The buffer has a capacity of $BS$ seconds. It is divided into two regions, region C (for Critical) and region N (for Normal). When the buffer occupancy is below a threshold value (denoted by $L_c$), we are in region C. The current buffer occupancy is denoted by $b$.

C. Our Algorithm

Hereafter, we describe the SBA algorithm (cf. the pseudo-code in Figure 3), which aims to determine the bitrate level (denoted by $f$) of the next chunk to be fetched.

Input: $\mathcal{R}$, $Q$, $b$, $L_c$, $l$, $\alpha(l)$, $EBW(l)$, $d_{l-1}$
Output: $f$: bitrate level at which the next video chunk will be fetched
if $b \le L_c$ then
    $f = r_1$
else
    $p_l = \max\{r_j \in \mathcal{R} : r_j < EBW(l)\}$
    if $\delta(l) = Q(l, p_l) - Q(l-1, d_{l-1}) > \alpha(l)$ then
        $f = p_l$
    else
        $f = d_{l-1}$
    end if
end if
return $f$

Fig. 3. Adaptation algorithm.
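A minimal Python sketch of Eq. (3) and of the decision rule in Figure 3 follows. It is an illustration under stated assumptions, not the authors' code: levels are 0-based indices into the ascending bitrate list (so $f = r_1$ becomes level 0), $p_l$ is handled as a level index so that it can index $Q$, and, since the paper does not say what happens when no bitrate lies strictly below $EBW(l)$, the sketch falls back to the lowest level in that case. Because $d_l$ is not yet known while chunk $l$ is being chosen, the sketch computes $\alpha$ over the chunks whose levels are already known. The function names are hypothetical.

```python
from statistics import fmean


def mean_ssim_variation(q: list[list[float]], levels: list[int]) -> float:
    """Eq. (3): mean of Delta(k) = Q(k, d_k) - Q(k-1, d_{k-1}) over known chunks.

    levels[k] is the level d of chunk k (0-based). With fewer than two chunks
    there is no Delta yet; returning 0.0 in that case is an assumption.
    """
    deltas = [q[k][levels[k]] - q[k - 1][levels[k - 1]] for k in range(1, len(levels))]
    return fmean(deltas) if deltas else 0.0


def can_fetch(bs: float, b: float, t: float) -> bool:
    """The algorithm runs only when there is room in the buffer: BS - b > T."""
    return bs - b > t


def select_level(q: list[list[float]], bitrates: list[float], b: float,
                 l_c: float, l: int, alpha: float, ebw: float, d_prev: int) -> int:
    """Return the level f of chunk l following Figure 3 (bitrates ascending)."""
    if b <= l_c:
        return 0  # region C: drop to the lowest level r_1
    # p_l: highest level whose bitrate is strictly below EBW(l)
    below = [j for j, r in enumerate(bitrates) if r < ebw]
    p_l = max(below) if below else 0  # assumption: fall back to r_1
    delta = q[l][p_l] - q[l - 1][d_prev]  # delta(l)
    return p_l if delta > alpha else d_prev  # switch only on a sufficient SSIM gain
```

With this rule, a candidate $p_l$ (higher or lower than the current level) is adopted only if its $\delta(l)$ exceeds $\alpha(l)$; otherwise the current level $d_{l-1}$ is retained, which is how the rule avoids reacting to every bandwidth fluctuation while region C still forces the drop to $r_1$.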
This algorithm is run each time a fetch order can be issued, i.e., either when a chunk has been totally displayed or when a chunk has been totally downloaded, and of course only when there is room in the buffer (i.e., $BS - b > T$). The parameters $\alpha(l)$ and $EBW(l)$ are assumed to be estimated through parallel processes. Consider the fetch of the $l$-th chunk, and denote the bitrate level of the previous chunk as $d_{l-1}$ and its SSIM as $Q(l-1, d_{l-1})$. The algorithm has two regimes depending on the buffer occupancy:
1) When the current buffer occupancy is in region C, the next chunk is always fetched at the lowest level, i.e., $f = r_1$.
2) When the buffer occupancy is in region N:
a) Using $EBW(l)$, the potential bitrate $p_l$, i.e., the highest bitrate below $EBW(l)$, is determined.
b) The SSIM difference if $p_l$ were used, $\delta(l) = Q(l, p_l) - Q(l-1, d_{l-1})$, is then computed.
i) If $\delta(l) > \alpha(l)$, the algorithm considers that there is a sufficient gain in video quality and chooses $p_l$ as the bitrate for chunk $l$, i.e., $f = p_l$.
ii) Otherwise, the next video chunk is fetched at the current level (i.e., $f = d_{l-1}$).

IV. PERFORMANCE EVALUATION
This section compares and discusses the performance of the proposed SBA algorithm against a selection of competitive algorithms.
A. Evaluation Framework
To obtain a realistic networking context, a set of real traffic traces over the 4G mobile network of a major network provider has been collected. The traces were collected in different areas and periods in Paris to ensure a large coverage of traffic patterns. Three classic test videos have been used:
1) Animation (Big Buck Bunny) [19],
2) Documentary film (Of Forests and Men) [20],
3) Sport (The World's Best Bouldering in Rocklands, South Africa) [21].
These videos are encoded with the FFmpeg codec at the following levels (the ones used by Netflix) [22] [23]: R = { }. The videos are then divided into chunks ($T = 4$ seconds [16]) using the MP4Box-GPAC framework [24].

We developed a simulator (in Python) to evaluate the performance of DASH-based adaptation algorithms. This simulator can work in trace-driven mode, i.e., the networking context is reconstituted from real network traces. The simulator accurately reproduces the instants of chunk download completion (which depend on network conditions) as well as the chunk playback (which can be blocked by rebuffering). At each instant where the next chunk is to be downloaded, our algorithm comes into action by computing the level of the next chunk.

Using this simulator and the real traffic traces previously mentioned, we compared the SBA algorithm with the following three algorithms: BBA [7], FESTIVE [9] and OSMF [16]. We have tested two scenarios with two different buffer sizes: a) BS = 120 seconds, b) BS = 240 seconds. Each scenario is tested with different traces. The threshold value ($L_c$) is set to seconds ( chunks) in both scenarios.

The performance of the algorithms is assessed through 4 metrics (i.e., rebuffering, instability, SSIM, bitrate). For each metric, the average value is computed on tests:
1) Average Rebuffering: the average rebuffering (freezing) duration.
2) Average Instability: the average of bitrate changes.
3) Average SSIM: the average SSIM of the video being displayed.
4) Average Bitrate: the average bitrate of the video being displayed.
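The evaluation methodology can be illustrated with a deliberately simplified, trace-driven playback model that reports the four metrics above for one session. This is not the authors' simulator, whose internals are not described: the per-second bandwidth trace (looped when exhausted, and assumed not to be all zeros), chunk sizes of bitrate × T, the hypothetical `choose(l, b, ebw, levels)` callback, excluding start-up delay from rebuffering, and counting instability as the number of level switches are all assumptions made for this sketch.

```python
import math
from collections.abc import Callable
from statistics import fmean

# choose(l, b, ebw, levels) -> level of chunk l (hypothetical callback signature)
Chooser = Callable[[int, float, float, list[int]], int]


def download_time(trace: list[float], start: float, bits: float) -> float:
    """Seconds to download `bits` from time `start`; trace[k] is the throughput
    (bit/s) during second k, repeated cyclically (must not be all zeros)."""
    t, remaining = start, bits
    while remaining > 0:
        rate = trace[int(t) % len(trace)]
        slot_end = math.floor(t) + 1
        capacity = rate * (slot_end - t)  # bits transferable before the next second
        if capacity >= remaining:
            t += remaining / rate
            remaining = 0.0
        else:
            remaining -= capacity
            t = slot_end
    return t - start


def simulate(trace: list[float], bitrates: list[float], ssim: list[list[float]],
             choose: Chooser, t_chunk: float = 4.0, bs: float = 120.0) -> dict:
    """Play len(ssim) chunks and return rebuffering, instability, SSIM and bitrate."""
    now, b, rebuffering = 0.0, 0.0, 0.0
    levels: list[int] = []
    consumed: list[float] = []  # CBW of each downloaded chunk, as in Eq. (1)
    for l in range(len(ssim)):
        if b + t_chunk > bs:  # wait (while playing) until a whole chunk fits
            wait = b + t_chunk - bs
            now += wait
            b -= wait
        ebw = fmean(consumed) if consumed else bitrates[0]  # Eq. (2)
        level = choose(l, b, ebw, levels)
        bits = bitrates[level] * t_chunk
        dt = download_time(trace, now, bits)
        if l > 0:  # playback drains the buffer during the download
            if dt > b:
                rebuffering += dt - b  # player stalls once the buffer is empty
                b = 0.0
            else:
                b -= dt
        now += dt
        b += t_chunk
        levels.append(level)
        consumed.append(bits / dt)
    return {
        "rebuffering_s": rebuffering,
        "instability": sum(1 for a, c in zip(levels, levels[1:]) if a != c),
        "avg_ssim": fmean(ssim[i][lv] for i, lv in enumerate(levels)),
        "avg_bitrate": fmean(bitrates[lv] for lv in levels),
    }
```

For example, on a constant 1 Mbit/s trace with two hypothetical levels of 0.5 and 2 Mbit/s, always choosing the lower level never stalls, whereas always choosing the higher level stalls for 4 seconds on every chunk after the first, which is the trade-off between quality and rebuffering that the metrics above are meant to capture.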
B. Performance Analysis
This section discusses the performance achieved using the Animation video stream. Table I summarizes the results for the two scenarios: the first 4 rows of the table correspond to BS = 120 s, whereas the last 4 rows correspond to BS = 240 s. One can observe that the SBA algorithm achieves the desired objective, with shorter rebuffering and less instability at a good bitrate level.

Moreover, Figure 4 shows that our proposal SBA introduces zero rebuffering in both scenarios. Indeed, we give priority to rebuffering avoidance by setting a critical zone with a drastic bitrate drop. Being a buffer-based algorithm, BBA works in a similar way and therefore also shows zero rebuffering. On the contrary, FESTIVE and OSMF undergo rebuffering during video chunk playback, for 21.208 and 46.25 seconds respectively. Our algorithm therefore performs better than FESTIVE and OSMF in the given scenarios.

Fig. 4. Average rebuffering duration for different algorithms with buffer sizes of 120 and 240 seconds and with Animation (Big Buck Bunny).

Fig. 5. Average instability for different algorithms with buffer sizes of 120 and 240 seconds and with Animation (Big Buck Bunny).
Figure 5 shows that our proposal SBA achieves good performance, since it ranks first (for BS = 120 s) and second (for BS = 240 s), respectively. For the scenario with BS = 240 seconds, the BBA algorithm is slightly better than SBA: this is due to the more conservative bitrate increase approach of BBA. The price to pay, however, is a much lower average bitrate for BBA compared to the others, whereas our algorithm keeps the highest average bitrate (cf. Figure 7).

Fig. 6. Average SSIM for different algorithms with buffer sizes of 120 and 240 seconds and with Animation (Big Buck Bunny).
As for the SSIM (see Figure 6), SBA and BBA have similar performance, which is much better than that of the two other algorithms. This means in particular that our choice of upgrading only if there is a real gain in SSIM is justified.
Fig. 7. Average bitrate for different algorithms with buffer sizes of 120 and 240 seconds and with Animation (Big Buck Bunny).

TABLE I. SUMMARIZED RESULTS OF THE TWO SCENARIOS WITH ANIMATION
Adaptation Algo. Rebuffering Instability SSIM BitRate
SBA
BBA [7]
BBA [7]
TABLE II. SUMMARIZED RESULTS OF THE TWO SCENARIOS WITH DOCUMENTARY
Adaptation Algo. Rebuffering Instability SSIM BitRate
SBA
OSMF [16] 44.625 76.958 0.459 2848.442
SBA
OSMF [16] 44.625 76.958 0.459 2841.568
Figure 7 gives the average bitrate for different algorithms. As shown, SBA achieves the highest average bitrate in both scenarios. We also notice that BBA, which has performance similar to our algorithm for the first three metrics, obtains the lowest bitrate here, probably because of an excessive emphasis on rebuffering avoidance.
C. Results Summary
Additional results using the two other test videos, namely Documentary (see Table II) and Sport (see Table III), are given in this section. Results similar to those obtained with the Animation stream are observed. One can notice that in both scenarios, our proposed SBA algorithm achieves a better ranking for most of the metrics.

TABLE III. SUMMARIZED RESULTS OF THE TWO SCENARIOS WITH SPORT
Adaptation Algo. Rebuffering Instability SSIM BitRate
SBA
BBA [7]
BBA [7]

V. CONCLUSION

This paper proposed a new adaptation algorithm, SSIM Based Adaptation (SBA), for DASH video streaming. This algorithm is user-experience driven, since its main control factor is the Structural Similarity Index Measurement (SSIM), which is a good objective indicator of user-perceived video quality. The algorithm jointly takes into consideration network-level indicators (i.e., buffer occupancy, bandwidth) and the SSIM to select the level of the next video chunk. The performance analysis, carried out through trace-driven emulation with real traffic traces captured in a real mobile network and comparing SBA with representative algorithms (BBA, FESTIVE, OSMF) through major QoE metrics, shows that SBA achieves a more efficient adaptation by minimizing both rebuffering and instability while maintaining the displayed video at a high bitrate level. Since our main research direction is the joint consideration of networking mechanisms and objective video quality metrics, we will continue along this direction in future work. For SBA, we plan to explore the impact of the choice of the SSIM threshold value. We also plan to extend this work to other relevant video quality metrics.

ACKNOWLEDGMENT
This work is supported in part by a scholarship from Campus France (878164H) as well as by complementary support from the Galileo Graduate School, University Paris 13 (ED146).

REFERENCES
[1] Cisco Visual Networking Index, “Forecast and methodology, 2016–2021,” white paper, San Jose, CA, USA, vol. 1, 2016.
[2] H. Nam, K.-H. Kim, and H. Schulzrinne, “QoE matters more than QoS: Why people stop watching cat videos,” in IEEE INFOCOM 2016, The 35th Annual IEEE International Conference on Computer Communications. IEEE, 2016, pp. 1–9.
[3] M. Seufert et al., “A survey on quality of experience of HTTP adaptive streaming,” IEEE Communications Surveys & Tutorials, vol. 17, no. 1, pp. 469–492, 2015.
[4] F. Dobrian et al., “Understanding the impact of video quality on user engagement,” in ACM SIGCOMM Computer Communication Review, vol. 41, no. 4. ACM, 2011, pp. 362–373.
[5] F. Wang, Z. Fei, J. Wang, Y. Liu, and Z. Wu, “HAS QoE prediction based on dynamic video features with data mining in LTE network,” Information Sciences, vol. 60, no. 042404, pp. 1–042404, 2017.
[6] F. Hartung, S. Kesici, and D. Catrein, “DRM protected dynamic adaptive HTTP streaming,” in Proceedings of the Second Annual ACM Conference on Multimedia Systems. ACM, 2011, pp. 277–282.
[7] T.-Y. Huang, R. Johari, N. McKeown, M. Trunnell, and M. Watson, “A buffer-based approach to rate adaptation: Evidence from a large video streaming service,” ACM SIGCOMM Computer Communication Review, vol. 44, no. 4, pp. 187–198, 2015.
[8] C. Zhou, C.-W. Lin, X. Zhang, and Z. Guo, “Buffer-based smooth rate adaptation for dynamic HTTP streaming,” in . IEEE, 2013, pp. 1–9.
[9] J. Jiang, V. Sekar, and H. Zhang, “Improving fairness, efficiency, and stability in HTTP-based adaptive video streaming with FESTIVE,” IEEE/ACM Transactions on Networking (TON), vol. 22, no. 1, pp. 326–340, 2014.
[10] S. Akhshabi, L. Anantakrishnan, A. C. Begen, and C. Dovrolis, “What happens when HTTP adaptive streaming players compete for bandwidth?” in Proceedings of the 22nd International Workshop on Network and Operating System Support for Digital Audio and Video. ACM, 2012, pp. 9–14.
[11] S. Cicalo, N. Changuel, R. Miller, B. Sayadi, and V. Tralli, “Quality-fair HTTP adaptive streaming over LTE network,” in . IEEE, 2014, pp. 714–718.
[12] S. S. Krishnan and R. K. Sitaraman, “Video stream quality impacts viewer behavior: inferring causality using quasi-experimental designs,” IEEE/ACM Transactions on Networking, vol. 21, no. 6, pp. 2001–2014, 2013.
[13] R. K. Mok, E. W. Chan, X. Luo, and R. K. Chang, “Inferring the QoE of HTTP video streaming from user-viewing activities,” in Proceedings of the First ACM SIGCOMM Workshop on Measurements Up the Stack. ACM, 2011, pp. 31–36.
[14] P. Georgopoulos, Y. Elkhatib, M. Broadbent, M. Mu, and N. Race, “Towards network-wide QoE fairness using OpenFlow-assisted adaptive video streaming,” in Proceedings of the 2013 ACM SIGCOMM Workshop on Future Human-Centric Multimedia Networking. ACM, 2013, pp. 15–20.
[15] S. Pudlewski, A. Prasanna, and T. Melodia, “Compressed-sensing-enabled video streaming for wireless multimedia sensor networks,” IEEE Transactions on Mobile Computing, no. 6, pp. 1060–1072, 2012.
[16] M. Riad, H. Abu-Zeid, H. S. Hassanein, M. Tayel, and A. A. Taha, “A channel variation-aware algorithm for enhanced video streaming quality,” in 2015 IEEE 40th Local Computer Networks Conference Workshops (LCN Workshops). IEEE, 2015, pp. 893–898.
[17] Z. Wang, L. Lu, and A. C. Bovik, “Video quality assessment based on structural distortion measurement,” Signal Processing: Image Communication, vol. 19, no. 2, pp. 121–132, 2004.
[18] T. Zinner, O. Hohlfeld, O. Abboud, and T. Hoßfeld, “Impact of frame rate and resolution on objective QoE metrics,” in 2010 Second International Workshop on Quality of Multimedia Experience (QoMEX)