Classifying flows and buffer state for YouTube's HTTP adaptive streaming service in mobile networks
Dimitrios Tsilimantos, Theodoros Karagkioules, Stefan Valentin
Mathematical and Algorithmic Sciences Lab, Paris Research Center, Huawei Technologies France
{dimitrios.tsilimantos, theodoros.karagkioules, stefan.valentin}@huawei.com

Abstract—Accurate cross-layer information is very useful to optimize mobile networks for specific applications. However, providing application-layer information to lower protocol layers has become very difficult due to the wide adoption of end-to-end encryption and due to the absence of cross-layer signaling standards. As an alternative, this paper presents a traffic profiling solution to passively estimate parameters of HTTP Adaptive Streaming (HAS) applications at the lower layers. By observing IP packet arrivals, our machine learning system identifies video flows and detects the state of an HAS client's play-back buffer in real time. Our experiments with YouTube's mobile client show that Random Forests achieve very high accuracy even with a strong variation of link quality. Since this high performance is achieved at IP level with a small, generic feature set, our approach requires no Deep Packet Inspection (DPI), comes at low complexity, and does not interfere with end-to-end encryption. Traffic profiling is, thus, a powerful new tool for monitoring and managing even encrypted HAS traffic in mobile networks.
Index Terms—HTTP Adaptive Streaming; YouTube; MPEG-DASH; Service Classification; Machine Learning.
I. INTRODUCTION
Mobile video streaming generates a significant portion of traffic in cellular networks. According to Cisco's traffic report [1], video already represented 60% of the mobile IP traffic in 2016 and is projected to reach 78% by 2021. This traffic is dominated by HTTP Adaptive Streaming (HAS) services [2], which follow the Dynamic Adaptive Streaming over HTTP (DASH) standard [3] or the HTTP Live Streaming (HLS) specification [4].

Reacting to this heavy increase in video traffic, Mobile Network Operators (MNOs) have started to deploy traffic shaping solutions. In November 2015, T-Mobile USA deployed BingeOn, which offers an unlimited plan for video streaming while throttling the video bit-rate to "approximately 1.5 Mbit/s averaged over one minute of video" [5]. A similar solution became operational in Germany in April 2017 under the term StreamOn. Other MNOs investigate similar solutions for traffic shaping, while network equipment vendors are customizing base station schedulers to support video streaming by specific rate guarantees [6] or weight adjustment [7].

All these solutions require a certain degree of application-layer knowledge. Before a video flow can be throttled or scheduled with specific rules, its packets need to be identified. Once identified, it is beneficial to know the video bit-rate and play-back buffer state of that stream in real time. This information allows schedulers, traffic shaping and admission control schemes to minimize their impact on Quality of Experience (QoE) or even to increase it by providing bit-rate guarantees when possible [7].

This demand for accurate application-layer information is a major practical problem. Network optimization functions typically operate at Layers 2 and 3 of the ISO/OSI protocol stack, while application information is available at Layer 7. Currently, MNOs solve this cross-layer signaling problem by a combination of explicit signaling and Deep Packet Inspection (DPI). DPI aims to dissect Layer 7 traffic at Layer 2 or 3 and extracts flags and parameters from protocol headers above Layer 3 or even from payload. Explicit signaling, however, requires Over-The-Top (OTT) media services to add specific flags or tags to their streaming data.

Since these methods are based on simple rule systems and string comparison, they are susceptible to spoofing. A malicious user can mislead a flow classifier by injecting the appropriate flags or tags into its own traffic, which may not necessarily be a video stream. By using an HTTP proxy, a simple spoofing method for BingeOn was demonstrated in [8] only several months after this service became available. In addition to spoofing, the use of end-to-end encryption techniques such as Transport Layer Security (TLS) and Secure Sockets Layer (SSL) is a major issue for DPI.
Since most major streaming OTTs have adopted TLS/SSL, MNOs cannot directly apply DPI to such encrypted video flows. In order to access the application-layer information, they have to interrupt the end-to-end encryption, e.g., by terminating the end-point on the client side. Since the OTT has to accept such "man in the middle attacks", this approach weakens operational security, leads to rejected certificates with more rigorous clients, and eventually opens the door for malicious use.

This paper tackles the cross-layer signaling problem only by observing the IP packet flows of HAS traffic. We follow the main idea of traffic profiling by estimating application-layer information based on characteristics observed at the lower layers. In particular, we observe information such as the IP addresses of source and destination, IP packet size, and IP packet arrival time from queues in the network. Based on this input, our system separates HAS video flows from non-HAS traffic and estimates the current state of the video client's play-back buffer. Based on 120 hours of end-to-end encrypted traffic data from YouTube, our approach performs this classification at very high accuracy. This is a consequence of the regular arrival patterns of HAS, our careful feature design and the use of state-of-the-art machine learning methods.

The particular contributions of our paper are:
1) A new traffic profiling system that classifies the flow type and play-back buffer state of HAS at the IP layer in real time.
2) A careful feature design that is generic enough to equally work with TCP and UDP-based streaming and leads to a small feature set for low computational complexity.
3) A rigorous experimental design to collect data and ground truth from YouTube's video streaming service.
4) A representative performance evaluation with interesting insights on the effect of the selected features, machine learning method and link quality.

The remainder of this paper is organized as follows.
We discuss related work in Section II and summarize our system assumptions in Section III. Then, we present our traffic profiling solution in Section IV, including the machine learning system. We detail the methodology of our YouTube experiments in Section V, present experimental results in Section VI, and conclude the paper in Section VII.

II. RELATED WORK
The main idea of using passive traffic measurements for recognizing statistical patterns of video traffic has been studied in several works, for example in [9]–[15]. Besides [10], which focuses on the analysis and understanding of the obtained measurement data, these studies rely on Machine Learning (ML) methods, which have become prominent for traffic classification. We encourage interested readers to refer to [9] and the references therein for a detailed and comprehensive survey.

When it comes to video streaming, most of the recent works propose to classify the entire video session into different classes. For example, the authors in [11] propose a system that monitors application-level quality indicators and corresponding traffic traces to classify YouTube videos into three QoE classes. Similarly, different levels of QoE are studied in [12] with the focus on stalling, average video quality and quality variations as the key influence factors. A causal analysis between QoE and Quality of Service (QoS) is presented in [13] with features from application- and network-layer QoS metrics, while an approach to discriminate between audio and video HAS flows is proposed in [14]. Compared to these studies that classify an entire video session into a single category, our classification is performed at a higher temporal resolution by estimating dynamic video traffic parameters in real time.

Closer to our work is the methodology presented in [15], where the target is to predict the class of the play-back buffer level during the video session by defining a set of buffer levels in seconds. Unlike that work, our focus is the prediction of the buffer state, which is a more fundamental property of adaptive streaming clients.

Our own prior work points to an interesting application for buffer state classification.
In [16], we exploited the fact that every adaptive streaming client strives to achieve a rate match between the client's download rate (i.e., throughput) and the server's source rate (i.e., content encoding rate). This match is obtained in the steady state where, consequently, throughput is a good predictor for the encoding rate. This rate estimation requires the accurate detection of the steady state, which we achieved by simple heuristics in [16]. As this fixed-rule approach failed for the more complicated variation of link quality in mobile networks [17], we now adopt ML models to generalize buffer state detection to a wider set of practical streaming scenarios and to perform HAS traffic classification as well.

III. SYSTEM MODEL
A packet flow is defined as a series of packets sharing the same source and destination IP addresses, source and destination ports and transport protocol. Without loss of generality, we assume that packets are generated with a typical TCP/IP stack, i.e., TCP or UDP is used on top of IP. Packet flows are also distinguished into different classes, each class indicating IP traffic that belongs to a specific application, e.g. Web, file transfer, gaming, Peer-to-Peer (P2P), chat and media streaming. During video streaming, the source node is a video server and the destination node is video client software running on a mobile device. HAS is considered according to the standards [3], [4], and the communication between server and client utilizes at least one intermediate node close to the edge, e.g. an edge-router, gateway, Base Station (BS) or access point, that forwards the packets in either direction. This last assumption is necessary since our traffic profiling solution is based on the fact that the observed packet Inter-Arrival Times (IATs) are close to end-to-end IATs. Multiplexing in the same packet flow is allowed in our system, as long as it covers only different audio and video streams of a single HAS session and user. This enables us to include Google's Quick UDP Internet Connections (QUIC) [18] in our studies, a multiplexed stream transport protocol over UDP that has been widely used lately.

Briefly describing an HAS system: a play-back buffer is used at the client side in order to compensate for variations in received throughput, due to the dynamic nature of the wireless channel conditions, but also in video encoding rate, as Variable Bit-Rate (VBR) encoding is commonly adopted. Moreover, a video is divided into a sequence of smaller segments containing a short interval of content play-back time, typically in the order of a few seconds. Video segments are encoded in multiple quality representations, which are stored on the video server.
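For illustration, the flow definition at the start of this section can be expressed as a grouping key over captured packets. This is a sketch only; the packet records are hypothetical, with field names mirroring Table I:

```python
from collections import defaultdict

# Hypothetical packet records for illustration; field names follow Table I,
# but the values are made up and no real capture format is implied.
packets = [
    {"srcIP": "203.0.113.10", "srcPort": 443, "dstIP": "192.0.2.5",
     "dstPort": 50000, "protocol": "UDP", "size": 1350, "time": 0.010},
    {"srcIP": "203.0.113.10", "srcPort": 443, "dstIP": "192.0.2.5",
     "dstPort": 50000, "protocol": "UDP", "size": 1350, "time": 0.021},
    {"srcIP": "198.51.100.7", "srcPort": 80, "dstIP": "192.0.2.5",
     "dstPort": 50001, "protocol": "TCP", "size": 512, "time": 0.030},
]

def flow_key(p):
    """A packet flow: same source/destination IPs, ports and transport protocol."""
    return (p["srcIP"], p["srcPort"], p["dstIP"], p["dstPort"], p["protocol"])

flows = defaultdict(list)
for p in packets:
    flows[flow_key(p)].append(p)

print(len(flows))  # 2 distinct flows
```

Grouping by this 5-tuple is all the profiler needs to separate concurrent flows before any feature is computed.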
The client is then able to adjust the play-out quality by sequentially requesting segments in the representation indicated by the algorithm of the deployed HAS policy, which usually takes into account buffer level information and throughput statistics. While the description of different HAS policies goes beyond the scope of this paper,
recent insightful comparative studies can be found in [19], [20].

Fig. 1. Example of a labeled video flow [21], using the setup in Section V-B under experimental scenario (s4).

Fig. 1 shows a typical HAS session, as measured using the setup of Section V-B. The top figure shows the play-back buffer level in seconds, directly extracted from the client's streaming application, while the bottom figure displays the accumulated streaming data over time, as recorded in our network traces. From this example, we can observe the three characteristic states of an HAS session. First, while the buffer is not sufficiently full, there is an initial burst of data where a new segment is requested immediately after the complete download of the previous one, leading to a streaming rate higher than the video bit-rate. We denote this period as the filling state, since the client quickly fills the buffer to a certain level, equal to … s in this example. Once this target is reached, the steady state takes place, where the streaming rate matches the video bit-rate, keeping the buffer level stable. This is achieved by a segment request pattern that leads to short packet bursts of one or more segments, followed by idle data transmission periods. Furthermore, due to the dynamic wireless channel conditions and the presence of diverse bottlenecks in the video delivery system, throughput may drop below the video bit-rate during streaming. We impose this case in the example of Fig. 1 by applying rate throttling in the interval [170, …] s. At the beginning of this interval, the client tries to download data with the available throughput, but this is not enough to support the current video bit-rate and the buffer level inevitably decreases. We define this period as a depleting state. Then, in order to avoid a forthcoming video stall by letting the buffer run empty, the HAS policy switches to a lower video quality, in this example at … s, with an average bit-rate below the current throughput, leading to a second filling state and a subsequent steady state. After the end of the rate-throttling interval, higher throughput is again available and a new quality change leads to a third filling state, since the buffer is quickly filled with segments of higher quality. A last steady state takes place when the buffer target is reached again and then, after the entire video is transmitted, the session ends by playing out the remaining bits from the buffer. We leave this part as unlabeled in the top figure, since there is no respective streaming data.

TABLE I
RECORDED PARAMETERS

Layer | Name | Description
Network | p_srcIP | packet source IP address
Network | p_srcPort | packet source port number
Network | p_dstIP | packet destination IP address
Network | p_dstPort | packet destination port number
Network | p_size | packet payload size (Bytes)
Network | p_time | packet arrival time (s)
Network | p_protocol | packet transport protocol
Application | bh | buffer level (s)
Application | videoid | video ID – used for sanity check
Application | fmt | video quality (itag)
Application | afmt | audio quality (itag)
Application | timestamp | time of buffer level entry (s)

Finally, we assume that a list of network-layer parameters can be recorded and logged to a file by observing each packet flow. This information can be observed at Layer 3 and upper Layer 2, i.e. before Radio Link Control (RLC) frame concatenation. A complete list of the required network information at packet level is provided in Table I and used to calculate numerical attributes, namely features, over multiple packets of the same flow. Table I also includes the recorded application-layer parameters from YouTube, which we collect in order to establish the ground truth for training purposes. This allows us to create a training data set, represented by a matrix T ∈ R^{N×(M+1)}, as a set of input (features) and output (label) pairs for N sampling periods of duration T_s:

T = \begin{bmatrix} \mathbf{x}_1 & y_1 \\ \mathbf{x}_2 & y_2 \\ \vdots & \vdots \\ \mathbf{x}_N & y_N \end{bmatrix}, (1)

where x_i = [x_{i1}, x_{i2}, ..., x_{iM}] is a vector of M features calculated at the i-th sampling period, y_i is the corresponding label value, and each class is assigned a unique numerical value. This data set is then used to train a set of ML classifiers. Classification belongs to the category of supervised machine learning, which involves a set of pre-classified or labeled data, associated with a set of features corresponding to this data. This input is then used to train a model by creating a set of rules in order to classify new instances based only on their features. The training phase usually requires a large training data set for better performance and may be time consuming, but can be performed off-line. Real-time classification uses the
already trained classifier in order to predict the label y of a previously unseen feature vector x and, in general, can be very time efficient.

Fig. 2. Example of traffic profiling at the BS with 2 users: one UE with HAS traffic and one UE with non-HAS traffic.

IV. TRAFFIC PROFILING
The proposed traffic profiling system adds a module that monitors packet flows at the edge of a mobile network, i.e. at any edge-router, gateway, BS or access point. Fig. 2 shows an application example with traffic profiling deployed at the BS monitoring the traffic of 2 User Equipments (UEs), where arriving packet flows are placed in user-specific queues and served by the BS scheduler.

At the core of traffic profiling, information directly observable at packet level is used to construct a set of features, which are later used for the ML models. Our first classification problem is to accurately distinguish HAS flows from a set of packet flows with arbitrary traffic. Then, once an HAS packet flow is identified, we apply a second classification to detect the various buffer streaming states in real time during the HAS session. In summary, the main functionality of the proposed traffic profiling system is as follows:

1) Collection of packet information: At the transport layer, observable information at packet level is collected for each monitored packet flow, even for encrypted traffic.
2) Construction of features: For each flow, features are calculated based on the collected packet information and used to build ML models that recognize statistical properties of the flow.
3) Detection of HAS flows: A HAS flow is distinguished from other packet flows in real time by plugging the constructed features into the trained classifiers.
4) Buffer state estimation for HAS flows: Different streaming states for each HAS flow are identified in real time by using similar features and classification models.

TABLE II
SET OF DEFINED CLASSES

Problem | Name | Description
Service type | HAS | HAS traffic
Service type | non-HAS | non-HAS traffic (Web, downloads)
Buffer state | filling | streaming rate is higher than video bit-rate, buffer is filling
Buffer state | steady | streaming rate matches video bit-rate, buffer target level is reached
Buffer state | depleting | streaming rate is lower than video bit-rate, buffer level decreases
Buffer state | unclear | all other cases, e.g. streaming rate is close to video bit-rate but buffer target level is not reached
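The overall decision logic of steps 1)–4) can be sketched as a two-stage pipeline. The rules below are hypothetical placeholders standing in for the trained classifiers of Section IV-A; only the control flow, not the decision rules, reflects the system:

```python
# Two-stage profiling decision per sampling period, as in steps 3) and 4).
# The threshold rules are illustrative placeholders, not the trained models.
def is_has_flow(features):
    # stand-in for the trained flow classifier
    return features["ULnPckts"] > 0 and features["DLload"] > 0.1

def buffer_state(features):
    # stand-in for the trained buffer-state classifier
    if features["DLload"] > 0.6:
        return "filling"
    return "steady"

def profile(features):
    """First detect whether the flow is HAS; only then estimate its buffer state."""
    if not is_has_flow(features):
        return "non-HAS"
    return "HAS/" + buffer_state(features)

print(profile({"ULnPckts": 2, "DLload": 0.8}))  # HAS/filling
```

The key design point is the ordering: buffer state estimation is only attempted on flows that the first stage already labeled as HAS.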
A. Classification
A summary of all the defined classes is shown in Table II. Video flow detection is formulated as a binary classification problem in our model. Each individual packet flow is classified either as 'HAS' or 'non-HAS' traffic, where the latter in our case represents any measured non-HAS traffic, i.e. file downloads and Web browsing. Since we are mainly interested in video streaming, a study that attempts to distinguish a multitude of different applications with one class per application is out of the scope of this paper.

The problem of buffer state classification is more demanding, since there are more than two classes involved, potentially varying at any new sampling interval inside the same HAS flow. In this case we define a set of four different classes, i.e. 'filling', 'steady', 'depleting' and 'unclear', where the first three have been explained in Section III and the last one is simply introduced to cover a few remaining cases; it mainly comprises instances where the buffer is not close to the target level, but the video bit-rate is almost equal to the available throughput, leading to a slowly varying buffer.

Based on our training data set, we evaluate the following five different ML classifiers, all of them well known in the ML literature:

1) Support Vector Machines (SVM): finds the best hyperplane separating data points of different classes.
2) k-Nearest Neighbors (KNN): each sample is assigned to the most common class among its k nearest neighbors.
3) Boosted trees: uses the AdaBoost [22] algorithm to emphasize previously mis-modeled training instances.
4) Random Forest (RF) [23]: builds many decision trees and assigns instances to the class that most trees agree on.
5) RUSBoost trees: a hybrid sampling/boosting ensemble method with the RUSBoost algorithm [24] for skewed training data.

For their implementation we use the 'Statistics and Machine Learning' toolbox from MathWorks [25]. Specifically for SVM, a one-versus-one coding design is selected for the buffer state classification, since more than two classes are involved. Moreover, we use standardization as a rescaling method of features for classifiers that calculate the distance between two points, i.e. SVM and KNN, since our features have a wide range of values. For the evaluation of all five classifiers, we perform k-fold cross-validation, a common approach to train and validate a data set. This means that the training data set is randomly split into k equal-sized partitions. Then, each partition in turn is used as the validation data and the remaining k − 1 are used for training. This process is repeated k times, with each of the k subsets used exactly once as the validation data set.
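The k-fold procedure can be sketched as follows. The fold mechanics follow the description above, while the classifier is a toy majority-class stand-in rather than the five models evaluated in the paper:

```python
import numpy as np

def k_fold_indices(n_samples, k, seed=0):
    """Randomly split sample indices into k (near-)equal partitions."""
    idx = np.random.default_rng(seed).permutation(n_samples)
    return np.array_split(idx, k)

def cross_validate(X, y, k, fit, score):
    """Each partition serves once as validation set, the other k-1 for training."""
    folds = k_fold_indices(len(X), k)
    accs = []
    for i in range(k):
        val = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        model = fit(X[train], y[train])
        accs.append(score(model, X[val], y[val]))
    return float(np.mean(accs))

# Toy stand-in classifier: always predict the training fold's majority class.
fit = lambda X, y: np.bincount(y).argmax()
score = lambda model, X, y: float(np.mean(y == model))

X = np.random.default_rng(1).random((20, 5))   # 20 sampling periods, 5 features
y = np.array([0] * 15 + [1] * 5)               # imbalanced labels
acc = cross_validate(X, y, k=5, fit=fit, score=score)
print(acc)  # 0.75: the majority class covers 15 of 20 samples
```

The same split-train-validate loop applies unchanged to any of the five classifiers; only `fit` and `score` would wrap the real models.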
B. Feature construction

We design a small feature set in order to capture the essential information of HAS traffic at low complexity. A large number of features can often have a negative impact on the performance of ML algorithms [9] and also introduces higher computational and memory requirements, which are undesirable for practical implementations. The complete feature set is presented in Table III. For the purpose of our calculations we adopt a time sliding window approach, i.e. we continually measure these features every sampling period T_s at t_w = T_s, 2T_s, ..., N T_s over a time window of duration T_w = n T_s with n ≥ 1. Both for video flow and buffer state classification, we calculate the features of Table III over L different time windows in parallel, leading to a total number of M = 5L features. This helps us to capture both short-term and long-term fluctuations at the cost of an increased feature space. We will revisit the impact of different windows on the classification performance when we present our results in Section VI.

The first feature in Table III, DLrate, is simply the downlink rate of the packet flow in bit/s, given by
\mathrm{DLrate} = \frac{8 \cdot \sum_{p \in P_{DL}} p_{size}}{T_w}, (2)

where p is the index of a packet and P_DL is the set of packets with (i) p_time ∈ [t_w − T_w, t_w) and (ii) p_dstIP equal to the IP address of the client. This feature is particularly useful for buffer state classification, as it can reflect the difference between a filling and a steady state for similar throughput, or indicate a depleting state. DLrate is complemented by our second feature
\mathrm{DLload} = \frac{\sum_{p \in P_{DL}} \Delta p_{time} \cdot 1(\Delta p_{time} \leq h_t)}{T_w}, (3)

where Δp_time is the IAT of two successive packets and 1(·) is the indicator function, equal to one when the IAT is less than a specified threshold h_t < T_s. DLload measures the percentage of the time that is used for downlink transmission. Since it is normalized by the window duration T_w, DLload ∈ [0, 1]. The numerator of (3) models the duration of continuous transmission and allows DLload to distinguish long from short data bursts, a characteristic of different buffer states as we see in Fig. 1.
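For illustration, the computation of (2) and (3) over a single window can be sketched with toy packet values (the timestamps, sizes and thresholds below are made up, not taken from our traces):

```python
# Eqs. (2) and (3) over one window; packet values are illustrative.
T_w = 1.0          # window duration (s)
h_t = 0.05         # IAT threshold (s), chosen below the sampling period T_s
CLIENT_IP = "192.0.2.5"

# (arrival time s, payload size Bytes, destination IP)
pkts = [
    (0.00, 1400, CLIENT_IP), (0.01, 1400, CLIENT_IP),
    (0.02, 1400, CLIENT_IP), (0.60, 1400, CLIENT_IP),
    (0.61, 1400, CLIENT_IP),
]

# P_DL: downlink packets of this window (destination is the client)
dl = [(t, s) for t, s, dst in pkts if dst == CLIENT_IP]

# Eq. (2): DLrate in bit/s
dlrate = 8 * sum(s for _, s in dl) / T_w

# Eq. (3): sum the IATs that fall below h_t, normalized by T_w
iats = [b[0] - a[0] for a, b in zip(dl, dl[1:])]
dlload = sum(d for d in iats if d <= h_t) / T_w

print(dlrate, round(dlload, 2))  # 56000.0 0.03
```

Note how the 0.58 s gap between the two bursts is excluded by the indicator in (3): DLload only accumulates the short IATs inside bursts.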
TABLE III
SET OF FEATURES CALCULATED OVER EACH TIME WINDOW

Name | Description | Unit
DLrate | downlink transmit rate | bit/s
DLload | fraction of used transmission time | 1
ULnPckts | number of uplink packets | 1
ULavgSize | average uplink packet size | Bytes
ULstdSize | standard deviation of uplink packet size | Bytes
The third selected feature, denoted as ULnPckts, is the number of uplink packets. In HAS traffic, uplink packets mostly include HTTP requests for content segments and regular ACKs. An easy way to remove ACKs from our classification is to include only packets with p_size > h_s, where h_s is a threshold for packet size, since segment requests are typically much larger. Consequently, we define

\mathrm{ULnPckts} = \sum_{p \in P_{UL}} 1(p_{size} > h_s), (4)

where P_UL is the set of packets with (i) p_time ∈ [t_w − T_w, t_w), (ii) p_srcIP equal to the IP address of the client and (iii) the 3-tuple (p_dstIP, p_dstPort, p_protocol) identical to (p_srcIP, p_srcPort, p_protocol) of the respective downlink flow. The purpose of including ULnPckts in the feature set is twofold. First, the packet arrival pattern for HAS traffic is different from other applications like Web browsing (less periodic) and file downloads (usually only a few requests at the beginning). Secondly, the number of requests can indicate different buffer classes even for similar DLrate and DLload. Such a case is shown in Fig. 1, where the second filling state is separated from the previous depleting state by the increased number of uplink packets, since more segments are downloaded after the quality switch.

As fourth and fifth features, ULavgSize and ULstdSize, we select the arithmetic mean and standard deviation of p_size over the previously identified ULnPckts. Both features capture the main statistics of the uplink packet size and are basically selected to improve packet flow classification. For a single video in the same streaming session, we expect a similar size for uplink packets of consecutive segment requests. Moreover, we expect this size to vary only slightly over different videos of the same service. Such a characteristic packet size distribution is not necessarily true for Web traffic, as shown in [26].

Finally, it is worth highlighting that we intentionally exclude TCP-specific features, such as TCP flags, sequence numbers and windows. Although this type of data may be useful, as shown in [11], our goal is to provide a minimal set of features that is generic enough to cover both TCP and QUIC/UDP traffic. In the same spirit, we also exclude video segment-related features, such as segment size and inter-request time, that are used in [12]. Since we are designing an alternative to DPI for encrypted packet flows, we cannot assume that a reliable dissection scheme for detecting segments is in place. Identifying segments and segment requests by traffic profiling, however, would require strong assumptions on the request pattern of the OTT (e.g., to rule out cumulative requests or segments). Segment-related features are also problematic in the case of QUIC/UDP traffic, where video and audio segments are multiplexed in a single packet flow. For QUIC traffic, our experiments showed that an audio segment can be requested even before the complete download of a video segment and vice versa. Such an asynchronous request pattern makes the accurate detection of segments very challenging. Thus, we believe that our feature set is more robust and more general without segment-related and TCP-specific variables.
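Putting the five features together, a single window's feature vector can be computed as follows. The packet trace and thresholds are illustrative, and the uplink-flow matching of (4) is simplified here to a source-address check:

```python
from statistics import mean, pstdev

T_w, h_t, h_s = 1.0, 0.05, 200   # window (s), IAT threshold (s), size threshold (B)
CLIENT = "192.0.2.5"

# (time s, size Bytes, src, dst); matching uplink packets to their downlink
# flow is simplified to "source is the client" for this sketch.
trace = [
    (0.00, 1400, "server", CLIENT), (0.01, 1400, "server", CLIENT),
    (0.02,   52, CLIENT, "server"),   # ACK-sized packet, removed by h_s
    (0.50,  630, CLIENT, "server"),   # segment request
    (0.51, 1400, "server", CLIENT),
    (0.90,  650, CLIENT, "server"),   # segment request
]

dl = [(t, s) for t, s, src, dst in trace if dst == CLIENT]
ul = [s for t, s, src, dst in trace if src == CLIENT and s > h_s]

dlrate = 8 * sum(s for _, s in dl) / T_w                 # Eq. (2)
iats = [b[0] - a[0] for a, b in zip(dl, dl[1:])]
dlload = sum(d for d in iats if d <= h_t) / T_w          # Eq. (3)
ulnpckts = len(ul)                                       # Eq. (4)
ulavgsize = mean(ul) if ul else 0.0
ulstdsize = pstdev(ul) if ul else 0.0

print([dlrate, round(dlload, 2), ulnpckts, ulavgsize, ulstdsize])
```

In the full system, this 5-element vector is computed over L window lengths in parallel, yielding the M = 5L features fed to the classifiers.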
V. METHODOLOGY

Fig. 3 provides an overview of our methodology for the collection of data and ground truth. First, a set of control scripts that run at an intermediate node drives the whole process and allows us to control the phones and configure the measurement campaign by setting the following list of parameters:
• target video ID
• video quality profile
• wireless channel profile
• number of measurement iterations
• set of network-layer data to be recorded
• set of application-layer data to be recorded

As a result of the control scripts, a batch of data log files is created as soon as a video streaming session is completed. Network-layer logs are provided by a packet analyzer tool and include all the network parameters of Table I.

A. Ground truth and labeling process
Before we proceed with labeling, we need to obtain certain parameters from the YouTube application. For this reason, we developed a wrapper application for YouTube and installed it on the client's device, allowing control and automatic interaction with YouTube's Android interface. Application wrapping is the process of applying a management layer to a mobile application without changing the underlying application. This way, video ID selection, quality adjustment and settings configuration, such as disabling auto-play or enabling YouTube video statistics, can be programmed and executed without any user intervention. Moreover, the wrapper application allows us to monitor the progress of the streaming session and to retrieve application-layer information, as listed in Table I, that is available through the 'stats for nerds' option in YouTube's interface. More details regarding the available information through the statistics module of YouTube can be found in [27]. These application parameters are recorded in a log file twice per second by using a clipboard application, which copies debug information from the data buffer (clipboard) of the phone directly into a file. All commands required for the implementation of an experimental scenario are transferred to the device via the Android Debug Bridge (ADB) tool, as soon as they are parsed from the respective configuration script.

The process of labeling is trivial for the binary classification problem of HAS flow detection, as we simply need to assign the same label to all sampling periods of a packet flow. By isolating experiments with HAS traffic, it is easy to detect HAS
packet flows by checking the total packet size of each flow, as only streaming flows should have significant size. However, in our approach we also visually verify labels by comparing the data pattern of a flow to the pattern of the buffer level obtained from the application layer.

Fig. 3. Flow chart of the developed experimental setup.

On the other hand, labeling for buffer state classification is challenging, since there is no ground-truth information about buffer states available from the application layer and one has to rely on the buffer level logs. An automated procedure based on algorithms that analyze these logs is possible, but requires a careful design covering many different patterns and outliers that could otherwise lead to false results. Instead, in our methodology we decided to introduce a process based on manual inspection. For this purpose, we developed a Graphical User Interface (GUI) tool that loads parsed network and application logs, provides both buffer level and accumulated data plots as in Fig. 1, and additionally allows a user to manually:
• select for which packet flow (from the list of captured flows) to plot the respective accumulated data
• specify disjoint time intervals for which a unique label can be assigned
• select a label (from the list of available labels) to associate with a previously defined interval

Besides labeling, such a tool is also useful in order to verify that an experiment was successfully completed and properly recorded, to get insights about the HAS policy, and to investigate experiments with unusual client behavior.

B. Experimental setup
The testbed shown in Fig. 4 is designed in order to measure YouTube traffic in an automatic, controlled and reproducible manner. Two Android smartphones (Huawei Nexus 6P, baseband version: angler-03.78, Android 7.1.1 with the security patch from December 5th, 2016) are connected via a Wireless Local Area Network (WLAN) to a Linux computer (kernel 3.16.0-71-lowlatency) that operates as a WLAN access point. The computer is connected to the Internet via a T1 line, acts as a gateway for the smartphones, and controls the phones via a Universal Serial Bus (USB) connection. The WLAN operates in IEEE 802.11g mode at a carrier frequency of 2412 MHz.

Controlled configuration of network parameters, such as rate, delay and Packet Error Rate (PER), extends the reproducibility
Fig. 4. Setup for measurements and ground truth

and increases the functionality of our testbed. The WLAN interface, combined with the traffic configuration (tc) tool provided in the Linux kernel [28], allows us to configure network traffic parameters and to sufficiently emulate the networking dynamics of Long-Term Evolution (LTE). Network layer packet logs are recorded with tcpdump [29] on both the computer and the Smartphones. The traffic is generated with the native YouTube application (version: 12.32.60) for Android, which, according to our observations, performs standard DASH operation [3]. The YouTube application protects its streaming traffic via TLS encryption and consequently, HTTP queries are sent to TCP port 443 of the server. Over the course of our measurements, the QUIC protocol was used in most cases.
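The shaping described above can be sketched with tc along the following lines. This is a hypothetical illustration, not the exact commands we used: the interface name (wlan0), the tbf/netem combination and the parameter values are assumptions.

```shell
#!/bin/sh
# Sketch of access-point traffic shaping with tc (requires root).
# Interface name and qdisc layout are assumptions for illustration.
IFACE=wlan0

# Rate limit with a token bucket filter, e.g. 1024 kbit/s as in scenario (s5).
tc qdisc add dev $IFACE root handle 1: tbf rate 1024kbit burst 32kbit latency 400ms

# Add transmission delay and packet error rate on top with netem,
# e.g. one of the (s6) DASH-IF steps.
tc qdisc add dev $IFACE parent 1:1 handle 10: netem delay 15ms loss 1.5%

# Change the rate at runtime, e.g. for the rate steps of (s6)-(s7).
tc qdisc change dev $IFACE root handle 1: tbf rate 3mbit burst 32kbit latency 400ms

# Remove all shaping at the end of the experiment.
tc qdisc del dev $IFACE root
```

Updating the qdiscs with `tc qdisc change` from a script makes the time-varying profiles of (s4)-(s8) reproducible without re-associating the WLAN clients.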
C. HAS scenarios
In order to cover a variety of representative streaming situations, we design and include in our experiments 8 different scenarios, as specified in Table IV and listed below:

(s1) Medium quality (480p), no adaptation and no traffic configuration for the entire video.
(s2) High quality (720p), no adaptation and no traffic configuration for the entire video.
(s3) Quality change (720p to 480p) at a random time after 120 s, no adaptation and no traffic configuration.
(s4) Adaptive quality, rate limitation at 500 kbit/s starting at a random time after 120 s, with a duration of 150 s.
(s5) Adaptive quality with a constant rate limitation at 1024 kbit/s for the entire video.
(s6) Adaptive quality based on the DASH Industry Forum (DASH-IF) implementation guidelines [17, Table 5]. Besides the first step at 120 s, all subsequent steps are applied every 40 s. The rate, delay and PER traffic configurations are illustrated in Fig. 6.
(s7) Adaptive quality, rate limitation at 120 s by switching from 3 Mbit/s to 100 kbit/s and back to 3 Mbit/s every 40 s and 45 s, respectively, until the end of the video.
(s8) Adaptive quality, rate limitation of 100 kbit/s starting at 120 s and 400 s, with a duration of 60 s in both cases.

TABLE IV
HAS SCENARIOS

Scenario  Quality                            Rate limit (kbit/s)
(s1)      480p                               none
(s2)      720p                               none
(s3)      720p -> 480p, random t > 120 s     none
(s4)      Auto                               500 for t in [t1, t1 + 150] s, t1 random > 120 s
(s5)      Auto                               1024 for the entire video
(s6)      Auto                               DASH-IF profile (Fig. 6)
(s7)      Auto                               100 for t in [120 + 85n, 160 + 85n] s, n in N; 3000 otherwise
(s8)      Auto                               100 starting at t in {120, 400} s for 60 s; none otherwise

In order to improve the quality of our training dataset, during the experimental design we made sure that our scenarios depict at least some clear state transitions. For this reason, we decided to leave the first 120 s of each scenario without rate limits, besides (s5), since we verified from our experimental results that, for the selected video content, a first state transition from filling to steady state appears during this time under normal streaming conditions. It should be noted that after the video starts playing, we assume that the user does not interrupt the video play-back by pausing or skipping forwards or backwards. Even though these events can be detected, we neglect them for simplicity.

Specifically, scenarios (s1) and (s2) are chosen in order to study the performance of the algorithm in simple cases with a single filling and steady state, for two quality levels with a significant bit-rate difference. Then, (s3) is selected to verify that the algorithm can also detect multiple transitions between streaming states throughout the video session. (s4) is a more challenging scenario due to the introduced buffer depleting state as a result of rate throttling. The randomness in (s3)-(s4) is introduced in order to decrease the correlation with the video encoding distribution. The main reason behind the choice of (s5) is to include state changes under a rate that is common for 3G networks. In (s6) we emulate more complex streaming conditions where the possibility of unclear buffer states is higher. Fig. 6 shows the traffic configuration of this scenario, where a high-low-high rate profile is used. The indicated values of delay τ refer to the transmission delay at the access point and should not be confused with the Round Trip Time (RTT), which in our experiments had a mean value of 30 ms. Finally, (s7)-(s8) are added to cover poor connection cases where streaming is not supported even with the lowest available video quality. A representative example for each scenario is shown in Fig. 5, where both buffer level and
Fig. 5. Examples of buffer level and streaming data per scenario for video [21], labeled according to Section V-A
accumulated streaming data are labeled according to Section V-A.

Fig. 6. Traffic configuration of rate (Mbit/s), delay τ (ms) and PER ε (%) for (s6) according to DASH-IF guidelines

For the selection of streaming content, our target is to capture a variety of video bit-rate distributions amongst characteristic video clip types. To this end, we make a selection of 3 movies, as we regard them to be representative of different typical video content types. As a first choice, we study Tears of Steel (TOS) [30], a high-motion, semi-animated open action movie which is commonly used for testing video codecs and streaming protocols and is also recommended in the measurement guidelines of DASH-IF [17]. TOS has a duration of 12:14 min and represents a high-motion video clip. We also select a nature documentary (Nature) [31] of 9:21 min duration that contains complex scenes with gradual changes. As a third choice, a talk-show (TalkShow) [21] of 9:19 min duration is selected as streaming content of a low-motion video clip. All clips are encoded with the H.264 codec in an MP4 container. Our selection of the clips intentionally excludes monetized content, as we want to avoid advertisements at the beginning of or during the video session interfering with our measurements. This is done without loss of generality, as advertisements can be identified as different packet flows and separated from the rest of the streaming data. Table V presents the YouTube video id of each tested movie along with its main characteristics, i.e. the available representation range, duration and frame rate (fps).

TABLE V
STREAMING CONTENT

Video     id           Quality     Duration   fps
TOS       OHOpb2fS-cM  144p-1080p  12:14 min  24
Nature    2d1VrCvdzbY  144p-1080p  09:21 min  30
TalkShow  N2sCbtodGMI  144p-1080p  09:19 min  25
D. Data sets
TABLE VI
NUMBER OF SAMPLES PER CLASS IN THE TRAINING DATASET

Class     filling         steady          depleting     unclear         Total
HAS       23296 (16.7%)   82225 (58.9%)   9722 (7.0%)   24297 (17.4%)   139540 (100.0%)
non-HAS   –               –               –             –               8071

The non-HAS traffic is generated with a framework that navigates to a specified Web page from a list of pages via the built-in Web browser of the Smartphone and then proceeds to the next page in the list after a random amount of time, emulating users browsing the Web. In order to generate traffic of non-trivial data size, we selected a list of pages that host a significant number of photos besides plain text. We explicitly exclude Web pages that host video elements and leave the problem of classifying different types of video for our future work. Similarly to the video experiments, the cache memory is cleared before every experiment in order to avoid any caching issues for subsequent measurements.

Table VI summarizes the number of samples per class in our training dataset, collected for a sampling period of T_s = 1 s and manually labeled according to Section V-A. Since our target is to classify the transmitted data per sampling period, the values in Table VI count only non-empty samples, i.e. seconds with one or more packets. This filtering allows us to reduce the dataset size by keeping only meaningful entries. As previously explained, this dataset is used for both training and testing by applying k-fold cross-validation.

VI. RESULTS
The measurement results for the proposed classification of flow type and buffer state are presented in this section. The main free parameters related to both the feature construction and the configuration of the ML classifiers are shown in Table VII. The kernel type for SVM and the number of nearest neighbors for KNN are selected by keeping the best option in terms of overall accuracy after studying a set of commonly used values on our dataset. For the 3 tree-based methods, a small number of trees is used initially, as we want to keep complexity and memory requirements low, but we detail the impact of this parameter at the end of this section.
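To make the feature construction concrete, the sketch below computes the 5 per-window statistics over L = 4 parallel sliding windows from per-second samples of a flow. Only the feature names (DLrate, DLload, ULnPckts, ULavgSize, ULstdSize) and the window lengths are taken from the text; the sample layout and the exact feature definitions (e.g. DLload as the fraction of active seconds) are assumptions for illustration.

```python
from statistics import mean, pstdev

def window_features(samples, t, windows=(1, 5, 10, 20)):
    """Features over L = 4 sliding windows ending at second t.

    Each element of `samples` describes one second of a packet flow:
    downlink bytes and the sizes of the uplink packets seen in that
    second (this layout is an assumption for illustration).
    """
    feats = {}
    for tw in windows:
        win = samples[max(0, t - tw + 1): t + 1]
        dl_bytes = [s["dl_bytes"] for s in win]
        ul_sizes = [size for s in win for size in s["ul_sizes"]]
        feats[f"DLrate_{tw}"] = sum(dl_bytes) / tw                  # mean DL rate (bytes/s)
        feats[f"DLload_{tw}"] = sum(b > 0 for b in dl_bytes) / tw   # share of active seconds
        feats[f"ULnPckts_{tw}"] = len(ul_sizes) / tw                # UL packets per second
        feats[f"ULavgSize_{tw}"] = mean(ul_sizes) if ul_sizes else 0.0
        feats[f"ULstdSize_{tw}"] = pstdev(ul_sizes) if len(ul_sizes) > 1 else 0.0
    return feats

# Toy per-second trace: downlink data arrives every other second (on-off pattern).
samples = [{"dl_bytes": 1500 * i % 3000, "ul_sizes": [120, 130]} for i in range(25)]
features = window_features(samples, t=24)
print(len(features))  # 5 statistics x 4 windows = 20, matching M = 20 in the text
```

Computing the same statistics over several window lengths in parallel is what lets a single sample reflect both short-term bursts and the longer on-off period of the steady state.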
A. Video flow classification
Fig. 7 presents the overall accuracy of each ML algorithm for the problem of flow classification. Overall accuracy is defined here as the ratio of correctly classified samples over the total number of samples. The performance evaluation is based on k-fold cross-validation, studying 3 different values of k. From this figure we can easily observe that all ML classifiers perform with very high accuracy. Besides SVM, which has an accuracy slightly above 99.5%, all the remaining algorithms perform similarly, while RF is the best with an accuracy close to 99.98% for k = 10. We also notice that the factor k does not have a significant impact on the results, which is expected as the impact of k diminishes with large datasets, verifying that the size of our dataset is sufficient to

TABLE VII
MAIN PARAMETERS FOR FEATURES AND ML CLASSIFIERS

Description                                      Value            Unit
sampling period (T_s)                            1                s
sliding window length (T_w)                      {1, 5, 10, 20}   s
threshold for DLload (h_t) and ULnPckts (h_s)    100              Bytes
k-fold cross-validation (k) [all ML alg.]        {2, 5, 10}       –
Fig. 7. Overall accuracy for flow classification by k-fold cross-validation

get statistically good estimates of classification performance. Thus, for the rest of the analysis we present results only for 10-fold cross-validation, which is the most commonly used setting in the literature.

A confusion matrix that summarizes the results of each algorithm is presented in Table VIII, where we highlight the performance of RF as the best algorithm according to our studies. From this table we see that HAS traffic is almost always classified correctly, while the overall accuracy is mostly affected by false positives, i.e. samples of non-HAS traffic incorrectly classified as HAS traffic. However, the percentage of false positives is very low, with a value below 0.4% for all algorithms apart from 5.7% for SVM. As a general remark, one should keep in mind that all classification results are per sampling period. In practice, a packet flow cannot change class until its termination, since it either belongs to HAS or non-HAS traffic. This fact enables post-processing methods to perform a second step of flow classification based on the existence of a dominant class, where isolated samples with a predicted class different from the dominant predicted class can be neglected.

Fig. 8 shows the importance of each feature for the RF algorithm. This metric naturally ranks features according to

TABLE VIII
CONFUSION MATRIX FOR FLOW CLASSIFICATION (k = 10)

True      Predicted (SVM/KNN) (%)
          HAS          non-HAS
HAS       99.8/100.0   0.2/0.0
non-HAS   5.7/0.3      94.3/99.7

True      Predicted (AdaBoost/RF/RUSBoost) (%)
          HAS            non-HAS
HAS       100.0/–/100.0  0.0/0.0/0.0
non-HAS   0.3/0.2/0.4    99.7/–/99.6

their relevance for the classification [23]. The score is normalized over the maximum obtained value and is defined, for a feature m, as the mean difference in out-of-bag error between the original forest and a modified version where the values of feature m are randomly permuted. The out-of-bag error is defined as the mean classification error over each training sample x_i, using the votes only from trees that do not contain x_i. Going back to Fig. 8, the complete set of M = 20 features is included by applying L = 4 sliding windows in parallel. From this figure we observe that DLrate for all windows, along with
DLload for most window lengths and ULavgSize for the larger windows, are ranked as the most important features. First, the combination of DLrate and DLload for different T_w allows capturing the distinctive HAS on-off pattern in the steady state, but also detecting controlled rate changes due to transitions between filling and steady states. This pattern is present neither in file downloads nor in Web browsing. Moreover, statistics over the uplink packet size are also useful, as the importance value for ULavgSize suggests. As expected, HAS requests have a similar size during the entire video session that does not vary significantly over our video set, which proved to be different from what happens in our non-HAS dataset. It is worth noting that for ULavgSize, a large window is required in order to cover the inter-request times in the steady state. Finally, and perhaps counter-intuitively, ULstdSize is shown to have a negligible impact on the RF performance. Nevertheless, we keep it in our feature set as we believe that it can boost the classification performance if other types of non-HAS traffic are studied and added to the training dataset.

Table IX presents the runtime of the 5 studied ML algorithms, measured on an Intel Xeon CPU E5-4627 v2, running 32 cores at 3.30 GHz with 512 GB RAM. The training time is measured for the entire dataset, while the prediction time is measured per 1000 samples. The computational time statistics are calculated over 100 repetitions, both for training and prediction. These results clearly show that SVM has a moderate prediction time and is by far the most demanding in terms of training time. On the contrary, KNN has a trivial training phase, as it simply stores samples. It is, however, quite expensive in terms of prediction time, since it requires computing the distance between a new sample and all training samples. RF is slightly slower than AdaBoost and RUSBoost for the same number of trees, since it builds deeper trees. Nevertheless, these tree-based methods remain the most attractive in terms of runtime.
Fig. 8. Feature importance for flow classification using RF and 10-fold cross-validation

TABLE IX
RUNTIME FOR VIDEO FLOW CLASSIFICATION (MEAN µ AND STANDARD DEVIATION σ)

ML alg.    Training            Prediction
           µ (s)     σ (s)     µ (ms)    σ (ms)
SVM        463.23    8.08      59.72     0.24
KNN        0.64      0.01      190.80    3.27
AdaBoost   8.63      0.07      11.20     0.33
RF         10.40     0.34      11.76     0.23
RUSBoost   3.56      0.10      10.28     0.32

B. Buffer state classification
Having confirmed the high accuracy of HAS flow classification, we proceed with studying the performance of buffer state classification. As before, we start by presenting the overall accuracy of each ML algorithm in Fig. 9, for different k-fold cross-validation modes. From this figure we verify once again that the factor k does not affect the results and thus, we keep the value k = 10 for the rest of the presented results. RF still has the best performance with an accuracy of 99.3% for k = 10, while KNN follows closely with 99% and the rest of the algorithms fall behind with values ranging from 93.9% to 95.8%. A relationship between RF and KNN that may explain their similar behavior is discussed in [33], where it is shown that both can be viewed as weighted neighborhood schemes.

The respective confusion matrix is presented in Table X, where again we highlight the performance of RF as the best algorithm. From this table we can see that SVM and AdaBoost mainly suffer from falsely classifying a depleting state as a

TABLE X
CONFUSION MATRIX FOR BUFFER STATE CLASSIFICATION (k = 10)

True       Predicted (SVM/KNN) (%)
           filling    steady     depleting  unclear
filling    90.4/98.3  3.5/0.6    0.9/0.7    5.3/0.3
steady     0.7/0.2    98.1/99.5  0.4/0.1    0.8/0.2
depleting  1.2/2.1    13.6/1.4   83.0/96.4  2.2/0.2
unclear    4.2/0.3    5.6/0.6    0.4/0.0    89.8/99.0

True       Predicted (AdaBoost/RF/RUSBoost) (%)
           filling       steady        depleting     unclear
filling    93.3/–/94.2   4.9/0.5/2.4   0.5/0.6/1.7   1.3/0.2/1.7
steady     0.3/0.1/0.9   98.8/–/94.5   0.2/0.1/2.5   0.8/0.1/2.1
depleting  2.0/1.4/1.7   11.0/0.9/2.8  86.5/–/94.5   0.6/0.1/1.0
unclear    3.3/0.2/4.8   4.5/0.5/3.3   0.3/0.0/0.7   91.8/–/91.1
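The row-normalized percentages of Tables VIII and X can be computed from raw per-sample predictions with a short sketch like the following. The class names come from the text; the example data is made up for illustration.

```python
from collections import Counter

def confusion_matrix(y_true, y_pred, classes):
    """Row-normalized confusion matrix in percent: entry [true][pred]
    is the share of samples of a true class that were predicted as
    each class, as in Tables VIII and X."""
    counts = Counter(zip(y_true, y_pred))
    matrix = {}
    for c_true in classes:
        row_total = sum(counts[(c_true, c)] for c in classes)
        matrix[c_true] = {
            c_pred: (100.0 * counts[(c_true, c_pred)] / row_total) if row_total else 0.0
            for c_pred in classes
        }
    return matrix

# Made-up per-second predictions for illustration.
y_true = ["steady"] * 8 + ["depleting"] * 2
y_pred = ["steady"] * 7 + ["depleting"] + ["depleting", "steady"]
m = confusion_matrix(y_true, y_pred, ["filling", "steady", "depleting", "unclear"])
print(m["steady"]["steady"])  # 87.5
```

Normalizing each row by its own total is what makes the per-class percentages comparable despite the strong class imbalance shown in Table VI.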
Fig. 9. Overall accuracy for buffer state classification by k-fold cross-validation

Fig. 10. Overall accuracy per scenario for buffer state classification by 10-fold cross-validation

steady state. It is interesting to observe that this is not true for RUSBoost, which does not fall below 91% in terms of correct prediction per class, compared to 83.0% and 86.5% for SVM and AdaBoost, respectively. The reason is that RUSBoost gives more weight to classes with few samples, which hurts the correct classification of the dominant steady state class
Fig. 11. Feature importance for buffer state classification using RF and 10-fold cross-validation

(see Table VI) and therefore the overall accuracy, as we verify in Fig. 9. As explained before, post-processing methods correcting isolated samples that are clearly misclassified can improve the accuracy results for this problem as well.

Fig. 10 shows the overall accuracy of each ML algorithm per experimental scenario. This allows verifying how challenging buffer state classification is for each designed scenario. All algorithms perform better for (s1)-(s2), i.e. the simplest scenarios in our set, involving a single filling and steady state. Their performance is slightly reduced for (s3), mainly due to the transition period that follows the manual quality change. (s4)-(s8) are clearly more challenging, since more buffer state changes are introduced. Nevertheless, RF and KNN manage to maintain an accuracy higher than 97.7% and 96.8%, respectively, for all scenarios.

Fig. 11 presents the normalized importance of each feature using the RF algorithm. As before, we apply L = 4 sliding windows in parallel in order to capture both short-term and long-term variations. Contrary to Fig. 8, here we can see that DLload is the most important feature, with significant contributions from DLrate and ULnPckts. This comes as no surprise, as these are the 3 features that we specifically selected for HAS traffic. Combined information about the downlink streaming rate, the percentage of time used for streaming data, and the frequency of uplink packets enables RF to clearly distinguish different
Fig. 12. Classification error for buffer state classification as a function of the number of trees using RF

TABLE XI
RUNTIME FOR BUFFER STATE CLASSIFICATION (MEAN µ AND STANDARD DEVIATION σ)

ML alg.    Training            Prediction
           µ (s)     σ (s)     µ (ms)    σ (ms)
SVM        950.99    2.37      63.73     0.31
KNN        0.96      0.04      358.25    5.03
AdaBoost   9.94      0.10      13.25     0.68
RF         21.02     0.22      19.73     1.50
RUSBoost   6.11      0.24      13.38     1.33

buffer states and achieve an overall accuracy of 99.3%. Fig. 11 also justifies our selection of multiple sliding windows, with important features for all T_w values.

Fig. 12 presents the classification error of RF as a function of the number of trees, which is a key parameter for its implementation. Both the out-of-bag and the k-fold cross-validation error show similar behavior, quickly decreasing as the number of trees increases. This trend holds up to a value of around 50 trees, where both errors converge and stabilize close to 0.7%; thus, training more trees does not increase performance. We can also verify that even our initial choice of 30 trees has only a slight impact on the out-of-bag error, with an increase of 0.03%.

Finally, Table XI presents the runtime performance of the ML algorithms. These measurements were obtained in the same manner as the runtimes for video flow classification (Table IX) and are similar to those results. We notice that SVM is the most demanding in training time, KNN suffers in prediction time, while RF, AdaBoost and RUSBoost still perform well in both the prediction and training phases. We see that the runtime is generally higher than in Table IX, and even doubles in some cases. This verifies that our buffer state classification problem is computationally more demanding than the presented flow classification, due to the higher number of classes.

VII. CONCLUSION
We introduced a new traffic profiling system that classifies flows and buffer states of HAS traffic in real time, based on machine learning. The core of our approach is a classifier that separates HAS from non-HAS traffic and detects 4 buffer states of the streaming client.

Studying 5 classification methods on our dataset for YouTube's mobile streaming client shows that separating HAS from Web traffic and file downloads is not a challenging problem. Even with highly varying link quality and video bit-rate adaptation, all ML models closely approached 100% accuracy. For buffer state classification, however, SVM and boosting methods failed, making RF and KNN a very accurate choice in general, with RF being the most attractive approach both in terms of accuracy and runtime.

Our probably most surprising finding is that such high accuracy can be reached with a small, generic feature set that is observed only at the IP layer. Since no transport layer information is used, our approach works equally for TCP and UDP-based HAS traffic (e.g., with QUIC). Since no application layer information is used, our system does not interfere with end-to-end encryption and requires neither DPI nor cross-layer signaling. Since the feature set is small, on-line and off-line complexity are consistently low. Traffic profiling, thus, provides MNOs with a low-complexity alternative to DPI-based packet dissection and OTT flags.

As buffer states, and their manifestation in packet IATs, are a fundamental property of media streaming, we believe that our feature set will maintain high accuracy for various major streaming services, independently of the content. Our future work will, thus, prioritize the extension of our high-quality dataset to more videos and to more services than YouTube. We plan to publish our dataset in the near future, so that other researchers can reproduce our results and use our measurements in their own studies.

ACKNOWLEDGMENTS
We thank Florian Wamser, Bernd Zeidler, Michael Seufert, and Phuoc Tran-Gia from the University of Würzburg, Germany for their valuable contribution to the development of the YouTube wrapper application and for their comments that greatly improved the design of our experiments. We also thank our colleagues Yuejun Wei and Yuchen Chen for their insight on the product integration of our work.

REFERENCES

[1] Cisco, "Visual networking index: Global mobile data traffic forecast update, 2016–2021," White Paper, Feb. 2017.
[2] Sandvine, "Global Internet phenomena: Latin America & North America," Report, Jun. 2016.
[3] ISO/IEC, "Dynamic adaptive streaming over HTTP (DASH)," International Standard 23009-1:2014, May 2014.
[4] R. Pantos and W. May, "HTTP live streaming," IETF, Informational Internet-Draft 2582.
[6] …, in Proc. IEEE ICC, May 2016, pp. 1–6.
[7] D. D. Vleeschauwer et al., "Optimization of HTTP adaptive streaming over mobile cellular networks," in Proc. IEEE INFOCOM, Apr. 2013, pp. 898–997.
[8] A. Molavi Kakhki et al., "BingeOn under the microscope: Understanding T-Mobile's zero-rating implementation," in Proc. ACM Internet-QoE, Aug. 2016, pp. 43–48.
[9] T. T. T. Nguyen and G. Armitage, "A survey of techniques for Internet traffic classification using machine learning," IEEE Communications Surveys & Tutorials, vol. 10, no. 4, pp. 56–76, Fourth Quarter 2008.
[10] G. Dimopoulos, P. Barlet-Ros, and J. Sanjuàs-Cuxart, "Analysis of YouTube user experience from passive measurements," in Proc. IEEE CNSM, Oct. 2013, pp. 260–267.
[11] I. Orsolic, D. Pevec, M. Suznjevic, and L. Skorin-Kapov, "A machine learning approach to classifying YouTube QoE based on encrypted network traffic," Springer Multimedia Tools and Applications, vol. 76, no. 21, pp. 22267–22301, Nov. 2017.
[12] G. Dimopoulos, I. Leontiadis, P. Barlet-Ros, and K. Papagiannaki, "Measuring video QoE from encrypted traffic," in Proc. ACM IMC, Nov. 2016, pp. 513–526.
[13] M. Katsarakis, R. C. Teixeira, M. Papadopouli, and V. Christophides, "Towards a causal analysis of video QoE from network and application QoS," in Proc. ACM Internet-QoE, Aug. 2016, pp. 31–36.
[14] S. Galetto et al., "Detection of video/audio streaming packet flows for non-intrusive QoS/QoE monitoring," in Proc. IEEE MN, Sep. 2017, pp. 1–6.
[15] V. Krishnamoorthi, N. Carlsson, E. Halepovic, and E. Petajan, "BUFFEST: Predicting buffer conditions and real-time requirements of HTTP(S) adaptive streaming clients," in Proc. ACM MMSys, Jan. 2017, pp. 76–87.
[16] D. Tsilimantos, T. Karagkioules, A. Nogales-Gómez, and S. Valentin, "Traffic profiling for mobile video streaming," in Proc. ICC, May 2017, pp. 1–7.
[17] DASH Industry Forum, "Guidelines for implementation: DASH-AVC/264 test cases and vectors," Report, Jan. 2014.
[18] J. Roskind, "QUIC: Multiplexed stream transport over UDP," Design Document and Specification Rationale, Dec. 2013.
[19] T. Karagkioules, C. Concolato, D. Tsilimantos, and S. Valentin, "A comparative case study of HTTP adaptive streaming algorithms in mobile networks," in Proc. ACM NOSSDAV, Jun. 2017, pp. 1–6.
[20] J. Samain et al., "Dynamic adaptive video streaming: Towards a systematic comparison of ICN and TCP/IP," IEEE Transactions on Multimedia.
[22] …, Journal of Computer and System Sciences, vol. 55, no. 1, pp. 119–139, Aug. 1997.
[23] L. Breiman, "Random forests," Machine Learning, vol. 45, no. 1, pp. 5–32, Oct. 2001.
[24] C. Seiffert, T. M. Khoshgoftaar, J. V. Hulse, and A. Napolitano, "RUSBoost: A hybrid approach to alleviating class imbalance," IEEE Transactions on Systems, Man, and Cybernetics - Part A: Systems and Humans, vol. 40, no. 1, pp. 185–197, Jan. 2010.
[25] The MathWorks Inc. (2015, Sep.) Statistics and Machine Learning Toolbox v10.1. [Online]. Available: https://mathworks.com/products/statistics.html
[26] C. Fraleigh et al., "Packet-level traffic measurements from the Sprint IP backbone," IEEE Network, vol. 17, no. 6, pp. 6–16, Nov. 2003.
[27] A. Mondal et al., "Candid with YouTube: Adaptive streaming behavior and implications on data consumption," in Proc. ACM NOSSDAV