A Study on the Characteristics of Douyin Short Videos and Implications for Edge Caching
Zhuang Chen, Qian He, Zhifei Mao, Hwei-Ming Chung, Sabita Maharjan
Zhuang Chen
Guilin University of Electronic Technology, Guilin
Qian He
Guilin University of Electronic Technology, Guilin
Zhifei Mao
Norwegian University of Science and Technology (NTNU), Trondheim
Hwei-Ming Chung
University of Oslo, Oslo
Sabita Maharjan
University of Oslo, and Simula Research Laboratory, Oslo
ABSTRACT
Douyin, internationally known as TikTok, has become one of the most successful short-video platforms. To maintain its popularity, Douyin has to provide better Quality of Experience (QoE) to its growing user base. Understanding the characteristics of Douyin videos is thus critical to its service improvement and system design. In this paper, we present an initial study on the fundamental characteristics of Douyin videos based on a dataset of over 260 thousand short videos collected across three months. The characteristics of Douyin videos are found to be significantly different from those of traditional online videos, ranging from video bitrate and size to popularity. In particular, the distributions of video bitrate and size follow the Weibull distribution. We further observe that the most popular Douyin videos follow Zipf's law on video popularity, but the rest of the videos do not. We also investigate the correlation between the popularity metrics used for Douyin videos. We find that the correlation between the number of views and the number of likes is strong, while the other correlations are relatively low. Finally, through a case study, we demonstrate that the above findings can provide important guidance for designing an efficient edge caching system.
KEYWORDS
Edge Caching, QoE, Video Popularity, Zipfian Distribution, Weibull Distribution, Douyin (TikTok)
The success of online video sharing platforms (e.g., YouTube, Facebook, Instagram, and Snapchat) has been phenomenal. According to Cisco's annual Visual Networking Index (VNI) forecast [5], video accounts for an overwhelming share of total Internet traffic. With the increasing usage of mobile devices, video traffic is moving from wired ends (e.g., PCs) to mobile ends (e.g., smartphones). With this shift to the mobile Internet, the video sharing industry has been reshaped in recent years. One of the biggest trends is the emergence of short-form video platforms (e.g., Douyin). These platforms typically host a large amount of User Generated Content (UGC) in clips of a few tens of seconds, roughly two orders of magnitude shorter than a typical traditional video. The success of mobile Internet applications such as Douyin, Instagram, and YouTube depends on rich video libraries and, even more importantly, on an optimally designed caching system. Moreover, short video services, with their ever-increasing popularity, consume a large portion of Internet bandwidth. They are also time-sensitive, especially on short video platforms like Douyin. Douyin has now exceeded 150 million daily active users in China, and the average size of an uploaded video file is 1.96 MB [3]. Even if each user uploads only one 1.96 MB video per day, storing all the videos requires at least about 294 TB of disk space every day. Therefore, dynamic and efficient caching is necessary for such platforms. In addition, bandwidth cost and end-to-end latency are equally important issues for Douyin, and QoE is undoubtedly the biggest challenge it faces. Edge caching [1] can not only reduce backhaul bandwidth usage and delay, but also improve energy efficiency, which is vitally important for capacity planning and QoE enhancement.

Established in 2016, Douyin, also known as TikTok, has become one of the fastest-growing mobile Internet applications.
Industry insiders estimated that Douyin outstripped YouTube, Facebook, Instagram, and Snapchat in total downloads in September 2018 [10], and Sensor Tower estimates that Douyin surpassed one billion installs on the App Store and Google Play in February 2019 [11]. On the other hand, several studies, such as those on YouTube [8] and Twitter [12], have analyzed different platform characteristics in order to study caching mechanisms. Different from YouTube and Twitter, Douyin is specially designed to provide short videos for mobile Internet users. Although the caching mechanism for Douyin is based on studies of traditional short videos, three distinct features of Douyin call for a novel cache design. First, the number of Douyin videos is much larger than the number of traditional short videos. Second, Douyin videos are much smaller than conventional short videos (90% are less than 1.5 MB, while a typical YouTube short video is 25 MB [4]). Finally, the viewing frequency of the most popular Douyin videos fits the well-known Zipf distribution, which can have an important effect on edge caching for short videos [2]. Considering the growing popularity and use of the platform, understanding the characteristics of Douyin videos is therefore important for designing a dynamic and efficient caching system for the platform.

In this paper, we present an initial study on the fundamental characteristics of Douyin videos. We analyze video file features and popularity on a dataset of over 260,000 short videos collected over a three-month period in early 2018. We show that Douyin video bitrate and size can be modeled by Weibull distributions [9].
We also look closely at the popularity metrics of Douyin videos, namely the number of views, likes, comments, and shares. Correlation analysis using the Pearson coefficient indicates that most of the popularity metrics are only weakly correlated, except the number of views and the number of likes. Finally, through a case study, we suggest that the analysis in this paper can serve as a guideline for designing efficient caching systems tailored to the unique characteristics of Douyin short videos.

To the best of our knowledge, our work is the first to study Douyin short videos. It not only provides a basis for further exploring and understanding Douyin, but also lays an initial foundation for the design of edge caching systems for the latest short video platforms.

Our main contributions in this paper can be summarized as follows:
• We provide the first extensive characterization of Douyin short videos, based on a real-world dataset from Douyin.
• We examine the popularity distribution of Douyin videos, which is of special importance for designing a popularity-based caching system for Douyin.
• We further analyze the relationship between the popularity metrics used for Douyin videos. The results suggest that a new computing paradigm for video popularity is indeed needed. In addition, we design a case study on edge caching based on the above results, which further confirms the applicability of our research. The case study suggests that the results of our study can be highly beneficial to the design and development of efficient and intelligent caching systems for the latest form of mobile social short video media such as Douyin.

The remainder of the paper is organized as follows. Section 2 gives a brief background on Douyin and presents its video dataset. We study the characteristics of Douyin short videos in Section 3 and analyze the popularity of Douyin videos in Section 4.
We conduct a case study on edge caching in Section 5. Finally, we conclude the paper with an outlook towards future work in Section 6.
In this section, we first give a short introduction to Douyin and then describe our dataset of Douyin short videos.
Douyin is a mobile short video platform with powerful editing capabilities, which enable users to add various types of music and effects to their videos. The length of Douyin videos is restricted to 15 seconds, which makes them more attractive.

The content delivery mechanism of Douyin is decentralized. When receiving user-uploaded videos, Douyin ranks them and recommends relevant short videos by analyzing the interests of each user. Douyin contains a large amount of UGC. To handle this volume of short video content, the recommendation mechanism computes a tag for each video, which classifies videos according to their category characteristics. It then maps the tag of each video to the users who have the same tag.
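Douyin's actual recommendation algorithm is not described beyond this. Purely as a hypothetical illustration of the tag-to-user mapping step, the sketch below assumes each video and each user carries a set of tags and offers a video to every user who shares at least one of its tags; the function and argument names are ours, not Douyin's.

```python
from collections import defaultdict


def match_videos_to_users(video_tags, user_tags):
    """Map each video ID to the set of users sharing at least one tag with it.

    video_tags: dict of video ID -> set of tags (hypothetical input format)
    user_tags:  dict of user ID  -> set of interest tags (hypothetical)
    """
    # Invert the user side once: tag -> users interested in that tag.
    users_by_tag = defaultdict(set)
    for user, tags in user_tags.items():
        for tag in tags:
            users_by_tag[tag].add(user)

    # A video's audience is the union of the users behind each of its tags.
    matches = {}
    for video, tags in video_tags.items():
        audience = set()
        for tag in tags:
            audience |= users_by_tag.get(tag, set())
        matches[video] = audience
    return matches
```

Inverting the user side first means each video lookup costs time proportional to its number of tags rather than to the number of users.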
Our dataset consists of the metadata of short videos. In addition to the history of uploaded short videos, Douyin also archives users' profiles and their social networks, including followed users and fans.

Douyin assigns a distinct 19-digit decimal ID to each video. Each video has the following metadata: the video ID; the release time; the bitrate, i.e., the playback bitrate of the video; the video length, i.e., its playback duration; the video file size, which is one of the key metrics for caching; the verification type, which indicates whether the uploader has passed Douyin's official certification; and the numbers of views, likes, comments, and shares. An example metadata record is shown in Table 1.

The data was collected from 1 February 2018 to 10 May 2018 and includes 270 thousand videos from different users. After removing repeated videos, we obtained 260,939 videos. Each entry contains all the metadata except the video size.

Table 1: Meta-Data of a Douyin Video
Field                       Value
Video ID                    6553843141084974340
Video Release Time          May 10, 2018, 14:58:00
Bitrate                     1104867 bps
Video Length (Duration)     15070 ms
Video File Size             1.98 MB
Verification Type           1
Number of Views             1564
Number of Likes             12
Number of Comments          3
Number of Shares            1
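The deduplication step (270 thousand crawled entries reduced to 260,939 unique videos) can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes each crawled entry is a dict with a hypothetical `video_id` field, that repeated videos are identified by their 19-digit ID, and that the first copy seen is kept. Because 19-digit IDs exceed the 2^53 range in which 64-bit floats represent integers exactly, the IDs are kept as strings (Python ints would also work).

```python
def deduplicate(records):
    """Keep one record per video ID; the first occurrence wins.

    records: iterable of dicts with a hypothetical "video_id" key holding the
    19-digit decimal ID as a string, so no precision is lost.
    """
    seen = {}
    for rec in records:
        vid = rec["video_id"]
        if vid not in seen:  # later duplicates of the same ID are dropped
            seen[vid] = rec
    # dicts preserve insertion order, so the output follows first appearance
    return list(seen.values())
```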
In this section, we characterize Douyin video files in terms of video length, video bitrate, and video size. The characteristics of Douyin videos can be classified into two types: time-invariant and time-variant. Some characteristics are static, such as video length, video file size, and publication time, while others are dynamic, e.g., the numbers of views, likes, comments, and shares. However, within each time slot, this information is static. The following characteristics are studied: video length, video file size, number of views, number of likes, number of comments, and number of shares. In addition, we investigate the relationships among them.
Video length is one of the most significant differences between Douyin videos and traditional videos, which normally last 0.5 to 2.5 hours (e.g., YouTube [4]). Douyin mainly provides short musical videos: 95% of the videos in our dataset are within the 15-second limit that Douyin imposes on regular user uploads. However, we do find videos that exceed this limit, because Douyin officially permits a small group of authorized users to upload videos longer than 15 seconds. Fig. 1 shows the probability density function (PDF) and cumulative distribution function (CDF) of Douyin video lengths up to 70 seconds, which exhibit two peaks. The highest peak lies between 14 and 16 seconds and accounts for about 65% of the videos; clearly, 15 seconds is the most popular video length among users. The second peak lies between 9 and 11 seconds and accounts for about 27% of the total.
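The peak shares quoted above are simple empirical fractions, and the CDF in Fig. 1 is an empirical CDF. A minimal sketch, assuming durations are given in seconds (e.g., the 15070 ms of Table 1 becomes 15.07 s) and treating each window as half-open; the exact binning behind Fig. 1 is not stated, so the function names and window boundaries are illustrative.

```python
from bisect import bisect_right


def fraction_in(durations, lo, hi):
    """Fraction of videos whose duration d (seconds) satisfies lo <= d < hi."""
    return sum(lo <= d < hi for d in durations) / len(durations)


def empirical_cdf(durations, x):
    """Empirical CDF: fraction of videos with duration <= x seconds."""
    ordered = sorted(durations)
    return bisect_right(ordered, x) / len(ordered)
```

For example, `fraction_in(durations, 14, 16)` corresponds to the highest peak and `empirical_cdf(durations, 15)` to the share of videos within the 15-second limit.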
Figure 1: Distribution of Douyin Video Lengths. (PDF and CDF of the original data; x-axis: video duration in seconds.)
Figure 2: Distribution of Douyin Video Bitrates. (PDF and CDF of the original data, with a fitted Weibull distribution, k = 1.86, λ = 1452.14.)
Figure 3: Distribution of Douyin Video File Size. (PDF and CDF of the original data, with a fitted Weibull distribution, k = 2.98, λ = 1.14.)

Table 2: Statistics of Video Length, Bitrate and Size

                  Min    Max     Mean     Median   Std.Dev.
length (s)        4      73      13.1     14       3.9
bitrate (kbps)    0      4719    1271.6   1205     691.3
size (MB)         0      24.5    1.96     1.8      1.2
Note: Std.Dev. is short for Standard Deviation.
The bitrate of a video is an indicator of its playback quality. Low bitrate degrades users' QoE [6], which can lead to a decline in Douyin's popularity over time. We observe that the Weibull distribution fits the skewed bitrate curve of Douyin videos. This insight can be useful in designing an efficient caching system based on adaptive bitrate streaming [7].

Fig. 2 shows that there are two bitrate peaks among the videos viewed: one around 1130 kbps and the other around 410 kbps. Only about 2.7% of the videos are encoded at bitrates below 200 kbps. Similarly, only approximately 1.8% of the videos are encoded at bitrates above 3000 kbps. Such thin tails imply that the bitrates do not follow the well-known Zipf distribution. Compared to conventional videos, Douyin videos have higher bitrates, i.e., 74.1% of the videos have bitrates between 500 kbps and 3000 kbps, which is possibly due to advances in network communication technology and device chips. In the near future, the widespread commercialization of 5G communication technology will further increase the range of bitrates for multimedia video.
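The paper does not state how the Weibull parameters in Figs. 2 and 3 were estimated. As one possible approach, the sketch below computes the standard maximum-likelihood estimates of the shape k and scale λ, solving the shape equation by bisection in pure Python; the function name `fit_weibull` is ours. Zero values (Table 2 lists a minimum bitrate of 0) are filtered out first, because the likelihood involves ln x.

```python
import math
import random


def fit_weibull(samples, iters=60):
    """Maximum-likelihood Weibull fit; returns (shape k, scale lam).

    Only strictly positive samples are used, since the likelihood needs ln(x).
    """
    xs = [x for x in samples if x > 0]
    n = len(xs)
    # The shape estimate is scale-invariant, so normalize by the maximum to
    # keep x**k from overflowing while searching over large k.
    top = max(xs)
    ys = [x / top for x in xs]
    logs = [math.log(y) for y in ys]
    mean_log = sum(logs) / n

    def score(k):
        # MLE condition for k: sum(y^k ln y)/sum(y^k) - 1/k - mean(ln y) = 0.
        # The left-hand side is increasing in k, so bisection applies.
        powers = [y ** k for y in ys]
        weighted = sum(p * l for p, l in zip(powers, logs)) / sum(powers)
        return weighted - 1.0 / k - mean_log

    lo, hi = 1e-3, 100.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if score(mid) < 0:
            lo = mid
        else:
            hi = mid
    k = 0.5 * (lo + hi)
    # Given k, the scale MLE is (mean of x^k)^(1/k); undo the normalization.
    lam = top * (sum(y ** k for y in ys) / n) ** (1.0 / k)
    return k, lam


if __name__ == "__main__":
    # Synthetic check: random.weibullvariate(scale, shape) draws Weibull samples.
    rng = random.Random(1)
    data = [rng.weibullvariate(1452.14, 1.86) for _ in range(20000)]
    print(fit_weibull(data))
```

As a quick consistency check on fitted parameters, the Weibull mean is λΓ(1 + 1/k); for k = 1.86 and λ = 1452.14 this is about 1290 kbps, close to the 1271.6 kbps sample mean in Table 2.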
Video file size is not directly available in the collected Douyin metadata. However, it can be calculated from the video length (duration) and bitrate. The size of each video file is calculated as follows:
$$\text{Size} = \text{bitrate} \times \text{length} \tag{1}$$

where, with the bitrate in bps and the length in seconds, the product is the size in bits (dividing by 8 gives bytes). As illustrated in Fig. 3, we plot the PDF and CDF of video file sizes and find that their distribution differs from that of video lengths, even though the two are directly related. In the collected dataset, 97.8% of the videos are smaller than 5 MB. According to Table 2, the average video file size is 1.96 MB, which is smaller than that of YouTube videos (7.6 MB) [4]. However, considering that there are 150 million daily active users, if every user uploads one 1.96 MB video, the total disk space required to store all the videos is at least 294 TB every day. Therefore, efficient caching is essential. We also list the statistics of video length, bitrate, and size in Table 2.

Figure 4: Distributions of popularity metrics. (a) PMF of video views. (b) PMF of video likes. (c) PMF of video comments. (d) PMF of video shares.
Figure 5: CDF of popularity metrics. (a) CDF of video views. (b) CDF of video likes. (c) CDF of video comments. (d) CDF of video shares.
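Equation (1) and the daily storage estimate can be checked with a few lines of arithmetic. In this sketch (function names are ours), the 1.98 MB of Table 1 is reproduced from its bitrate and duration when 1 MB is taken as 2^20 bytes, while the 294 TB figure uses 1 TB = 10^6 MB as in the text; both unit conventions are assumptions inferred from the numbers, not stated by the authors.

```python
def video_size_mb(bitrate_bps, length_ms, mb_bytes=2**20):
    """Equation (1): size = bitrate x length, converted from bits to megabytes.

    mb_bytes=2**20 reproduces the 1.98 MB of Table 1; use 10**6 for decimal MB.
    """
    bits = bitrate_bps * (length_ms / 1000.0)  # bps * seconds = bits
    return bits / 8 / mb_bytes                  # 8 bits per byte


def daily_storage_tb(daily_uploads, avg_size_mb):
    """Storage added per day, with 1 TB = 10**6 MB as in the text's arithmetic."""
    return daily_uploads * avg_size_mb / 10**6
```

With the Table 1 record, `video_size_mb(1104867, 15070)` is about 1.985, i.e., 1.98 MB as listed; in decimal megabytes the same video would be about 2.08 MB.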
Video popularity plays an important role in the design of recommendation systems and caching mechanisms. Popular videos are likely to be recommended to users and cached at edge servers close to them in order to reduce delay. In this section, we analyze the popularity of Douyin short videos.

4.1 Distribution
There are four popularity indicators available for each Douyin video: the number of views, likes, comments, and shares. We rank all the videos by each of these indicators, normalize the popularity values, and plot them on logarithmic axes in Fig. 4. It can easily be seen from Fig. 4 that the distributions do not follow Zipf's law, which would appear approximately linear on a log-log plot. However, we confirmed that the distribution of the 5000 most popular videos follows Zipf's law. (In the next section, we explain the rationale for approximating the distributions of only the most popular videos and how such an approximation can be leveraged in designing the caching system.) Mathematically, Zipf's law can be defined by

$$p_n \sim n^{-\alpha} \tag{2}$$

meaning that the popularity $p_n$ of the $n$th most popular video is $1/n^{\alpha}$ times the popularity of the most popular video, where $\alpha$ is a constant parameter. As seen in Fig. 5 (a), by least-squares polynomial fitting, we found that Zipf's law with $\alpha = 0.552$ fits our empirical observations very well. The normalized numbers of views of the three most popular videos are $p_1 = …$, $p_2 = …$ and $p_3 = …$, i.e., $p_2 \sim … \times p_1$ and $p_3 \sim … \times p_1$.

From the above analysis, one may conclude that the most popular videos take most of the views, as well as most of the likes, comments, and shares. This is also verified by Fig. 5. For example, the 18.6% most popular videos disproportionately account for 80.5% of all views. This means that most videos get very few views compared to the popular ones.
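The paper reports a least-squares fit but not its exact procedure. A minimal sketch, assuming the fit is a straight line (a degree-1 polynomial) in log-log space over the 5000 most popular videos, so that the fitted slope equals $-\alpha$ by Eq. (2). A second helper computes the kind of concentration figure quoted above, the share of views taken by the most popular fraction of videos. Both function names are illustrative.

```python
import math


def fit_zipf_alpha(counts, top=5000):
    """Estimate the Zipf exponent alpha from the `top` most popular videos.

    Least-squares line through (ln rank, ln normalized popularity);
    under p_n ~ n^(-alpha) the slope of that line is -alpha.
    """
    ranked = sorted(counts, reverse=True)[:top]
    p1 = ranked[0]
    pts = [(math.log(n), math.log(c / p1))
           for n, c in enumerate(ranked, start=1) if c > 0]
    mx = sum(x for x, _ in pts) / len(pts)
    my = sum(y for _, y in pts) / len(pts)
    sxy = sum((x - mx) * (y - my) for x, y in pts)
    sxx = sum((x - mx) ** 2 for x, _ in pts)
    return -sxy / sxx


def top_share(counts, fraction):
    """Share of all views captured by the top `fraction` of videos."""
    ranked = sorted(counts, reverse=True)
    k = max(1, round(fraction * len(ranked)))
    return sum(ranked[:k]) / sum(ranked)
```

Restricting the fit to the head of the ranking mirrors the observation that only the most popular videos follow Zipf's law; fitting all videos would bend the line through the non-Zipf tail.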
Is a video that is popular in terms of number of views also popular in terms of number of likes, comments, or shares? To answer this question, we use the Pearson correlation coefficient to study the correlations among the four popularity indicators. The Pearson correlation coefficient between two variables $X$ and $Y$ is given by

$$\rho(X, Y) = \frac{E[XY] - E[X]\,E[Y]}{\sqrt{E[X^2] - (E[X])^2}\,\sqrt{E[Y^2] - (E[Y])^2}} \tag{3}$$

The Pearson correlation coefficient is unitless and ranges from -1 to 1; a larger positive value indicates a stronger positive linear correlation. Fig. 6 shows the correlation coefficient between each pair of the four popularity indicators. All correlation coefficients are positive. In particular, the number of views and the number of likes have a very high correlation coefficient of 0.91, meaning that a video that is popular in terms of views is very likely to be popular in terms of likes, and vice versa. In comparison, the other coefficients are relatively low; the correlation between the number of shares and the number of comments is especially low.

Figure 6: Correlation coefficient of popularity metrics.
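Equation (3) can be computed directly from per-video counts. The sketch below (function name ours) uses the centered form Cov(X, Y)/(σ_X σ_Y), which is algebraically identical to Eq. (3) but avoids the cancellation that E[X²] − (E[X])² can suffer when view counts are large.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences (Eq. (3))."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    # Centered sums equal n*(E[XY] - E[X]E[Y]) and n*(E[X^2] - E[X]^2); the
    # factors of n cancel in the ratio.
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)
```

Applying `pearson` to each of the six pairs among views, likes, comments, and shares yields a matrix of the kind shown in Fig. 6.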
The above study of Douyin has important implications for improving Douyin's services. As the user base of Douyin expands rapidly, the content servers face the challenge of answering users' requests in a timely manner. In the 5G era, it is possible to cache content at edge servers, which brings services closer to the end users. In this section, we show the feasibility and benefits of caching popular short videos at the edge.
Figure 7: Hit-ratio as a function of the number ofcached videos.
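Under a Zipf popularity model, the hit ratio of a cache holding the C most popular of N videos is the request probability mass of ranks 1 to C. A minimal sketch of that computation, assuming the exponent α = 0.552 reported in Section 4 and a ranking that stays fixed within a time slot; the function name is ours. It is one way a curve like the one in Figure 7 could be produced, not necessarily how it was.

```python
def zipf_hit_ratio(cached, total, alpha=0.552):
    """Hit ratio of caching the `cached` most popular of `total` videos.

    Assumes request probabilities follow Zipf's law,
    p_k = k^(-alpha) / sum_{i=1}^{total} i^(-alpha); the hit ratio is the
    probability mass of the cached ranks 1..cached.
    """
    weights = [i ** -alpha for i in range(1, total + 1)]
    return sum(weights[:cached]) / sum(weights)
```

For N = 1000 videos, caching only the top-ranked video gives about 0.021, and sweeping `cached` from 1 to N traces the hit ratio as a function of the number of cached videos.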
Assume that the probability that a video with rank $\kappa$ is watched in the next time slot $\tau$ is $p^{\tau}_{\kappa} = V^{\tau}_{\kappa} / V^{\tau}$, where $V^{\tau}_{\kappa}$ is the number of views of the video with rank $\kappa$ and $V^{\tau}$ is the total number of views of all videos. Fig. 5 (a) has already shown that the number of views of the most popular videos follows Zipf's law with the characteristic exponent $\alpha = 0.552$. Therefore,

$$V^{\tau}_{\kappa} = V^{\tau}\,\frac{1/\kappa^{0.552}}{\sum_{i=1}^{N^{\tau}} 1/i^{0.552}},$$

where $N^{\tau}$ is the number of videos, and

$$p^{\tau}_{\kappa} = \frac{1/\kappa^{0.552}}{\sum_{i=1}^{N^{\tau}} 1/i^{0.552}}$$

is the Zipf's law function. Given 1000 Douyin videos, the hit ratio of caching the most popular video is as high as 2.1% ($p^{\tau}_{1} = 1 / \sum_{i=1}^{1000} 1/i^{0.552} \approx 0.021$).

We studied the latest short videos from Douyin in terms of three basic features and four popularity metrics. First, we studied how Douyin works and found that it is a decentralized video social media platform based on massive numbers of short videos and a strong recommendation mechanism. Second, we studied the characteristics of Douyin videos and found that the distributions of video bitrate and size closely follow the Weibull distribution. Third, we analyzed the distributions of different popularity metrics and discovered that the popularity metrics of the most popular Douyin videos obey the Zipf distribution, while the rest of the videos do not, as is the case with traditional videos. We also analyzed the relationships among the key popularity metrics and found that the four metrics are not highly correlated, except for the number of views and the number of likes.

We believe that this work provides an initial foundation for the design of future advanced caching systems for short video platforms. Several directions for future work follow from this study. For instance, using deep reinforcement learning, we would like to design a proactive and effective caching system that can reduce the short video edge server load.
REFERENCES
[1] N. Abbas, Y. Zhang, A. Taherkordi, and T. Skeie. 2018. Mobile Edge Computing: A Survey. IEEE Internet of Things Journal 5, 1 (Feb 2018), 450–465.
[2] L. Breslau, Pei Cao, Li Fan, G. Phillips, and S. Shenker. 1999. Web caching and Zipf-like distributions: evidence and implications. In IEEE INFOCOM '99. Conference on Computer Communications. Proceedings. Eighteenth Annual Joint Conference of the IEEE Computer and Communications Societies. The Future is Now (Cat. No.99CH36320), Vol. 1. 126–134.
[3] BusinessofApps. 2019. TikTok Revenue and Usage Statistics (2019). https://influencermarketinghub.com/tiktok-statistics/ [Online; accessed 27-February-2019].
[4] X. Cheng, J. Liu, and C. Dale. 2013. Understanding the Characteristics of Internet Short Video Sharing: A YouTube-Based Measurement Study. IEEE Transactions on Multimedia 15, 5 (Aug 2013), 1184–1194.
[5] Cisco. 2018. Cisco Visual Networking Index: Forecast and Trends, 2017–…
[6] … In Proceedings of the 7th ACM SIGCOMM Conference on Internet Measurement (IMC '07). ACM, New York, NY, USA, 15–28. https://doi.org/10.1145/1298306.1298310
[7] S. Kim and C. Kim. 2019. XMAS: An Efficient Mobile Adaptive Streaming Scheme Based on Traffic Shaping. IEEE Transactions on Multimedia 21, 2 (Feb 2019), 442–456.
[8] Christian Koch, Johannes Pfannmüller, Amr Rizk, David Hausheer, and Ralf Steinmetz. 2018. Category-aware Hierarchical Caching for Video-on-demand Content on Youtube. In Proceedings of the 9th ACM Multimedia Systems Conference (MMSys '18). …
… IEEE Transactions on Network Science and Engineering (2018), 1–1.