Proceedings of the Brazilian Symposium on Multimedia and the Web | 2021

Extracting Textual Features from Video Streaming Services Publications to Predict their Popularity

Abstract

The Internet's popularization has increased the amount of content produced and consumed on the Web. To take advantage of this new market, major content producers such as Netflix and Amazon Prime have emerged, focusing on video streaming services. However, despite the large number and diversity of videos made available by these content providers, few of them attract most of the users' attention. For example, in the data explored in this paper, the 6% most popular videos are responsible for 85% of the total views. Finding out in advance which videos will be popular is not trivial, especially because of the large number of influencing variables. Nevertheless, a tool with this ability would be of great value to help dimension network infrastructure and to properly recommend new content to users. In this work, we propose two approaches to obtaining features to classify the popularity of a video before it is published. The first one builds upon predictive attributes defined by feature engineering. The second leverages word embeddings from the descriptions and titles of the videos. We experiment with the proposed approaches on a set of videos from GloboPlay, the largest provider of video streaming services in Latin America. A combination of the engineered features and the embeddings, using the Random Forest machine learning algorithm, reached the best result, with an accuracy of 87%.
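The combination of the two feature families described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the embedding table `TOY_EMBEDDINGS`, the three hand-crafted attributes, and the function names are all hypothetical, chosen only to show how engineered features and an averaged word embedding of the title and description might be concatenated into one vector before being passed to a classifier such as Random Forest.

```python
from statistics import fmean

# Hypothetical toy embedding table (assumption); a real setup would load
# pretrained word vectors instead of these made-up 3-dimensional values.
TOY_EMBEDDINGS = {
    "final": [0.9, 0.1, 0.0],
    "goal": [0.8, 0.2, 0.1],
    "recipe": [0.0, 0.7, 0.3],
}
EMBEDDING_DIM = 3


def engineered_features(title, description):
    """Hand-crafted attributes (illustrative choices, not the paper's list)."""
    return [
        float(len(title.split())),        # number of words in the title
        float(len(description.split())),  # number of words in the description
        1.0 if "!" in title else 0.0,     # whether the title has an exclamation mark
    ]


def mean_embedding(text):
    """Average the vectors of known words; return zeros if none are known."""
    vectors = [
        TOY_EMBEDDINGS[word]
        for word in text.lower().split()
        if word in TOY_EMBEDDINGS
    ]
    if not vectors:
        return [0.0] * EMBEDDING_DIM
    # zip(*vectors) groups the values of each dimension across all words.
    return [fmean(column) for column in zip(*vectors)]


def build_feature_vector(title, description):
    """Concatenate engineered features with the averaged text embedding."""
    text = title + " " + description
    return engineered_features(title, description) + mean_embedding(text)
```

Each video would yield one such vector, available before publication because it depends only on the title and description. The sketch splits on whitespace only, so a token with attached punctuation (for example `goal!`) is treated as unknown; a real pipeline would likely tokenize more carefully.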

DOI 10.1145/3470482.3479624

Language English

Journal Proceedings of the Brazilian Symposium on Multimedia and the Web
