Featured Research

Multimedia

A Study on the Characteristics of Douyin Short Videos and Implications for Edge Caching

Douyin, internationally known as TikTok, has become one of the most successful short-video platforms. To maintain its popularity, Douyin has to provide better Quality of Experience (QoE) to its growing user base. Understanding the characteristics of Douyin videos is thus critical to its service improvement and system design. In this paper, we present an initial study on the fundamental characteristics of Douyin videos based on a dataset of over 260 thousand short videos collected across three months. The characteristics of Douyin videos are found to be significantly different from those of traditional online videos, ranging from video bitrate and size to popularity. In particular, the bitrate and size of videos each follow a Weibull distribution. We further observe that the most popular Douyin videos follow Zipf's law on video popularity, but the rest of the videos do not. We also investigate the correlation between popularity metrics used for Douyin videos. It is found that the correlation between the number of views and the number of likes is strong, while other correlations are relatively low. Finally, by using a case study, we demonstrate that the above findings can provide important guidance on designing an efficient edge caching system.
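
As a rough illustration of what following Zipf's law means for popularity data, the sketch below fits a straight line to log(views) against log(rank) and reports the exponent. It is not the paper's method or data: the function name `zipf_exponent`, the `top_k` parameter, and the synthetic view counts are all assumptions made for this example.

```python
import math


def zipf_exponent(view_counts, top_k=None):
    """Estimate s in views ~ C / rank**s by least squares on log-log data.

    view_counts: iterable of per-video view counts (hypothetical input).
    top_k: if given, fit only the top_k most-viewed videos.
    """
    ranked = sorted((v for v in view_counts if v > 0), reverse=True)
    if top_k is not None:
        ranked = ranked[:top_k]
    if len(ranked) < 2:
        raise ValueError("need at least two positive view counts")

    xs = [math.log(rank) for rank in range(1, len(ranked) + 1)]
    ys = [math.log(views) for views in ranked]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)

    # Ordinary least-squares slope of log(views) against log(rank).
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return -sxy / sxx


# Synthetic counts that follow an exact power law with exponent 1.
synthetic = [1_000_000 / rank for rank in range(1, 201)]
print(round(zipf_exponent(synthetic, top_k=100), 3))
```

Fitting the head and the tail of the ranking separately (by varying `top_k` or slicing the input) is one simple way to see whether only the most popular items behave like a power law, which is the kind of split the abstract describes.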

Multimedia

A Survey on 360-Degree Video: Coding, Quality of Experience and Streaming

The commercialization of Virtual Reality (VR) headsets has made immersive and 360-degree video streaming the subject of intense interest in the industry and research communities. While the basic principles of video streaming are the same, immersive video presents a set of specific challenges that need to be addressed. In this survey, we present the latest developments in the relevant literature on four of the most important ones: (i) omnidirectional video coding and compression, (ii) subjective and objective Quality of Experience (QoE) and the factors that can affect it, (iii) saliency measurement and Field of View (FoV) prediction, and (iv) the adaptive streaming of immersive 360-degree videos. The final objective of the survey is to provide an overview of the research on all the elements of an immersive video streaming system, giving the reader an understanding of their interplay and performance.

Multimedia

A Taxonomy and Dataset for 360° Videos

In this paper, we propose a taxonomy for 360° videos that categorizes videos based on moving objects and camera motion. We gathered and produced 28 videos based on the taxonomy, and recorded viewport traces from 60 participants watching the videos. In addition to the viewport traces, we provide the viewers' feedback on their experience watching the videos, and we also analyze viewport patterns for each category.

Multimedia

A Time-Frequency Perspective on Audio Watermarking

Existing audio watermarking methods usually treat the host audio signal as a function of time or frequency individually, while considering it in the joint time-frequency (TF) domain has received less attention. This paper proposes an audio watermarking framework from the perspective of TF analysis. The proposed framework treats the host audio signal in the 2-dimensional (2D) TF plane, and selects a series of patches within the 2D TF image. These patches correspond to the TF clusters with minimum averaged energy, and are used to form the feature vectors for watermark embedding. Classical spread spectrum embedding schemes are incorporated in the framework. The feature patches that carry the watermarks only occupy a few TF regions of the host audio signal, thus leading to improved imperceptibility. In addition, since the feature patches contain a neighborhood area of the TF representation of audio samples, the correlations among the samples within a single patch could be exploited for improved robustness against a series of processing attacks. Extensive experiments are carried out to illustrate the effectiveness of the proposed system, as compared to its counterpart systems. The aim of this work is to shed some light on the notion of audio watermarking in the TF feature domain, which may potentially lead us to more robust watermarking solutions against malicious attacks.

Multimedia

A User-experience Driven SSIM-Aware Adaptation Approach for DASH Video Streaming

Dynamic Adaptive Streaming over HTTP (DASH) is a widely used video streaming technique. One key point is the adaptation mechanism, which resides on the client side. This mechanism greatly affects the overall Quality of Experience (QoE) of the video streaming. In this paper, we propose a new adaptation algorithm for DASH, namely SSIM Based Adaptation (SBA). This mechanism is user-experience driven: it uses the Structural Similarity Index Measurement (SSIM) as the main video perceptual quality indicator; moreover, the adaptation is based on a joint consideration of the SSIM indicator and the physical resources (buffer occupancy, bandwidth) in order to minimize buffer starvation (rebuffering) and video quality instability, as well as to maximize the overall video quality (through SSIM). To evaluate the performance of our proposal, we carried out trace-driven emulation with real traffic traces (captured in a real mobile network). Comparisons with some representative algorithms (BBA, FESTIVE, OSMF) on major QoE metrics show that our adaptation algorithm SBA achieves an efficient adaptation, minimizing both rebuffering and instability while keeping the displayed video at a high bitrate.

Multimedia

A blind robust watermarking method based on Arnold Cat map and amplified pseudo-noise strings with weak correlation

In this paper, a robust and blind watermarking method is proposed, which is highly resistant to common image watermarking attacks, such as noise, compression, and image quality enhancement processing. In this method, the Arnold Cat map is used as a pre-processing step on the host image, which increases the security and imperceptibility of embedding watermark bits with a strong gain factor. Moreover, two pseudo-noise strings with weak correlation are used as the symbols for the 0 and 1 bits of the watermark, which increases the accuracy in detecting the state of watermark bits at the extraction phase in comparison to using two random pseudo-noise strings. To further increase the robustness and imperceptibility of the embedding, the Arnold Cat mapped image is divided into non-overlapping blocks, and then the high frequency coefficients of the approximation sub-band of the FDCuT transform are used as the embedding location for each block. Comparison of the proposed method with recent robust methods under the same experimental conditions indicates the superiority of the proposed method.
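
For readers unfamiliar with the Arnold Cat map used as the pre-processing step, the sketch below shows the standard form of the map on a square image, where the pixel at (x, y) moves to ((x + y) mod N, (x + 2y) mod N), together with its inverse. It covers only the scrambling step and says nothing about the paper's embedding or its parameter choices; the function names and the tiny 4x4 test image are assumptions for illustration.

```python
def arnold_cat_map(image, iterations=1):
    """Scramble a square image (list of equal-length rows) with the Arnold Cat map."""
    n = len(image)
    if any(len(row) != n for row in image):
        raise ValueError("the Arnold Cat map needs a square image")
    current = [row[:] for row in image]
    for _ in range(iterations):
        scrambled = [[None] * n for _ in range(n)]
        for y in range(n):
            for x in range(n):
                # Pixel at (x, y) moves to (x + y, x + 2y) modulo n.
                scrambled[(x + 2 * y) % n][(x + y) % n] = current[y][x]
        current = scrambled
    return current


def inverse_arnold_cat_map(image, iterations=1):
    """Undo arnold_cat_map applied with the same number of iterations."""
    n = len(image)
    if any(len(row) != n for row in image):
        raise ValueError("the Arnold Cat map needs a square image")
    current = [row[:] for row in image]
    for _ in range(iterations):
        restored = [[None] * n for _ in range(n)]
        for y in range(n):
            for x in range(n):
                # Inverse mapping: (x, y) came from (2x - y, y - x) modulo n.
                restored[(y - x) % n][(2 * x - y) % n] = current[y][x]
        current = restored
    return current


# Hypothetical 4x4 "image" whose pixel values are just their indices.
host = [[4 * row + col for col in range(4)] for row in range(4)]
scrambled = arnold_cat_map(host, iterations=2)
assert inverse_arnold_cat_map(scrambled, iterations=2) == host
```

Because the map is a permutation of pixel positions, the iteration count can act as a simple key: extraction has to apply the inverse the same number of times to recover the original layout.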

Multimedia

A block-based inter-band predictor using multilayer propagation neural network for hyperspectral image compression

In this paper, a block-based inter-band predictor (BIP) with a multilayer propagation neural network model (MLPNN) is presented within a completely new framework. This predictor can be combined with diverse entropy coding methods. Hyperspectral (HS) images are composed of a series of highly similar spectral bands. Our approach is to use a trained MLPNN to predict the succeeding bands based on current band information. The purpose is to explore whether BIP-MLPNN can provide better image prediction results with high efficiency. Unlike traditional compression methods, which encode images pixel by pixel, the compression process encodes only the weight and bias vectors of the BIP-MLPNN, which require few bits to transfer. The decoder reconstructs a band by using the same network structure as the encoder side. The BIP-MLPNN decoder does not need to be trained, as the weights and biases have already been transmitted, so the succeeding bands can be easily reconstructed. The experimental results indicate that the BIP-MLPNN predictor outperforms the CCSDS-123 HS image coding standard. Due to a good approximation of the target band, the proposed method outperforms CCSDS-123 by more than 2.0 dB PSNR in the predicted bands. Moreover, the proposed method provides high quality images (e.g., 30 to 40 dB PSNR) at a very low bit rate (less than 0.1 bpppb) and outperforms existing methods (e.g., JPEG, 3DSPECK, 3DSPIHT) in terms of rate-distortion performance.

Multimedia

A multi-level approach with visual information for encrypted H.265/HEVC videos

High-efficiency video coding (HEVC) encryption has been proposed to encrypt syntax elements for the purpose of video encryption. To achieve high video security, to the best of our knowledge, almost all of the existing HEVC encryption algorithms encrypt the whole video, such that a user without permissions cannot obtain any viewable information. However, these encryption algorithms cannot meet the needs of customers who need part of the information in the video but not all of it. In many cases, such as professional paid videos or video meetings, users would like to see some visible information from the original video in the encrypted video. To meet this demand, this paper proposes a multi-level encryption scheme composed of lightweight encryption, medium encryption and heavyweight encryption, where each encryption level exposes a different amount of visual information. It is found that either encrypting the luma intra prediction mode (IPM) or scrambling the syntax element of the DCT coefficient sign yields a distorted video in which there is still residual visual information, while applying both yields the strongest encryption, from which one cannot gain any visual information. The experimental results meet our expectations, indicating that there is a different amount of visual information in each encryption level. Meanwhile, users can flexibly choose the encryption level according to their various requirements.

Multimedia

A multimodal lossless coding method for skeletons in videos

Nowadays, skeleton information in videos plays an important role in human-centric video analysis, but effectively coding such massive skeleton information has never been addressed in previous work. In this paper, we make the first attempt to solve this problem by proposing a multimodal skeleton coding tool containing three different coding schemes, namely, a spatial differential-coding scheme, a motion-vector-based differential-coding scheme and an inter prediction scheme, thus utilizing both spatial and temporal redundancy to losslessly compress skeleton data. More importantly, these schemes are switched properly for different types of skeletons in video frames, hence achieving further improvement of the compression rate. Experimental results show that our approach leads to 74.4% and 54.7% size reduction on our surveillance sequences and overall test sequences respectively, which demonstrates the effectiveness of our skeleton coding tool.
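
To give a feel for how differential coding can exploit spatial redundancy without losing information, the sketch below stores each joint as its integer offset from the previous joint in a fixed ordering and then reverses the process. This is a generic illustration, not the paper's scheme: the joint ordering, the choice of the preceding joint as predictor, and the names `encode_joints` and `decode_joints` are all assumptions, and no entropy coding stage is shown.

```python
def encode_joints(joints):
    """Turn a list of integer (x, y) joints into residuals against the previous joint.

    The first joint is coded against (0, 0), so it is stored as-is.
    """
    residuals = []
    prev_x, prev_y = 0, 0
    for x, y in joints:
        residuals.append((x - prev_x, y - prev_y))
        prev_x, prev_y = x, y
    return residuals


def decode_joints(residuals):
    """Rebuild the original joint coordinates by accumulating the residuals."""
    joints = []
    x, y = 0, 0
    for dx, dy in residuals:
        x, y = x + dx, y + dy
        joints.append((x, y))
    return joints


# Hypothetical skeleton: a few neighbouring joints in pixel coordinates.
skeleton = [(320, 100), (322, 140), (300, 150), (345, 152), (321, 230)]
residuals = encode_joints(skeleton)
assert decode_joints(residuals) == skeleton
print(residuals)
```

Neighbouring joints tend to be close together, so the residuals are usually much smaller in magnitude than the raw coordinates, which is what a later entropy coder would benefit from.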

Multimedia

A multimodal movie review corpus for fine-grained opinion mining

In this paper, we introduce a set of opinion annotations for the POM movie review dataset, composed of 1000 videos. The annotation campaign is motivated by the development of a hierarchical opinion prediction framework allowing one to predict the different components of the opinions (e.g. polarity and aspect) and to identify the corresponding textual spans. The resulting annotations have been gathered at two granularity levels: a coarse one (opinionated span) and a finer one (span of opinion components). We introduce specific categories in order to make the annotation of opinions easier for movie reviews. For example, some categories allow the discovery of user recommendation and preference in movie reviews. We provide a quantitative analysis of the annotations and report the inter-annotator agreement under the different levels of granularity. We thus provide the first set of ground-truth annotations which can be used for the task of fine-grained multimodal opinion prediction. We provide an analysis of the data gathered through an inter-annotator study and show that a linear structured predictor learns meaningful features even for the prediction of scarce labels. Both the annotations and the baseline system are made publicly available.
