Featured Research

Multimedia

A Generalized Rate-Distortion-λ Model Based HEVC Rate Control Algorithm

The High Efficiency Video Coding (HEVC/H.265) standard doubles the compression efficiency of the widely used H.264/AVC standard. For practical applications, rate control (RC) algorithms need to be developed for HEVC. Based on R-Q, R-ρ, or R-λ models, rate control algorithms aim to encode a video clip/segment to a target bit rate accurately while preserving high video quality after compression. Among the various models used by HEVC rate control algorithms, the R-λ model performs best in both coding efficiency and rate control accuracy. However, compared with encoding at a fixed quantization parameter (QP), even the best rate control algorithm [1] still yields lower video quality at the same average bit rate. In this paper, we propose a novel generalized rate-distortion-λ (R-D-λ) model for the relationship between rate (R), distortion (D), and the Lagrange multiplier (λ) in rate-distortion (RD) optimized encoding. In addition to a well-designed hierarchical initialization and coefficient update scheme, a new model-based rate allocation scheme composed of amortization, a smoothing window, and consistency control is proposed for better rate allocation. Experimental results implementing the proposed algorithm in the HEVC reference software HM-16.9 show that the proposed rate control algorithm achieves average BD-rate (BDBR) savings of 6.09%, 3.15%, and 4.03% for the random access (RA), low-delay P (LDP), and low-delay B (LDB) configurations, respectively, compared with the R-λ model based RC algorithm [1] implemented in HM. The proposed algorithm also outperforms state-of-the-art algorithms, while rate control accuracy and encoding speed are hardly impacted.
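
The R-λ family of models ties the encoding rate in bits per pixel (bpp) to the Lagrange multiplier through a hyperbolic relation, λ = α · bpp^β, whose coefficients are adapted frame by frame. The following Python sketch illustrates that update loop; the step sizes d_alpha and d_beta are illustrative assumptions, not HM's actual constants or clipping rules.

```python
import math

# Minimal sketch of the hyperbolic R-lambda model behind HM's rate control:
#   lambda = alpha * bpp ** beta
# alpha and beta are adapted after each coded frame.

def lam_from_bpp(bpp, alpha, beta):
    """Map a bit budget (bits per pixel) to a Lagrange multiplier."""
    return alpha * bpp ** beta

def update_model(alpha, beta, bpp_actual, lam_used, d_alpha=0.1, d_beta=0.05):
    """Nudge (alpha, beta) so the model explains the lambda just used."""
    lam_pred = alpha * bpp_actual ** beta
    err = math.log(lam_used) - math.log(lam_pred)
    alpha *= math.exp(d_alpha * err)   # multiplicative update keeps alpha > 0
    beta += d_beta * err * math.log(bpp_actual)
    return alpha, beta
```

When the model already predicts the multiplier that was used, the error term vanishes and the coefficients stay put; otherwise they drift toward agreement.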

A Graph-based Ranking Approach to Extract Key-frames for Static Video Summarization

Video abstraction has become one of the most efficient approaches to grasping the content of a video without watching it entirely. Key-frame-based static video summarization falls under this category. In this paper, we propose a graph-based approach that summarizes a video with high user satisfaction. We treat each video frame as a node of a graph and assign a rank to each node with our proposed VidRank algorithm. We develop three different models of the VidRank algorithm and perform a comparative study on them. A comprehensive evaluation on 50 videos from the Open Video database, using objective and semi-objective measures, indicates the superiority of our static video summary generation method.
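
The abstract does not spell out VidRank, but the name suggests a PageRank-style random walk over a frame-similarity graph. The sketch below is a hypothetical minimal version under that assumption: frames are nodes, edge weights are pairwise similarities, and the top-ranked frames become the key frames.

```python
import numpy as np

def vidrank(sim, d=0.85, iters=100):
    """PageRank-style ranking over a frame-similarity matrix.

    sim[i, j] is the similarity between frames i and j (assumed
    symmetric and non-negative). Returns one score per frame.
    """
    n = sim.shape[0]
    w = sim.astype(float).copy()
    np.fill_diagonal(w, 0.0)              # no self-loops
    col = w.sum(axis=0)
    col[col == 0] = 1.0                   # guard against isolated frames
    p = w / col                           # column-stochastic transitions
    r = np.full(n, 1.0 / n)
    for _ in range(iters):
        r = (1 - d) / n + d * p @ r       # damped power iteration
    return r

def keyframes(sim, k):
    """Pick the k highest-ranked frames as the static summary."""
    return np.argsort(vidrank(sim))[::-1][:k]
```

A frame that many other frames resemble acts as a hub and accumulates rank, which matches the intuition of a representative key frame.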

A Ground-Truth Data Set and a Classification Algorithm for Eye Movements in 360-degree Videos

The segmentation of a gaze trace into its constituent eye movements has been actively researched since the early days of eye tracking. As we move towards more naturalistic viewing conditions, segmentation becomes even more challenging and convoluted as more complex patterns emerge. The definitions and well-established methods that were developed for monitor-based eye tracking experiments are often not directly applicable to unrestrained set-ups such as eye tracking in wearable contexts or with head-mounted displays. The main contributions of this work to eye movement research for 360-degree content are threefold: First, we collect, partially annotate, and make publicly available a new eye tracking data set, consisting of 13 participants viewing 15 video clips recorded in 360 degrees. Second, we propose a new two-stage pipeline for ground-truth annotation of traditional fixations, saccades, and smooth pursuits, as well as (optokinetic) nystagmus, the vestibulo-ocular reflex, and pursuit of moving objects performed exclusively via movement of the head. A flexible user interface for this pipeline is implemented and made freely accessible for use or modification. Lastly, we develop and test a simple proof-of-concept algorithm for automatic classification of all the eye movement types in our data set, based on the operational definitions used for manual annotation. The data set and the source code for both the annotation tool and the algorithm are publicly available at this https URL.
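
The abstract does not detail the proof-of-concept classifier, but speed thresholding (in the spirit of the classic I-VT algorithm) over gaze and head speeds is a natural baseline for such operational definitions. The sketch below is illustrative only: the thresholds, the label set, and the gaze/head feature split are assumptions, not the paper's values.

```python
def classify_samples(gaze_speed, head_speed,
                     saccade_thresh=150.0, fix_thresh=20.0):
    """Toy speed-threshold classifier (I-VT style) per gaze sample.

    gaze_speed: gaze speed in world coordinates (deg/s).
    head_speed: head rotation speed (deg/s).
    Thresholds are illustrative defaults, not tuned values.
    """
    labels = []
    for g, h in zip(gaze_speed, head_speed):
        if g >= saccade_thresh:
            labels.append("saccade")
        elif g < fix_thresh and h < fix_thresh:
            labels.append("fixation")
        elif g < fix_thresh <= h:
            # gaze stable in space while the head moves: VOR-like
            labels.append("VOR")
        else:
            labels.append("smooth_pursuit")
    return labels
```

Real 360-degree data would need the eye-in-head and head-in-world signals combined before such thresholds make sense; this sketch assumes that decomposition has already been done.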

A Group Variational Transformation Neural Network for Fractional Interpolation of Video Coding

Motion compensation is an important technology in video coding that removes the temporal redundancy between coded video frames. In motion compensation, fractional interpolation is used to obtain more reference blocks at the sub-pixel level. Existing video coding standards commonly use fixed interpolation filters for fractional interpolation, which are not efficient enough to handle diverse video signals well. In this paper, we design a group variational transformation convolutional neural network (GVTCNN) to improve the fractional interpolation performance of the luma component in motion compensation. GVTCNN infers samples at different sub-pixel positions from the input integer-position sample. It first extracts a shared feature map from the integer-position sample to infer various sub-pixel position samples. Then a group variational transformation technique is used to transform a group of copied shared feature maps into samples at different sub-pixel positions. Experimental results demonstrate the interpolation efficiency of our GVTCNN: compared with the interpolation method of High Efficiency Video Coding, our method achieves a 1.9% bit saving on average and up to a 5.6% bit saving under the low-delay P configuration.
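
For context, the fixed filter the network competes with is HEVC's separable 8-tap luma interpolation filter. The sketch below applies the half-sample filter along one row of luma samples; the replication padding is a simplification of HEVC's actual boundary handling.

```python
import numpy as np

# HEVC's fixed 8-tap luma filter for the half-sample position; the paper
# replaces such fixed filters with a learned CNN (GVTCNN).
HALF_PEL = np.array([-1, 4, -11, 40, 40, -11, 4, -1], dtype=float)

def half_pel_interp(row):
    """Interpolate half-sample positions along one row of luma samples.

    Edge handling (replication padding) is a simplification. The kernel
    is symmetric, so convolution and correlation coincide here.
    """
    padded = np.pad(np.asarray(row, dtype=float), (3, 4), mode="edge")
    return np.convolve(padded, HALF_PEL, mode="valid") / 64.0
```

The taps sum to 64, so after the /64 normalization a flat region interpolates to itself, as a DC-preserving filter must.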

A Human-Computer Duet System for Music Performance

Virtual musicians have become a remarkable phenomenon in contemporary multimedia arts. However, most virtual musicians today cannot create their own behaviors or perform music together with human musicians. In this paper, we create a virtual violinist who can collaborate with a human pianist to perform chamber music automatically, without any intervention. The system incorporates techniques from several fields, including real-time music tracking, pose estimation, and body movement generation. The virtual musician's behavior is generated from the music audio alone, resulting in a low-cost, efficient, and scalable way to produce co-performances between human and virtual musicians. The proposed system has been validated in public concerts. Objective quality assessment approaches and possible ways to systematically improve the system are also discussed.

A Hybrid Control Scheme for Adaptive Live Streaming

Live streaming is more challenging than on-demand streaming because low latency is a strong requirement in addition to the trade-off between video quality and playback jitter. To balance several inherently conflicting performance metrics and improve the overall quality of experience (QoE), many adaptation schemes have been proposed. Bitrate adaptation is one of the major solutions for video streaming under time-varying network conditions, and it works even better when combined with latency control methods such as adaptive playback rate control and frame dropping. However, designing an algorithm that combines these adaptation schemes remains a challenging problem. To tackle it, we propose a hybrid control scheme for adaptive live streaming, named HYSA, based on heuristic playback rate control, latency-constrained bitrate control, and QoE-oriented adaptive frame dropping. The proposed scheme utilizes Kaufman's Adaptive Moving Average (KAMA) to predict segment bitrates for better rate decisions. Extensive simulations demonstrate that HYSA outperforms most existing adaptation schemes in overall QoE.
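
KAMA itself is a standard estimator: its smoothing constant is driven by an efficiency ratio, so the average tracks sustained bitrate shifts quickly while damping short-lived noise. A self-contained Python version is sketched below; the window sizes are KAMA's common defaults, and how HYSA parameterizes it is not stated in the abstract.

```python
def kama(x, n=10, fast=2, slow=30):
    """Kaufman's Adaptive Moving Average over a bitrate series x.

    The smoothing constant interpolates between a fast and a slow EMA
    according to the efficiency ratio (net change / total movement).
    """
    fsc = 2.0 / (fast + 1)                    # fast smoothing constant
    ssc = 2.0 / (slow + 1)                    # slow smoothing constant
    out = [x[0]]
    for t in range(1, len(x)):
        i0 = max(0, t - n)
        change = abs(x[t] - x[i0])
        vol = sum(abs(x[i] - x[i - 1]) for i in range(i0 + 1, t + 1))
        er = change / vol if vol > 0 else 1.0  # efficiency ratio in [0, 1]
        sc = (er * (fsc - ssc) + ssc) ** 2
        out.append(out[-1] + sc * (x[t] - out[-1]))
    return out
```

On a choppy bitrate trace the efficiency ratio stays low and the average barely moves; on a clean trend it approaches the fast EMA, which is the property that makes it a reasonable per-segment bitrate predictor.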

A JND-based Video Quality Assessment Model and Its Application

Based on the Just-Noticeable-Difference (JND) criterion, a subjective video quality assessment (VQA) dataset called the VideoSet was recently constructed. In this work, we propose a JND-based VQA model that uses a probabilistic framework to analyze and clean the collected subjective test data. While most traditional VQA models focus on content variability, our proposed model takes both subject and content variability into account. The model parameters describing subject and content variability are jointly optimized by solving a maximum likelihood estimation (MLE) problem. As an application, the new subjective VQA model is used to filter out unreliable video quality scores collected in the VideoSet. Experiments are conducted to demonstrate the effectiveness of the proposed model.
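
The abstract does not give the likelihood, but a common way to model subject and content variability jointly is o_ij = q_j + b_i + noise, fit by alternating (coordinate-ascent) maximum-likelihood updates; subjects with large residual variance are then candidates for removal. The sketch below is that generic formulation, not the paper's exact model.

```python
import numpy as np

def fit_subject_model(o, iters=50):
    """Coordinate-ascent MLE for a toy subject/content score model.

    o[i, j]: raw score from subject i on video j, modeled as
    o_ij = q_j + b_i + N(0, v_i^2)  (quality + subject bias + noise).
    """
    n_subj, _ = o.shape
    b = np.zeros(n_subj)
    for _ in range(iters):
        q = (o - b[:, None]).mean(axis=0)     # per-video quality
        b = (o - q[None, :]).mean(axis=1)     # per-subject bias
        b -= b.mean()                         # fix gauge: biases sum to 0
    v2 = ((o - q[None, :] - b[:, None]) ** 2).mean(axis=1)
    return q, b, v2                           # large v2 = unreliable subject
```

Filtering then amounts to dropping (or down-weighting) rows whose estimated noise variance v2 is far above the rest of the panel.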

A Knowledge-Driven Quality-of-Experience Model for Adaptive Streaming Videos

The fundamental conflict between the enormous space of adaptive streaming videos and the limited capacity for subjective experiments poses significant challenges to objective Quality-of-Experience (QoE) prediction. Existing objective QoE models exhibit complex functional forms and fail to generalize well in diverse streaming environments. In this study, we propose an objective QoE model, the knowledge-driven streaming quality index (KSQI), which integrates prior knowledge about the human visual system with human-annotated data in a principled way. By analyzing the subjective characteristics of streaming videos from a corpus of subjective studies, we show that a family of QoE functions lies in a convex set. Using a variant of projected gradient descent, we optimize the objective QoE model over a database of training videos. The proposed KSQI demonstrates strong generalizability to diverse streaming environments, evidenced by state-of-the-art performance on four publicly available benchmark datasets.
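
Projected gradient descent alternates a gradient step with a projection back onto the feasible convex set; for QoE functions, a plausible prior-knowledge constraint is monotonicity (e.g., perceived quality should not increase with more stalling). The sketch below pairs a generic PGD loop with a pool-adjacent-violators projection onto nonincreasing vectors; this specific constraint set is an illustrative assumption, not KSQI's exact one.

```python
import numpy as np

def project_nonincreasing(v):
    """Project onto nonincreasing vectors (pool-adjacent-violators)."""
    w = -v.astype(float)                      # PAV below fits nondecreasing
    blocks = [[w[0], 1]]                      # (mean, size) per block
    for x in w[1:]:
        blocks.append([x, 1])
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, s2 = blocks.pop()             # merge adjacent violators
            m1, s1 = blocks.pop()
            blocks.append([(m1 * s1 + m2 * s2) / (s1 + s2), s1 + s2])
    return -np.concatenate([[m] * s for m, s in blocks])

def pgd(grad, x0, project, lr=0.1, steps=200):
    """Projected gradient descent: x <- P(x - lr * grad(x))."""
    x = x0
    for _ in range(steps):
        x = project(x - lr * grad(x))
    return x
```

The fixed point of this loop is the constrained minimizer, so fitting a QoE curve under such shape constraints reduces to choosing the loss gradient and the projection.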

A Machine Learning Approach to Optimal Inverse Discrete Cosine Transform (IDCT) Design

The design of an optimal inverse discrete cosine transform (IDCT) that compensates for the quantization error is proposed for effective lossy image compression in this work. The forward and inverse DCTs are designed as a pair in current image/video coding standards, without taking the quantization effect into account. Yet the distribution of quantized DCT coefficients deviates from that of the original DCT coefficients. This is particularly obvious when the quality factor of JPEG-compressed images is small. To address this problem, we first use a set of training images to learn the compound effect of forward DCT, quantization, and dequantization in cascade. Then a new IDCT kernel is learned to reverse the effect of this pipeline. Experiments demonstrate the advantage of the new method, which achieves a gain of 0.11-0.30 dB over standard JPEG across a wide range of quality factors.
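
The idea of learning an inverse for the whole DCT-quantize-dequantize cascade can be illustrated with a 1-D 8-point transform and a plain least-squares fit; the paper's procedure for 2-D JPEG blocks is more involved, so treat this as a sketch of the principle only.

```python
import numpy as np

def dct_matrix(n=8):
    """Orthonormal DCT-II matrix (rows are basis vectors)."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    m = np.cos(np.pi * (2 * i + 1) * k / (2 * n)) * np.sqrt(2.0 / n)
    m[0] /= np.sqrt(2.0)
    return m

def learn_idct(x_train, qstep):
    """Fit a linear IDCT kernel that undoes DCT + quantization.

    x_train: (n_samples, 8) training signals; qstep: quantizer step.
    Solves min_K ||x - c_hat @ K.T||^2, a least-squares inverse of the
    simulated forward pipeline.
    """
    d = dct_matrix()
    c = x_train @ d.T                          # forward DCT
    c_hat = np.round(c / qstep) * qstep        # quantize + dequantize
    k, *_ = np.linalg.lstsq(c_hat, x_train, rcond=None)
    return k.T

def reconstruct(c_hat, kernel):
    """Apply the learned inverse to dequantized coefficients."""
    return c_hat @ kernel.T
```

On the training data the fitted kernel can never do worse than the standard orthonormal IDCT, since the standard inverse is itself one candidate in the least-squares search space.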

A Modified Fourier-Mellin Approach for Source Device Identification on Stabilized Videos

To decide whether a digital video was captured by a given device, multimedia forensic tools usually exploit the characteristic noise traces left by the camera sensor on the acquired frames. This analysis requires that the noise pattern characterizing the camera and the noise pattern extracted from the video frames under analysis be geometrically aligned. In many practical scenarios, however, this is not the case, so a re-alignment or synchronization has to be performed. Current solutions often require a time-consuming search of the re-alignment transformation parameters. In this paper, we propose to overcome this limitation by searching for the scaling and rotation parameters in the frequency domain. The proposed algorithm, tested on real videos from a well-known state-of-the-art dataset, shows promising results.
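
In a Fourier-Mellin approach, scaling and rotation of an image become translations of the log-polar resampled magnitude spectrum, which can then be located with phase correlation instead of a parameter search. The core phase-correlation step is sketched below on a plain translation; the log-polar resampling stage and any of the paper's modifications are omitted.

```python
import numpy as np

def phase_correlation(a, b):
    """Estimate the cyclic shift (dy, dx) taking array a to array b.

    In a Fourier-Mellin pipeline this is applied to the log-polar
    magnitude spectra, where scaling and rotation appear as shifts.
    """
    fa, fb = np.fft.fft2(a), np.fft.fft2(b)
    cross = fb * np.conj(fa)
    cross /= np.abs(cross) + 1e-12            # keep phase only
    corr = np.fft.ifft2(cross).real
    dy, dx = np.unravel_index(np.argmax(corr), corr.shape)
    h, w = a.shape
    if dy > h // 2:                           # wrap to signed shifts
        dy -= h
    if dx > w // 2:
        dx -= w
    return int(dy), int(dx)
```

Normalizing the cross-power spectrum to unit magnitude is what turns the correlation surface into a sharp peak at the displacement, robust to overall intensity differences between the two noise patterns.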
