Liyuan Xing
Norwegian University of Science and Technology
Publications
Featured research published by Liyuan Xing.
IEEE Transactions on Multimedia | 2007
Guangyu Zhu; Qingming Huang; Changsheng Xu; Liyuan Xing; Wen Gao; Hongxun Yao
Most existing work on sports video analysis concentrates on highlight extraction; little work addresses the important issue of how the extracted highlights should be organized. In this paper, we present a multimodal approach to organizing the highlights extracted from racket sports video, based on human behavior analysis using a nonlinear affective ranking model. Two research challenges of highlight ranking are addressed, namely affective feature extraction and ranking model construction. The basic principle of our affective feature extraction is to extract sensitive features that can stimulate users' emotions. Since users pay most attention to player behavior and audience response in racket sports highlights, we extract affective features from player behavior, including action and trajectory, and from game-specific audio keywords. We propose a novel motion analysis method to recognize player actions, and we employ support vector regression to construct the nonlinear highlight ranking model from the affective features. A new subjective evaluation criterion is proposed to guide the model construction. To evaluate the performance of the proposed approaches, we tested them on more than ten hours of broadcast tennis and badminton videos. The experimental results demonstrate that our action recognition approach significantly outperforms the existing appearance-based method. Moreover, our user study shows that the affective highlight ranking approach is effective.
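A minimal sketch of the kind of nonlinear highlight ranking described above, using support vector regression as the abstract states. The feature names and the toy training data are hypothetical placeholders for illustration, not the paper's actual features or scores.

```python
import numpy as np
from sklearn.svm import SVR
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

# Each row: affective features of one highlight clip (illustrative choice):
# [player action intensity, trajectory length, audio keyword score]
X_train = np.array([
    [0.8, 120.0, 0.9],
    [0.3,  40.0, 0.2],
    [0.6,  80.0, 0.7],
    [0.1,  20.0, 0.1],
])
# Subjective "exciting degree" scores used as regression targets
y_train = np.array([4.5, 2.0, 3.8, 1.2])

# An RBF-kernel SVR captures a nonlinear mapping from features to excitement
model = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=10.0, epsilon=0.1))
model.fit(X_train, y_train)

# Rank unseen highlights by predicted excitement, highest first
X_test = np.array([[0.7, 100.0, 0.8], [0.2, 30.0, 0.3]])
scores = model.predict(X_test)
ranking = np.argsort(scores)[::-1]
print("predicted scores:", scores, "ranking:", ranking)
```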
IEEE Transactions on Multimedia | 2012
Liyuan Xing; Junyong You; Touradj Ebrahimi; Andrew Perkis
Stereoscopic three-dimensional (3-D) services do not always prevail over their two-dimensional (2-D) counterparts, even though the former can provide a more immersive experience with the help of binocular depth. Various 3-D-specific artefacts may cause discomfort and severely degrade the Quality of Experience (QoE). In this paper, we analyze one of the most annoying artefacts in the visualization stage of stereoscopic imaging, namely crosstalk, by conducting extensive subjective quality tests. A statistical analysis of the subjective scores reveals that both scene content and camera baseline have significant impacts on crosstalk perception, in addition to the crosstalk level itself. Based on the visual variations observed when these significant factors change, three perceptual attributes of crosstalk are summarized as the sensorial results of the human visual system (HVS): shadow degree, separation distance, and spatial position of crosstalk. They are classified into two categories, 2-D and 3-D perceptual attributes, which can be described by a Structural SIMilarity (SSIM) map and a filtered depth map, respectively. An objective quality metric for predicting crosstalk perception is then proposed by combining the two maps. The experimental results demonstrate that the proposed metric achieves a high correlation (over 88%) with subjective quality scores in a wide variety of situations.
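An illustrative sketch of combining a 2-D SSIM map with a filtered depth map into a single crosstalk-perception score, in the spirit of the abstract. The weighting and pooling below are simplified assumptions, not the paper's exact model.

```python
import numpy as np
from scipy.ndimage import gaussian_filter
from skimage.metrics import structural_similarity

def crosstalk_score(reference, distorted, depth, alpha=0.5):
    """reference/distorted: grayscale views in [0, 1]; depth: per-pixel depth map."""
    # 2-D perceptual attributes: SSIM map between the crosstalk-free and
    # crosstalk-contaminated images (shadow degree, spatial position)
    _, ssim_map = structural_similarity(reference, distorted,
                                        data_range=1.0, full=True)
    # 3-D perceptual attribute: a smoothed, normalized depth map stands in
    # for the separation distance of the crosstalk ghosting
    depth_map = gaussian_filter(depth, sigma=3.0)
    depth_map = depth_map / (depth_map.max() + 1e-8)
    # Pool the two maps into one score (higher = less perceived crosstalk)
    return float(np.mean(alpha * ssim_map + (1.0 - alpha) * (1.0 - depth_map)))

ref = np.random.rand(64, 64)
dist = np.clip(ref + 0.05 * np.random.rand(64, 64), 0, 1)
depth = np.random.rand(64, 64)
print(crosstalk_score(ref, dist, depth))
```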
Computer Vision and Image Understanding | 2009
Chunxi Liu; Qingming Huang; Shuqiang Jiang; Liyuan Xing; Qixiang Ye; Wen Gao
While most existing sports video research focuses on detecting events from soccer, baseball, and similar sports, little work has addressed flexible content summarization of racquet sports video, e.g., tennis and table tennis. By taking advantage of the periodicity of video shot content and of audio keywords in racquet sports video, we propose a novel flexible video content summarization framework. Our approach combines a structure event detection method with a highlight ranking algorithm. First, unsupervised shot clustering and supervised audio classification are performed to obtain visual and audio mid-level patterns, respectively. Then, a temporal voting scheme for structure event detection is proposed that exploits the correspondence between audio and video content. Finally, using the affective features extracted from the detected events, a linear highlight model is adopted to rank the detected events by their degree of excitement. Experimental results show that the proposed approach is effective.
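A simplified sketch of a temporal voting scheme of the kind described above: a candidate structure event is kept only if a game-specific audio keyword falls within a short window around a shot that the visual clustering labelled as a rally shot. The window size, labels, and toy timings are illustrative assumptions.

```python
def temporal_voting(rally_shots, audio_keywords, window=2.0):
    """rally_shots: list of (start, end) times, in seconds, of shots the visual
    clustering labelled as rally shots; audio_keywords: times of detected
    keywords (e.g. hitting sounds, applause). Returns shots confirmed by audio."""
    events = []
    for start, end in rally_shots:
        votes = sum(1 for t in audio_keywords
                    if start - window <= t <= end + window)
        if votes > 0:
            events.append((start, end, votes))
    return events

shots = [(0.0, 8.0), (12.0, 20.0), (25.0, 30.0)]
keywords = [3.1, 6.4, 27.0]
print(temporal_voting(shots, keywords))  # the middle shot gets no audio votes
```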
international conference on image processing | 2010
Liyuan Xing; Junyong You; Touradj Ebrahimi; Andrew Perkis
Compared to the many metrics proposed to assess the quality of two-dimensional (2D) images, very few metrics are devoted to the quality assessment of stereoscopic presentations. Crosstalk is one of the most annoying distortions in the visualization stage of stereoscopic imaging technology. This paper proposes a perceptual quality metric that takes the characteristics of stereoscopic images into account to predict the level of crosstalk perception, based on an understanding of three main factors: crosstalk level, camera baseline, and scene content. The experimental results demonstrate that the proposed metric achieves a Pearson correlation of 87.7% with the ground-truth results from subjective experiments on crosstalk perception, which is much better than traditional 2D metrics that do not integrate 3D depth information.
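This is how metric performance figures like the Pearson correlation quoted above are typically computed: correlating the metric's predictions with subjective scores. The numbers below are made up for illustration only.

```python
from scipy.stats import pearsonr

predicted_quality = [3.2, 4.1, 2.5, 1.8, 4.6]   # hypothetical metric outputs
subjective_scores = [3.0, 4.3, 2.7, 1.5, 4.8]   # hypothetical ground-truth ratings
r, p_value = pearsonr(predicted_quality, subjective_scores)
print(f"Pearson correlation: {r:.3f} (p = {p_value:.3g})")
```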
Archive | 2010
Junyong You; Gangyi Jiang; Liyuan Xing; Andrew Perkis
Three-dimensional television (3DTV) technology is becoming increasingly popular, as it can provide a high-quality and immersive experience to end users. Stereoscopic imaging is a technique capable of recording 3D visual information or creating the illusion of depth. Most 3D compression schemes developed for stereoscopic images apply traditional two-dimensional (2D) compression techniques while also drawing on theories of binocular suppression. The compressed stereoscopic content is delivered to customers through communication channels; however, both compression and transmission errors may degrade the quality of stereoscopic images. Subjective quality assessment is the most accurate way to evaluate the quality of visual presentations in either 2D or 3D modality, even though it is time-consuming. This chapter offers an introduction to related issues in perceptual quality assessment for stereoscopic images. We present a subjective quality experiment on stereoscopic images, focusing on four typical distortion types: Gaussian blurring, JPEG compression, JPEG2000 compression, and white noise. Furthermore, although many 2D image quality metrics have been proposed that work well on 2D images, developing quality metrics for 3D visual content remains an almost unexplored issue. Therefore, this chapter further introduces some well-known 2D image quality metrics and investigates their capabilities in stereoscopic image quality assessment. As an important attribute of stereoscopic images, disparity refers to the difference in image location of an object seen by the left and right eyes, and it has a significant impact on stereoscopic image quality assessment. Thus, a study on integrating disparity information into quality assessment is presented. The experimental results demonstrate that better performance can be achieved if the disparity information and the original images are combined appropriately in stereoscopic image quality assessment.
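A rough sketch of one way to extend a 2D metric to stereo pairs as discussed above: score each view with SSIM and weight the local quality by a disparity map. The specific weighting rule is an assumption for illustration, not the chapter's exact integration scheme.

```python
import numpy as np
from skimage.metrics import structural_similarity

def stereo_quality(ref_l, dist_l, ref_r, dist_r, disparity):
    """ref_*/dist_*: grayscale left/right views in [0, 1];
    disparity: per-pixel disparity magnitude of the reference pair."""
    _, map_l = structural_similarity(ref_l, dist_l, data_range=1.0, full=True)
    _, map_r = structural_similarity(ref_r, dist_r, data_range=1.0, full=True)
    # Emphasize regions with large disparity (perceptually closer to the viewer)
    w = disparity / (disparity.max() + 1e-8)
    mono = 0.5 * (map_l + map_r)
    return float(np.sum(w * mono) / (np.sum(w) + 1e-8))

ref_l, ref_r = np.random.rand(64, 64), np.random.rand(64, 64)
dist_l, dist_r = np.clip(ref_l + 0.05, 0, 1), np.clip(ref_r + 0.05, 0, 1)
disparity = np.random.rand(64, 64) * 40
print(stereo_quality(ref_l, dist_l, ref_r, dist_r, disparity))
```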
international symposium on intelligent signal processing and communication systems | 2010
Liyuan Xing; Junyong You; Touradj Ebrahimi; Andrew Perkis
Stereo quality of experience (QoE) can be affected by many factors in three-dimensional (3D) media. We focus on three main factors, namely scene content, camera baseline, and screen size, which our previous subjective tests identified as the most significant influencing factors. In order to estimate the QoE of stereoscopic presentations, this paper proposes a perceptual quality metric that characterizes these three factors using image disparity plus additional weights for screen size. Using the ground-truth QoE from our subjective tests and a publicly available stereoscopic quality database, the experimental results demonstrate that the proposed metric achieves more than 87% correlation with subjective scores, which is better than traditional two-dimensional (2D) metrics that do not integrate 3D characteristics.
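A minimal sketch of the general idea of pooling image disparity into a quality score with an extra weight for screen size. The pooling rule, weights, and comfort threshold are illustrative assumptions, not the paper's exact metric.

```python
import numpy as np

# Hypothetical per-screen weights: larger screens magnify disparity and
# therefore the discomfort caused by excessive disparity
SCREEN_WEIGHTS = {"mobile": 0.8, "monitor": 1.0, "tv": 1.3}

def stereo_qoe(disparity_map, screen="monitor", comfort_limit=60.0):
    """disparity_map: per-pixel disparity in pixels; comfort_limit: the
    screen-size-weighted disparity magnitude beyond which QoE drops."""
    w = SCREEN_WEIGHTS[screen]
    excess = np.maximum(np.abs(disparity_map) * w - comfort_limit, 0.0)
    # Map the amount of excessive disparity onto a 1..5 quality scale
    return float(np.clip(5.0 - 4.0 * excess.mean() / comfort_limit, 1.0, 5.0))

disparity = np.random.uniform(-80, 80, size=(64, 64))
print(stereo_qoe(disparity, screen="tv"))
```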
multimedia signal processing | 2010
Liyuan Xing; Junyong You; Touradj Ebrahimi; Andrew Perkis
Most quality models for stereoscopic presentations are dedicated to measuring quality degradation caused by compression artefacts. However, non-compression distortions introduced during acquisition and presentation usually have a significant influence on the 3D viewing experience. In this paper, we propose an objective metric for viewing experience assessment that takes camera baseline and the binocular distortion crosstalk into consideration. In particular, the proposed metric builds on our previous work on both subjective evaluation and objective assessment of crosstalk perception. Results on a publicly available stereoscopic quality database demonstrate that the proposed metric achieves more than 87% correlation with subjective assessments of viewing experience.
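A compact sketch of pooling two component scores, one driven by camera baseline (disparity comfort) and one by crosstalk perception, into a single viewing-experience score. The linear weighting is an assumption for illustration, not the paper's actual combination.

```python
def viewing_experience(baseline_score, crosstalk_score, w_baseline=0.4):
    """Both inputs are assumed to lie on a common 1..5 quality scale."""
    return w_baseline * baseline_score + (1.0 - w_baseline) * crosstalk_score

print(viewing_experience(baseline_score=4.2, crosstalk_score=3.1))
```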
international symposium on multimedia | 2011
Jie Xu; Liyuan Xing; Andrew Perkis; Yuming Jiang
For research on quality of experience (QoE), mean opinion scores (MOS) are widely chosen as the results of subjective tests and as the ground-truth reference for objective quality modeling. The results of objective quality modeling are in turn used for QoE management, so the performance of the QoE management process depends heavily on MOS. However, the suitability of MOS for QoE management has not been technically proven in the literature. In this paper, we first show that subject homogeneity is implicitly assumed when obtaining MOS, by modeling the arithmetic averaging process from a systematic viewpoint. We then point out that subjects actually exhibit variability in their quality assessments, and we elaborate on how this mismatch may result in failures when QoE management is conducted based on MOS. Finally, we propose a utility-based averaging method (uMOS) that improves the performance of QoE management.
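A sketch contrasting the plain arithmetic MOS with a utility-based average of the kind the abstract argues for. The logistic acceptability utility used here is an illustrative stand-in for the utility function proposed in the paper, not its exact definition.

```python
import numpy as np

def mos(scores):
    """Arithmetic mean opinion score: implicitly treats subjects as homogeneous."""
    return float(np.mean(scores))

def umos(scores, threshold=3.5, steepness=2.0):
    """Average per-subject utilities instead of raw scores, so a quality level
    is judged by how acceptable it is to each individual subject."""
    scores = np.asarray(scores, dtype=float)
    utility = 1.0 / (1.0 + np.exp(-steepness * (scores - threshold)))
    return float(np.mean(utility))

ratings = [5, 5, 5, 1, 1]    # a strongly split panel
print(mos(ratings))           # 3.4 -- the average hides the disagreement
print(umos(ratings))          # ~0.57: only about 57% find the quality acceptable
```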
Proceedings of SPIE | 2011
Liyuan Xing; Junyong You; Touradj Ebrahimi; Andrew Perkis
The stereoscopic 3D industry has fallen short of achieving an acceptable Quality of Experience (QoE) because of various technical limitations, such as excessive disparity and accommodation-convergence mismatch. This study investigates the effect of scene content, camera baseline, screen size, and viewing location on stereoscopic QoE in a holistic approach. First, 240 typical test configurations are considered, in which a wide range of disparities constructed from the shooting conditions (scene content, camera baseline, sensor resolution/screen size) was selected from datasets, so that the constructed disparities fall into different ranges of the maximal disparity supported by the viewing environment (viewing location). Second, an extensive subjective test is conducted using a single stimulus methodology, in which 15 samples were obtained at each viewing location. Finally, a statistical analysis is performed, and the results reveal that scene content, camera baseline, and the interactions between screen size, scene content, and camera baseline have a significant impact on QoE for stereoscopic images, while the other factors, notably viewing location, have almost no significant impact. The generated Mean Opinion Scores (MOS) and the statistical results can be used to design stereoscopic quality metrics and validate their performance.
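A sketch of the kind of factorial analysis described above: an N-way ANOVA of MOS against scene content, camera baseline, and screen size, including their interactions. The data frame here is randomly generated for illustration only; the factor levels are hypothetical.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

rng = np.random.default_rng(0)
df = pd.DataFrame({
    "scene":    rng.choice(["indoor", "outdoor", "closeup"], 240),
    "baseline": rng.choice([10, 30, 50, 70], 240),      # mm, illustrative levels
    "screen":   rng.choice(["24in", "46in"], 240),
    "mos":      rng.normal(3.0, 0.8, 240),               # synthetic opinion scores
})

# Main effects plus interactions, as tested for significance in the study
model = ols("mos ~ C(scene) * C(baseline) * C(screen)", data=df).fit()
print(sm.stats.anova_lm(model, typ=2))
```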
international conference on multimedia and expo | 2012
Junyong You; Liyuan Xing; Andrew Perkis; Touradj Ebrahimi
Contrast sensitivity is an important characteristic of the human visual system (HVS) that is widely used in image and video signal processing. For visual quality assessment, a static contrast sensitivity function (CSF) based on spatio-temporal frequency is often used, although contrast sensitivity is also affected by the smooth pursuit of the eyes when tracking attentive regions in the field of view. This paper proposes to tune the CSF based on an attention map derived from a visual attention model. The tuned CSF, formulated in terms of spatial frequency, temporal velocity, and the visual attention map, is used to filter video signals in order to construct a quality metric from the difference between the filtered signals of a reference video and its distorted version. Experimental results demonstrate that the proposed attention-tuned spatio-velocity CSF outperforms the traditional spatio-temporal CSF in evaluating perceived video quality.
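A rough sketch of an attention-tuned spatio-velocity weighting: the retinal velocity of a frame is reduced where visual attention (smooth pursuit) is high, and a band-pass CSF-like gain over spatial frequency is applied in the Fourier domain. The CSF shape and parameters below are a simplified stand-in, not the exact model used in the paper.

```python
import numpy as np

def csf_gain(freq_cpd, velocity):
    """Simplified band-pass contrast-sensitivity shape: sensitivity drops at
    high spatial frequencies, and faster retinal motion shifts the peak lower."""
    f = np.maximum(freq_cpd, 1e-3)
    return f * np.exp(-0.5 * f * (1.0 + 0.3 * velocity))

def attention_tuned_filter(frame, motion_speed, attention, cycles_per_degree=30.0):
    """frame: grayscale image; motion_speed: object speed in deg/s;
    attention: map in [0, 1] -- attended regions are pursued by the eyes,
    which lowers their effective retinal velocity."""
    retinal_velocity = motion_speed * (1.0 - attention.mean())
    h, w = frame.shape
    fy = np.fft.fftfreq(h)[:, None]
    fx = np.fft.fftfreq(w)[None, :]
    freq = np.sqrt(fx**2 + fy**2) * cycles_per_degree
    gain = csf_gain(freq, retinal_velocity)
    gain /= gain.max() + 1e-8
    spectrum = np.fft.fft2(frame) * gain
    return np.real(np.fft.ifft2(spectrum))

frame = np.random.rand(64, 64)
attention = np.zeros((64, 64)); attention[20:40, 20:40] = 1.0
filtered = attention_tuned_filter(frame, motion_speed=8.0, attention=attention)
print(filtered.shape)
```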