Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Xiaoshuai Sun is active.

Publication


Featured research published by Xiaoshuai Sun.


IEEE Transactions on Image Processing | 2012

Task-Dependent Visual-Codebook Compression

Rongrong Ji; Hongxun Yao; Wei Liu; Xiaoshuai Sun; Qi Tian

A visual codebook serves as a fundamental component in many state-of-the-art computer vision systems. Most existing codebooks are built by quantizing local feature descriptors extracted from training images; each image is then represented as a high-dimensional bag-of-words histogram. Such an image description is inefficient in both storage and retrieval, since only a few bins are nonzero and they are sparsely distributed. Furthermore, most existing codebooks are built solely on the visual statistics of local descriptors, without considering the supervision labels coming from the subsequent recognition or classification tasks. In this paper, we propose a task-dependent codebook compression framework to handle these two problems. First, we learn a compression function that maps an originally high-dimensional codebook into a compact codebook while maintaining its visual discriminability. This is achieved by a codeword sparse coding scheme with Lasso regression, which minimizes the descriptor distortions of training images after codebook compression. Second, we adapt the codebook compression to the subsequent recognition or classification tasks by introducing a label constraint kernel (LCK) into the compression loss function. In particular, the LCK can model heterogeneous kinds of supervision, i.e., (partial) category labels, correlative semantic annotations, and image query logs. We validate our codebook compression in three computer vision tasks: 1) object recognition on PASCAL Visual Object Classes 2007; 2) near-duplicate image retrieval on UKBench; and 3) web image search in a collection of 0.5 million Flickr photographs. The compressed codebook shows superior performance over several state-of-the-art supervised and unsupervised codebooks.
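
As a rough illustration of the codeword sparse coding step, the sketch below approximates a high-dimensional codebook by sparse combinations of a compact one using scikit-learn's Lasso; the codebook sizes, the k-means initialization, and the toy data are assumptions, and the task-dependent label constraint kernel is omitted.

```python
# Toy sketch: compress a visual codebook by reconstructing each original
# codeword from a sparse combination of a small compact codebook (Lasso).
# Sizes and data are hypothetical; the label constraint kernel is omitted.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
original_codebook = rng.normal(size=(1000, 128))     # 1000 codewords, 128-D descriptors

# Initialize a compact codebook, here simply by clustering the original one.
compact_codebook = KMeans(n_clusters=64, n_init=4,
                          random_state=0).fit(original_codebook).cluster_centers_

# Learn a sparse mapping: each original codeword is approximated by a few
# compact codewords (the L1 penalty enforces sparsity of the mapping).
lasso = Lasso(alpha=0.05, max_iter=5000)
lasso.fit(compact_codebook.T, original_codebook.T)   # regress 1000 targets on 64 atoms
mapping = lasso.coef_                                # shape (1000, 64), mostly zeros

# A bag-of-words histogram over the original codebook is compressed into
# a 64-bin histogram over the compact codebook via the learned mapping.
bow_histogram = rng.integers(0, 5, size=1000).astype(float)
compact_histogram = bow_histogram @ mapping
print(mapping.shape, compact_histogram.shape)
```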


ACM Multimedia | 2014

Exploring Principles-of-Art Features For Image Emotion Recognition

Sicheng Zhao; Yue Gao; Xiaolei Jiang; Hongxun Yao; Tat-Seng Chua; Xiaoshuai Sun

Images can evoke emotions in humans. Most previous works on image emotion analysis mainly used low-level visual features based on the elements of art. However, these features are fragile and not invariant to different arrangements of elements. In this paper, we investigate the concept of principles of art and its influence on image emotions. Principles-of-art-based emotion features (PAEF) are extracted to classify and score image emotions, in order to understand the relationship between artistic principles and emotions. PAEF are a unified combination of representation features derived from different principles, including balance, emphasis, harmony, variety, gradation, and movement. Experiments on the International Affective Picture System (IAPS), a set of artistic photographs, and a set of peer-rated abstract paintings demonstrate the superiority of PAEF for affective image classification and regression (about 5% improvement in classification accuracy and a 0.2 decrease in mean squared error) compared with state-of-the-art approaches. We then use PAEF to analyze the emotions of master paintings, with promising results.
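
To make the idea of a principle-derived feature concrete, here is a toy sketch of a single "balance"-style feature computed as left/right intensity symmetry; the actual PAEF combine several principles with far more careful formulations, so this is illustration only.

```python
# Toy "balance" feature: 1.0 means the intensity mass is mirror-symmetric
# about the vertical axis.  A crude stand-in for one artistic principle.
import numpy as np

def balance_feature(gray):
    left = gray[:, : gray.shape[1] // 2]
    right = np.fliplr(gray[:, -left.shape[1]:])
    diff = np.abs(left - right).mean()
    return 1.0 - diff / (gray.max() - gray.min() + 1e-12)

rng = np.random.default_rng(7)
half = np.tile(np.linspace(0.0, 1.0, 32), (64, 1))
symmetric = np.hstack([half, np.fliplr(half)])     # mirror-symmetric image
asymmetric = rng.random((64, 64))                  # no particular balance

print(balance_feature(symmetric))    # 1.0
print(balance_feature(asymmetric))   # noticeably lower
```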


ACM Multimedia | 2009

Photo assessment based on computational visual attention model

Xiaoshuai Sun; Hongxun Yao; Rongrong Ji; Shaohui Liu

Automatic photo assessment based only on low-level visual features such as brightness, lighting, hue, contrast, and color distribution is rarely satisfactory. Instead of relying on such features, we present a novel computational visual attention model to assess photos. Firstly, a face-sensitive saliency map analysis is deployed to estimate the attention distribution. Then, a Rate of Focused Attention (RFA) measurement is proposed to quantify photo quality. By integrating top-down supervision into the visual attention model, we further achieve personalized photo assessment that takes user preference into the quality evaluation, and this can be extended to object- or semantic-oriented photo assessment scenarios. Experiments on personal photo albums, with comparison to ground-truth user evaluations, demonstrate the effectiveness of the proposed method.
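
A minimal sketch of a focused-attention quality score in the spirit of RFA: the fraction of saliency mass concentrated in the most attended region. The quantile threshold and the synthetic saliency maps are assumptions, not the paper's exact formulation.

```python
# Toy focused-attention score: the share of total saliency that falls in
# the top-saliency region.  Quantile cut-off and maps are assumptions.
import numpy as np

def rate_of_focused_attention(saliency_map, focus_quantile=0.9):
    s = saliency_map / (saliency_map.sum() + 1e-12)
    threshold = np.quantile(s, focus_quantile)
    return float(s[s >= threshold].sum())

rng = np.random.default_rng(1)
well_composed = np.zeros((120, 160))
well_composed[40:70, 60:100] = 1.0            # one strong, face-like subject
cluttered = rng.random((120, 160))            # attention spread everywhere

print(rate_of_focused_attention(well_composed))   # high: attention is concentrated
print(rate_of_focused_attention(cluttered))       # lower: attention is diffuse
```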


Neurocomputing | 2015

Strategy for dynamic 3D depth data matching towards robust action retrieval

Sicheng Zhao; Lujun Chen; Hongxun Yao; Yanhao Zhang; Xiaoshuai Sun

3D depth data, especially dynamic 3D depth data, offer several advantages over traditional intensity videos for representing objects' actions: they remain useful in low light, resolve the silhouette ambiguity of actions, and are invariant to color and texture. With the wide popularity of somatosensory equipment (Kinect, for example), more and more dynamic 3D depth data are shared on the Internet, which creates an urgent need to retrieve these data efficiently and effectively. In this paper, we propose a generalized strategy for dynamic 3D depth data matching and apply it to the action retrieval task. Firstly, an improved 3D shape context descriptor (3DSCD) is proposed to extract features from each static depth frame. Then we employ dynamic time warping (DTW) to measure the temporal similarity between two dynamic 3D depth sequences. Experimental results on our collected dataset of 170 dynamic 3D depth video clips show that the proposed 3DSCD has rich descriptive power on depth data and that the method combining 3DSCD and DTW achieves high matching accuracy. Finally, to address matching efficiency, we use the bag-of-words (BoW) model to quantize the 3DSCD of each static depth frame into visual words, so the original feature matching problem is simplified to matching two histograms. The results demonstrate the matching efficiency of the proposed method, while still maintaining high matching accuracy.
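
The temporal matching step can be illustrated with a plain dynamic time warping routine over per-frame descriptors; the random frame features below stand in for the 3DSCD, and the Euclidean frame distance is an assumption.

```python
# Plain dynamic time warping between two sequences of per-frame descriptors.
# The random features below are stand-ins for the 3D shape context descriptor.
import numpy as np

def dtw_distance(seq_a, seq_b):
    """DTW cost between two (n_frames, feat_dim) descriptor sequences."""
    n, m = len(seq_a), len(seq_b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = np.linalg.norm(seq_a[i - 1] - seq_b[j - 1])   # frame-to-frame distance
            cost[i, j] = d + min(cost[i - 1, j],              # skip a frame in seq_a
                                 cost[i, j - 1],              # skip a frame in seq_b
                                 cost[i - 1, j - 1])          # match the two frames
    return cost[n, m]

rng = np.random.default_rng(2)
clip_a = rng.normal(size=(40, 60))                          # 40 depth frames, 60-D each
clip_b = clip_a[::2] + 0.01 * rng.normal(size=(20, 60))     # same action at half speed
clip_c = rng.normal(size=(40, 60))                          # an unrelated action

print(dtw_distance(clip_a, clip_b))   # small: the sequences align well
print(dtw_distance(clip_a, clip_c))   # larger: no good temporal alignment
```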


Computer Vision and Pattern Recognition | 2010

Towards semantic embedding in visual vocabulary

Rongrong Ji; Hongxun Yao; Xiaoshuai Sun; Bineng Zhong; Wen Gao

A visual vocabulary serves as a fundamental component in many computer vision tasks, such as object recognition, visual search, and scene modeling. While state-of-the-art approaches build visual vocabularies based solely on the visual statistics of local image patches, the correlative image labels are left unexploited when generating visual words. In this work, we present a semantic embedding framework that integrates semantic information from Flickr labels for supervised vocabulary construction. Our main contribution is a Hidden Markov Random Field model that supervises feature-space quantization, with special consideration of label correlations: local visual features are modeled as an Observed Field, which follows visual metrics to partition the feature space; semantic labels are modeled as a Hidden Field, which imposes generative supervision on the Observed Field with WordNet-based correlation constraints expressed as a Gibbs distribution. By simplifying the Markov property in the Hidden Field, both unsupervised and supervised (label-independent) vocabularies can be derived from our framework. We validate our approach in two challenging computer vision tasks, with comparisons to the state of the art: (1) large-scale image search on a 60,000-image Flickr database; (2) object recognition on the PASCAL VOC database.
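
A much simplified stand-in for semantics-aware quantization (not the HMRF formulation itself): each descriptor is augmented with a weighted label embedding so that ordinary k-means is pulled toward semantically consistent words. The data, one-hot label embedding, and weighting are assumptions.

```python
# Simplified stand-in for semantics-aware vocabulary construction: augment
# each descriptor with a weighted one-hot label embedding and run k-means,
# so quantization is biased toward semantically consistent visual words.
# Not the HMRF model; data and weighting are assumptions.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(8)
descriptors = rng.normal(size=(500, 64))                 # local visual features
labels = rng.integers(0, 10, size=500)                   # Flickr-style tags per patch

label_embedding = np.eye(10)[labels]                     # one-hot tag vector
semantic_weight = 2.0                                    # strength of the supervision
augmented = np.hstack([descriptors, semantic_weight * label_embedding])

vocabulary = KMeans(n_clusters=50, n_init=4, random_state=0).fit(augmented)
words = vocabulary.labels_                               # semantically biased visual words
print(np.bincount(words, minlength=50)[:10])
```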


Computer Vision and Pattern Recognition | 2012

What are we looking for: Towards statistical modeling of saccadic eye movements and visual saliency

Xiaoshuai Sun; Hongxun Yao; Rongrong Ji

In this paper, we present a unified statistical framework for modeling both saccadic eye movements and visual saliency. By analyzing the statistical properties of human eye fixations on natural images, we found that human attention is sparsely distributed and usually deployed to locations with abundant structural information. These observations inspired us to model saccadic behavior and visual saliency based on Super Gaussian Component (SGC) analysis. The model sequentially obtains SGCs using projection pursuit and generates eye movements by selecting the location with the maximum SGC response. Besides simulating human saccadic behavior, we also demonstrate superior effectiveness and robustness over state-of-the-art methods through extensive experiments on psychological patterns and human eye-fixation benchmarks. These results also show the promising potential of statistical approaches for human behavior research.
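
A loose sketch of the SGC idea: FastICA maximizes non-Gaussianity, so picking the ICA component of image patches with the highest kurtosis approximates one super-Gaussian projection, and the predicted fixation is the patch with the strongest response. Patch size, image, and component count are assumptions.

```python
# Loose SGC stand-in: FastICA components of image patches are maximally
# non-Gaussian; the component with the highest kurtosis plays the role of
# a super-Gaussian projection, and the patch with the strongest response
# is taken as the predicted fixation.  Sizes and data are assumptions.
import numpy as np
from scipy.stats import kurtosis
from sklearn.decomposition import FastICA

rng = np.random.default_rng(3)
image = rng.random((64, 64))
image[30:38, 30:38] += 2.0                       # a structured, high-contrast region

patches, coords = [], []
for y in range(0, 56, 2):                        # 8x8 patches on a coarse grid
    for x in range(0, 56, 2):
        patches.append(image[y:y + 8, x:x + 8].ravel())
        coords.append((y, x))
patches = np.asarray(patches)
patches -= patches.mean(axis=0)

ica = FastICA(n_components=16, random_state=0, max_iter=500)
responses = ica.fit_transform(patches)           # patch responses to 16 components
k = kurtosis(responses, axis=0)                  # non-Gaussianity of each component
sgc_response = np.abs(responses[:, int(np.argmax(k))])

print("predicted fixation near", coords[int(np.argmax(sgc_response))])
# likely inside or near the high-contrast block
```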


International Conference on Image Processing | 2015

Predicting discrete probability distribution of image emotions

Sicheng Zhao; Hongxun Yao; Xiaolei Jiang; Xiaoshuai Sun

Most existing works on affective image classification try to assign a single dominant emotion category to an image. However, this is often insufficient, as the emotions an image evokes in viewers are highly subjective and vary from person to person. In this paper, we propose to predict the probability distribution over categorical image emotions. First, we extract commonly used features at different levels for each image. Then we formulate emotion distribution prediction as a shared sparse learning problem, which is optimized by iteratively reweighted least squares. In addition, we introduce three baseline algorithms. Experiments are carried out on a dataset of peer-rated abstract paintings, and the results demonstrate the superiority of the proposed method compared with several state-of-the-art approaches.
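
One plausible reading of the optimization, sketched below, is multi-task regression from features to emotion distributions with a row-sparsity (L2,1) penalty solved by IRLS; the exact objective, data, and hyperparameters are assumptions rather than the paper's formulation.

```python
# Multi-task regression from image features to emotion distributions with a
# row-sparsity (L2,1) penalty, solved by iteratively reweighted least squares.
# Objective, data, and hyperparameters are assumptions for illustration.
import numpy as np

def shared_sparse_irls(X, Y, lam=0.1, iters=50, eps=1e-6):
    """Minimize ||XW - Y||_F^2 + lam * ||W||_{2,1} by IRLS."""
    W = np.zeros((X.shape[1], Y.shape[1]))
    for _ in range(iters):
        row_norms = np.linalg.norm(W, axis=1)
        D = np.diag(1.0 / (2.0 * row_norms + eps))      # reweighting of each feature row
        W = np.linalg.solve(X.T @ X + lam * D, X.T @ Y)
    return W

rng = np.random.default_rng(4)
X = rng.normal(size=(200, 30))                 # 200 images, 30-D features
true_W = np.zeros((30, 8))                     # 8 emotion categories
true_W[:5] = rng.normal(size=(5, 8))           # only 5 features matter (shared support)
Y = np.abs(X @ true_W)
Y /= Y.sum(axis=1, keepdims=True)              # toy per-image emotion distributions

W = shared_sparse_irls(X, Y)
pred = np.clip(X @ W, 0.0, None)
pred /= pred.sum(axis=1, keepdims=True) + 1e-12    # renormalize predictions
print(np.linalg.norm(W, axis=1).round(2))      # rows outside the support shrink toward 0
```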


Conference on Multimedia Modeling | 2013

Flexible Presentation of Videos Based on Affective Content Analysis

Sicheng Zhao; Hongxun Yao; Xiaoshuai Sun; Xiaolei Jiang; Pengfei Xu

The explosion of multimedia content has resulted in a great demand for video presentation. While most previous works focused on presenting certain types of videos or summarizing videos by event detection, we propose a novel method to present general videos of different genres based on affective content analysis. We first extract rich audio-visual affective features and select the discriminative ones. Then we map the selected features into corresponding affective states in an improved categorical emotion space using hidden conditional random fields (HCRFs). Finally we draw affective curves that indicate the types and intensities of emotions. With these curves and related affective visualization techniques, we select the most affective shots and concatenate them to construct an affective video presentation whose type and length can be flexibly adjusted. Experiments on a representative video database from the web demonstrate the effectiveness of the proposed method.
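
The final shot-selection step can be sketched simply: given a per-shot affective intensity curve, keep the most affective shots in temporal order to reach the requested length. The synthetic intensities below stand in for the HCRF predictions.

```python
# Given per-shot affective intensities (stand-ins for HCRF predictions),
# keep the most affective shots in temporal order to reach a target length.
import numpy as np

def select_affective_shots(intensity, target_shots):
    """Indices of the `target_shots` most affective shots, in time order."""
    top = np.argsort(intensity)[-target_shots:]
    return np.sort(top)

rng = np.random.default_rng(9)
shot_intensity = rng.random(30)                    # one intensity value per shot
summary = select_affective_shots(shot_intensity, target_shots=6)
print(summary)                                     # the 6 emotional peaks, in order
```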


International Conference on Multimedia and Expo | 2008

Directional correlation analysis of local Haar binary pattern for text detection

Rongrong Ji; Pengfei Xu; Hongxun Yao; Zhen Zhang; Xiaoshuai Sun; Tianqiang Liu

Two main restrictions limit state-of-the-art text detection algorithms: (1) illumination variance and (2) text-background contrast variance. This paper presents a robust text characterization approach based on the local Haar binary pattern (LHBP) to address these problems. Based on LHBP, a coarse-to-fine detection framework is presented to precisely locate text lines in scene images. Firstly, a threshold-restricted local binary pattern is extracted from the high-frequency coefficients of the pyramid Haar wavelet; it preserves and normalizes inconsistent text-background contrasts while filtering out gradual illumination variations. Subsequently, we propose a directional correlation analysis (DCA) approach to filter out non-directional LHBP regions and locate candidate text regions. Finally, using the LHBP histogram, an SVM-based post-classification is applied to refine the detection results. Experimental results on ICDAR 2003 demonstrate the effectiveness and robustness of the proposed method.
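
A rough sketch of the LHBP idea: compute the high-frequency bands of a one-level Haar decomposition (via PyWavelets) and apply a threshold-restricted local binary pattern to them. The threshold value and the exact pattern encoding are assumptions.

```python
# Rough LHBP sketch: threshold-restricted local binary patterns computed on
# the high-frequency bands of a one-level Haar wavelet decomposition.
# Threshold and encoding are assumptions.
import numpy as np
import pywt

def threshold_lbp(band, threshold=5.0):
    """3x3 LBP where a neighbour sets a bit only if it exceeds the centre
    by more than `threshold`, suppressing weak, noisy responses."""
    h, w = band.shape
    codes = np.zeros((h - 2, w - 2), dtype=np.uint8)
    centre = band[1:-1, 1:-1]
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    for bit, (dy, dx) in enumerate(offsets):
        neighbour = band[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
        codes |= (neighbour - centre > threshold).astype(np.uint8) << bit
    return codes

rng = np.random.default_rng(5)
image = rng.random((64, 64)) * 255.0
_, (cH, cV, cD) = pywt.dwt2(image, "haar")          # high-frequency Haar bands

lhbp = [threshold_lbp(band) for band in (cH, cV, cD)]
histogram = np.concatenate([np.bincount(b.ravel(), minlength=256) for b in lhbp])
print(histogram.shape)                               # 3 bands x 256-bin LBP histogram
```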


International Conference on Image Processing | 2010

Saliency detection based on short-term sparse representation

Xiaoshuai Sun; Hongxun Yao; Rongrong Ji; Pengfei Xu; Xianming Liu; Shaohui Liu

Representation and measurement are two important issues for saliency models. Unlike previous works that learn sparse features from large-scale natural image statistics, we propose to learn features from the short-term statistics of single images. For saliency measurement, we define a background firing rate (BFR) for each sparse feature and then propose a feature activation rate (FAR) to measure bottom-up visual saliency. The proposed FAR measure is biologically plausible, easy to compute, and performs well. Experiments on human eye fixations and psychological patterns demonstrate the effectiveness and robustness of the proposed method.
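
A loose sketch under stated assumptions: sparse features are learned from the patches of the single test image, each feature's background firing rate is estimated over the whole image, and patches activating rarely firing features are scored as salient; the thresholds and the exact BFR/FAR definitions are guesses for illustration.

```python
# Loose sketch: learn sparse features from the patches of the test image
# itself ("short-term" statistics), estimate each feature's background
# firing rate over the image, and score patches that activate rare features
# as salient.  Thresholds and the BFR/FAR definitions are assumptions.
import numpy as np
from sklearn.decomposition import MiniBatchDictionaryLearning

rng = np.random.default_rng(6)
image = rng.random((64, 64))
image[20:28, 40:48] += 2.0                        # an unusual, likely salient region

patches, coords = [], []
for y in range(0, 56, 2):                         # 8x8 patches on a coarse grid
    for x in range(0, 56, 2):
        patches.append(image[y:y + 8, x:x + 8].ravel())
        coords.append((y, x))
patches = np.asarray(patches)
patches -= patches.mean(axis=0)

dico = MiniBatchDictionaryLearning(n_components=32, alpha=1.0, random_state=0)
codes = np.abs(dico.fit_transform(patches))       # sparse activations per patch

bfr = (codes > 0.1).mean(axis=0) + 1e-6           # background firing rate per feature
saliency = (codes / bfr).sum(axis=1)              # rare-feature activations weigh more

print("most salient patch near", coords[int(np.argmax(saliency))])
```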

Collaboration


Dive into Xiaoshuai Sun's collaborations.

Top Co-Authors

Hongxun Yao (Harbin Institute of Technology)
Pengfei Xu (Harbin Institute of Technology)
Xianming Liu (Harbin Institute of Technology)
Sicheng Zhao (Harbin Institute of Technology)
Yanhao Zhang (Harbin Institute of Technology)
Tianqiang Liu (Harbin Institute of Technology)
Wei Yu (Harbin Institute of Technology)
Tingting Han (Harbin Institute of Technology)
Wenlong Xie (Harbin Institute of Technology)