Xiaoshan Yang
Chinese Academy of Sciences
Publication
Featured research published by Xiaoshan Yang.
Computer Vision and Pattern Recognition | 2011
Jianbing Shen; Xiaoshan Yang; Yunde Jia; Xuelong Li
In this paper, we present a novel intrinsic image recovery approach using optimization. Our approach is based on an assumption about color characteristics in local windows of natural images: neighboring pixels in a local window of a single image that have similar intensity values should have similar reflectance values. The intrinsic image decomposition is thus formulated as minimizing an energy function with a weighting constraint added on the local image properties. To further improve the intrinsic image extraction results, we specify local constraint cues by integrating user strokes into our energy formulation, including constant-reflectance, constant-illumination, and fixed-illumination brushes. Our experimental results demonstrate that our approach achieves a better recovery of the intrinsic reflectance and illumination components than previous approaches.
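To make the window-based formulation concrete, one plausible form of such an energy (notation ours, not taken from the paper) works in the log domain, where the decomposition I = R · S becomes i = r + s and the reflectance r is found by minimizing a weighted smoothness term over each local window w_p:

```latex
% Sketch of a window-based intrinsic decomposition energy (notation ours).
% With I = R \cdot S, take logs so that i = r + s, and solve for r.
\[
\begin{aligned}
E(r) &= \sum_{p}\ \sum_{q \in w_p} \omega_{pq}\,(r_p - r_q)^2, \\
\omega_{pq} &\propto \exp\!\left(-\frac{(i_p - i_q)^2}{2\sigma^2}\right),
\end{aligned}
\]
```

Here the weight ω_pq is large when pixels p and q have similar intensities, so minimizing E(r) enforces exactly the stated premise: intensity-similar neighbors receive similar reflectance, with the shading recovered as s = i − r.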
IEEE Transactions on Multimedia | 2015
Xiaoshan Yang; Tianzhu Zhang; Changsheng Xu
In the Web 2.0 era, a huge amount of media data, such as text, image/video, and social interaction information, has been generated on social media sites (e.g., Facebook, Google, Flickr, and YouTube). These media data can be effectively adopted for many multimedia applications (e.g., image/video annotation, image/video retrieval, and event classification). However, it is difficult to design an effective feature representation for these data because they have multi-modal properties (e.g., text, image, video, and audio) and multi-domain properties (e.g., Flickr, Google, and YouTube). To deal with these issues, we propose a novel cross-domain feature learning (CDFL) algorithm based on stacked denoising auto-encoders. By introducing a modal correlation constraint and a cross-domain constraint into the conventional auto-encoder, our CDFL can maximize the correlations among different modalities and extract domain-invariant semantic features simultaneously. To evaluate our CDFL algorithm, we apply it to three important applications: sentiment classification, spam filtering, and event classification. Comprehensive evaluations demonstrate the encouraging performance of the proposed approach.
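A minimal sketch of the key ingredients, assuming paired samples from two modalities and equal hidden dimensions (the class names, loss weights, and noise level below are our assumptions, not the paper's exact design):

```python
# Hedged sketch: a denoising auto-encoder per modality, plus a
# correlation penalty pulling paired hidden codes together, loosely
# in the spirit of CDFL.
import torch
import torch.nn as nn
import torch.nn.functional as F

class DenoisingAE(nn.Module):
    def __init__(self, dim_in, dim_hidden):
        super().__init__()
        self.enc = nn.Linear(dim_in, dim_hidden)
        self.dec = nn.Linear(dim_hidden, dim_in)

    def forward(self, x, noise_std=0.1):
        x_noisy = x + noise_std * torch.randn_like(x)  # corrupt the input
        h = torch.sigmoid(self.enc(x_noisy))           # hidden code
        return self.dec(h), h

def cdfl_style_loss(ae_text, ae_img, x_text, x_img, lam=0.1):
    """Reconstruction for each modality + cross-modal correlation
    on paired samples (hidden dims must match)."""
    rec_t, h_t = ae_text(x_text)
    rec_i, h_i = ae_img(x_img)
    rec = F.mse_loss(rec_t, x_text) + F.mse_loss(rec_i, x_img)
    corr = F.mse_loss(h_t, h_i)  # pull paired codes together
    return rec + lam * corr
```

A cross-domain constraint would add a similar penalty between hidden codes of source- and target-domain samples; we omit it here for brevity.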
IEEE Transactions on Systems, Man, and Cybernetics | 2013
Jianbing Shen; Xiaoshan Yang; Xuelong Li; Yunde Jia
In this paper, we present a novel high-quality intrinsic image recovery approach using optimization and user scribbles. Our approach is based on an assumption about color characteristics in local windows of natural images: neighboring pixels in a local window that have similar intensity values should have similar reflectance values. Thus, the intrinsic image decomposition is formulated as minimizing an energy function with a weighting constraint added on the local image properties. To further improve the intrinsic image decomposition results, we specify local constraint cues by integrating user strokes into our energy formulation, including constant-reflectance, constant-illumination, and fixed-illumination brushes. Our experimental results demonstrate that the proposed approach achieves a better recovery of the intrinsic reflectance and illumination components than previous approaches.
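One natural way the three brushes could enter the base energy E(r) sketched earlier is as extra quadratic penalties (again, notation and weights are ours, not the paper's):

```latex
% Possible brush terms added to the window-based energy (notation ours).
% B_R, B_S, B_F: pixels under the constant-reflectance,
% constant-illumination, and fixed-illumination strokes.
\[
E'(r) = E(r)
 + \lambda_1 \sum_{p,q \in \mathcal{B}_R} (r_p - r_q)^2
 + \lambda_2 \sum_{p,q \in \mathcal{B}_S} (s_p - s_q)^2
 + \lambda_3 \sum_{p \in \mathcal{B}_F} (s_p - \hat{s}_p)^2,
 \qquad s = i - r,
\]
```

where the first two terms tie reflectance or shading together along a stroke and the third pins the shading to a user-supplied value ŝ_p.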
IEEE Transactions on Multimedia | 2015
Xiaoshan Yang; Tianzhu Zhang; Changsheng Xu; M. Shamim Hossain
Vision-based event analysis is extremely difficult due to the various concepts (objects, actions, and scenes) contained in videos. Though visual-concept-based event analysis has achieved significant progress, it has two disadvantages in traditional methods: the visual concepts are defined manually, and each concept has only one corresponding classifier. To deal with these issues, we propose a novel automatic visual concept learning algorithm for social event understanding in videos. First, instead of defining visual concepts manually, we propose an effective automatic concept mining algorithm that draws on Wikipedia, N-gram Web services, and Flickr. Then, based on the mined visual concepts, we propose a novel boosting concept learning algorithm that iteratively learns multiple classifiers for each concept to enhance its representative discriminability. Extensive experimental evaluations on the collected dataset demonstrate the effectiveness of the proposed algorithm for social event understanding.
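The "multiple classifiers per concept" idea can be illustrated with a standard AdaBoost-style loop; the sketch below is our own minimal rendering under that assumption, not the authors' exact algorithm:

```python
# Hedged sketch: boosting several weak classifiers for one concept,
# up-weighting the samples the previous round got wrong.
import numpy as np
from sklearn.svm import LinearSVC

def boost_concept_classifiers(X, y, rounds=5):
    """X: (n, d) features; y: binary concept labels."""
    n = len(y)
    w = np.full(n, 1.0 / n)                  # instance weights
    classifiers, alphas = [], []
    for _ in range(rounds):
        clf = LinearSVC().fit(X, y, sample_weight=w)
        mistakes = clf.predict(X) != y
        err = np.dot(w, mistakes) / w.sum()
        if err >= 0.5:                       # no better than chance: stop
            break
        alpha = 0.5 * np.log((1 - err) / max(err, 1e-12))
        w *= np.exp(alpha * mistakes)        # emphasize misclassified samples
        w /= w.sum()
        classifiers.append(clf)
        alphas.append(alpha)
    return classifiers, alphas               # an ensemble per concept
```

At test time the concept score would be the alpha-weighted vote of the ensemble, which is what gives each concept more representative power than a single classifier.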
IEEE Transactions on Image Processing | 2017
Junyu Gao; Tianzhu Zhang; Xiaoshan Yang; Changsheng Xu
Most existing tracking methods are direct trackers, which directly exploit foreground and/or background information for object appearance modeling and decide whether an image patch is the target object or not. As a result, these trackers cannot perform well when the target appearance changes heavily and becomes different from its model. To deal with this issue, we propose a novel relative tracker, which can effectively exploit the relative relationships among image patches from both the foreground and background for object appearance modeling. Different from direct trackers, the proposed relative tracker robustly localizes the target object by selecting the image patch with the highest relative score with respect to the target appearance model. To model the relative relationships among large-scale image patch pairs, we propose a novel and effective deep relative learning algorithm based on a convolutional neural network. We test the proposed approach on challenging sequences involving heavy occlusion, drastic illumination changes, and large pose variations. Experimental results show that our method consistently outperforms state-of-the-art trackers due to the powerful capacity of the proposed deep relative model.
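The "highest relative score" step can be pictured as a shared CNN scoring each candidate patch against the target template; the tiny architecture below is purely illustrative (the paper's network is far larger), but the selection logic is the same:

```python
# Hedged sketch: score candidate patches relative to a target template
# with a shared backbone, then pick the best-scoring candidate.
import torch
import torch.nn as nn

class RelativeScorer(nn.Module):
    def __init__(self):
        super().__init__()
        self.backbone = nn.Sequential(          # tiny stand-in CNN
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.head = nn.Linear(32, 1)            # pairwise relative score

    def forward(self, template, candidates):
        t = self.backbone(template)                    # (1, 16)
        c = self.backbone(candidates)                  # (N, 16)
        pair = torch.cat([t.expand_as(c), c], dim=1)   # (N, 32)
        return self.head(pair).squeeze(1)              # higher = more target-like

# One tracking step: the candidate with the highest relative score wins.
scorer = RelativeScorer()
template = torch.randn(1, 3, 64, 64)
candidates = torch.randn(50, 3, 64, 64)
best = scorer(template, candidates).argmax()
```

Because the score is relative to the current template rather than an absolute foreground/background decision, the tracker can stay locked on even as the absolute appearance drifts.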
IEEE Transactions on Multimedia | 2016
Xiaoshan Yang; Tianzhu Zhang; Changsheng Xu; Shuicheng Yan; M. Shamim Hossain; Ahmed Ghoneim
Relative attribute (RA) learning aims to learn a ranking function describing the relative strength of an attribute. Most current approaches learn a linear ranking function for each attribute using hand-crafted visual features. Different from existing studies, in this paper we propose a novel deep relative attributes (DRA) algorithm that learns the visual features and an effective nonlinear ranking function describing the RA of image pairs in a unified framework. Here, the visual features and the ranking function are learned jointly, so they can benefit each other. The proposed DRA model comprises five convolutional layers, five fully connected layers, and a relative loss function that contains a contrastive constraint and a similarity constraint corresponding to ordered and unordered image pairs, respectively. To train the DRA model effectively, we transfer knowledge from large-scale visual recognition on ImageNet [1] to the RA learning task. We evaluate the proposed DRA model on three widely used datasets. Extensive experimental results demonstrate that the proposed DRA model consistently and significantly outperforms state-of-the-art RA learning methods. On the public OSR, PubFig, and Shoes datasets, compared with previous RA learning results [2], the average ranking accuracies are improved by about 8%, 9%, and 14%, respectively.
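One plausible form of such a relative loss, assuming f produces a scalar attribute score per image (the exact formulation in the paper may differ):

```python
# Hedged sketch: RankNet-style term for ordered pairs, closeness term
# for unordered ("equally strong") pairs.
import torch
import torch.nn.functional as F

def relative_attribute_loss(s1, s2, ordered):
    """s1, s2: ranking scores f(x1), f(x2) for a batch of pairs;
    ordered: 1 where x1 should outrank x2, 0 where the pair is
    labeled as similar in attribute strength."""
    ordered = ordered.float()
    rank = F.softplus(-(s1 - s2))        # log(1 + exp(-(s1 - s2)))
    close = (s1 - s2) ** 2               # pull similar pairs together
    return (ordered * rank + (1 - ordered) * close).mean()
```

The ranking term pushes ordered pairs apart in score while the squared term keeps unordered pairs close, matching the two constraints named in the abstract.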
ACM Transactions on Multimedia Computing, Communications, and Applications | 2015
Xiaoshan Yang; Tianzhu Zhang; Changsheng Xu; Ming-Hsuan Yang
Conventional learning algorithms assume that the training data and test data share a common distribution. However, this assumption greatly hinders the practical application of a learned model to cross-domain data analysis in multimedia. To deal with this issue, transfer-learning-based techniques should be adopted. As a typical form of transfer learning, domain adaptation has been extensively studied recently due to its theoretical value and practical interest. In this article, we propose a boosted multifeature learning (BMFL) approach that iteratively learns multiple representations within a boosting procedure for unsupervised domain adaptation. The proposed BMFL method has a number of properties. (1) It reuses all instances, with different weights assigned by the previous boosting iteration, and avoids discarding labeled instances as conventional methods do. (2) It models the instance weight distribution effectively by considering both the classification error and the domain similarity, which facilitates learning a new feature representation to correct the previously misclassified instances. (3) It learns multiple different feature representations to effectively bridge the source and target domains. We evaluate BMFL by comparing its performance on three applications: image classification, sentiment classification, and spam filtering. Extensive experimental results demonstrate that the proposed BMFL algorithm performs favorably against state-of-the-art domain adaptation methods.
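Property (2) suggests an instance-weight update that mixes the usual boosting error term with a domain-similarity term; the rule below is our own hypothetical rendering of that idea, not the paper's exact update:

```python
# Hedged sketch: reweight source instances by both classification
# error and similarity to the target domain.
import numpy as np

def update_weights(w, mistakes, domain_sim, alpha, beta=1.0):
    """w: current instance weights; mistakes: bool array, True where
    the current classifier errs; domain_sim: per-instance similarity
    to the target domain, in [0, 1]."""
    w = w * np.exp(alpha * mistakes)      # boost misclassified instances
    w = w * np.exp(beta * domain_sim)     # favor target-like instances
    return w / w.sum()
```

Each boosting round would then learn a fresh feature representation under the new weights, which is how the method accumulates multiple representations that bridge the two domains.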
ACM Transactions on Multimedia Computing, Communications, and Applications | 2016
Xiaoshan Yang; Tianzhu Zhang; Changsheng Xu
Content-based video understanding is extremely difficult due to the semantic gap between low-level vision signals and the various semantic concepts (objects, actions, and scenes) in videos. Though feature extraction from videos has achieved significant progress, most previous methods rely only on low-level features, such as appearance and motion features. Recently, visual feature extraction has improved significantly with machine learning algorithms, especially deep learning. However, there is still little work on extracting semantic features directly from videos. The goal of this article is to use unlabeled videos, with the help of their text descriptions, to learn an embedding function that can extract more effective semantic features from videos when only a few labeled samples are available for video recognition. To achieve this goal, we propose a novel embedding convolutional neural network (ECNN). We evaluate our algorithm by comparing its performance on three challenging benchmarks with several popular state-of-the-art methods. Extensive experimental results show that the proposed ECNN consistently and significantly outperforms existing methods.
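A common way to learn such a video-text embedding from description pairs is a triplet ranking objective over a batch, where mismatched pairs act as negatives; the formulation below is our assumption, not necessarily the ECNN objective:

```python
# Hedged sketch: align video and text embeddings of the same clip,
# push apart mismatched pairs within the batch.
import torch
import torch.nn.functional as F

def embedding_loss(v, t, margin=0.2):
    """v: (N, d) video embeddings; t: (N, d) embeddings of their text
    descriptions; row i of v matches row i of t."""
    v = F.normalize(v, dim=1)
    t = F.normalize(t, dim=1)
    sim = v @ t.T                          # pairwise cosine similarities
    pos = sim.diag().unsqueeze(1)          # similarity of matched pairs
    hinge = F.relu(margin + sim - pos)     # penalize close negatives
    mask = 1.0 - torch.eye(sim.size(0))    # ignore the matched diagonal
    return (hinge * mask).mean()
```

Once trained this way on unlabeled described videos, the video branch alone serves as the semantic feature extractor for the few-shot recognition task.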
International Conference on Internet Multimedia Computing and Service | 2013
Xiaoshan Yang; Tianzhu Zhang; Changsheng Xu
Bag-of-Words (BOW) based methods are widely used in image classification. However, a large amount of visual information is inevitably lost in the quantization step of BOW. Recently, NBNN and improved variants such as Local NBNN were proposed to address this problem; nevertheless, these methods do not perform better than the state-of-the-art BOW-based methods. In this paper, building on the advantages of both BOW and Local NBNN, we introduce a novel locality discriminative coding (LDC) method. We convert each low-level local feature, such as SIFT, into a code vector using the local feature-to-class distance rather than k-means quantization. Extensive experimental results on four challenging benchmark datasets show that our LDC method outperforms six state-of-the-art image classification methods (three based on NBNN, three based on BOW).
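A minimal sketch of coding by feature-to-class distance (our reading of the idea; the softmin temperature and the exact distance are assumptions):

```python
# Hedged sketch: code a local descriptor by its nearest-neighbor
# distance to each class's training descriptors, instead of
# assigning it to a k-means codeword.
import numpy as np

def ldc_code(desc, class_feats, beta=1.0):
    """desc: (d,) local descriptor (e.g., SIFT); class_feats: list of
    (n_c, d) arrays of training descriptors, one array per class."""
    d2 = np.array([np.min(((F - desc) ** 2).sum(axis=1))
                   for F in class_feats])    # NN distance per class
    code = np.exp(-beta * d2)                # softmin: near class -> large
    return code / code.sum()                 # one entry per class
```

Unlike hard k-means assignment, every class contributes to the code, so the discriminative feature-to-class information that quantization discards is preserved.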
Multimedia Systems | 2015
Xiaoshan Yang; Tianzhu Zhang; Changsheng Xu
Bag-of-words (BOW) based methods are widely used in image classification. However, a large amount of visual information is inevitably lost in the quantization step of BOW. Recently, NBNN and improved variants such as Local NBNN were proposed to address this problem; nevertheless, these methods do not perform better than the state-of-the-art BOW-based methods. In this paper, building on the advantages of both BOW and Local NBNN, we introduce a novel locality discriminative coding (LDC) method. We convert each low-level local feature, such as SIFT, into a code vector using the local feature-to-class distance rather than k-means quantization. After coding, sum-pooling combined with spatial pyramid matching (SPM) is used to construct a single feature representation vector for each image. Extensive experimental results on several challenging benchmark datasets show that our LDC method outperforms six state-of-the-art image classification methods.
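The SPM sum-pooling step can be realized as below for a two-level pyramid (whole image plus a 2x2 grid); the level count and cell layout here are our assumptions:

```python
# Hedged sketch: sum-pool per-descriptor LDC codes over a two-level
# spatial pyramid to get one vector per image.
import numpy as np

def spm_sum_pool(codes, xy, width, height):
    """codes: (n, k) per-descriptor code vectors; xy: (n, 2) keypoint
    positions in pixels. Returns a (5 * k,) image representation."""
    parts = [codes.sum(axis=0)]                  # level 0: whole image
    for cx in range(2):                          # level 1: 2x2 grid
        for cy in range(2):
            in_cell = ((xy[:, 0] // (width / 2) == cx) &
                       (xy[:, 1] // (height / 2) == cy))
            parts.append(codes[in_cell].sum(axis=0))
    return np.concatenate(parts)
```

Sum-pooling (rather than max-pooling) keeps the soft, multi-class nature of the LDC codes, and the pyramid cells add coarse spatial layout to the final image vector.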