Shouxun Lin
Chinese Academy of Sciences
Publication
Featured research published by Shouxun Lin.
IEEE Transactions on Circuits and Systems for Video Technology | 2006
Wu Yuan; Shouxun Lin; Yongdong Zhang; Wen Yuan; Haiyong Luo
For rate control in H.264/AVC, one of the most important requirements is obtaining accurate statistics for the current frame. To this end, a novel adaptive coding-characteristics prediction scheme is presented that improves the accuracy of R-D modeling by exploiting spatio-temporal correlations. Building on this prediction scheme, we present a novel rate function and a linear distortion model, and then derive a simple closed-form solution to the optimum bit allocation problem in a TMN-8-like way. Extensive experiments show that the proposed scheme achieves gains of up to 0.92 dB per frame over JVT-G012, the current standardized rate control scheme, for a variety of test sequences with less demanding bandwidth.
ACM Multimedia | 2004
Gao Chen; Yongdong Zhang; Shouxun Lin; Feng Dai
In this paper, an efficient block size mode selection algorithm for variable-size block matching (VSBM) in MPEG-2 to H.264 transcoding is presented. By leveraging the motion information carried in the MPEG-2 bitstream, the proposed algorithm determines which of the 16x16, 16x8, 8x16, and 8x8 block size modes should be used for each macroblock (MB). Simulation results show that the performance of the proposed algorithm is close to that of a cascaded pixel-domain transcoder (CPDT) with all seven block size modes enabled and exhaustive full search used to determine the best block size mode. The total transcoding time is reduced by 22% on average, while the bit rate increases slightly (2.9%).
Annual ACIS International Conference on Computer and Information Science | 2008
Yan Song; Anan Liu; Lin Pang; Shouxun Lin; Yongdong Zhang; Sheng Tang
Text in web pages, images, and videos contains important clues for information indexing and retrieval. Most existing text extraction methods depend on the language type and text appearance. In this paper, a novel and universal method of image text extraction is proposed. A coarse-to-fine text location method is implemented. First, a multi-scale approach is adopted to locate text with different font sizes. Second, projection profiles are used in the location refinement step. Color-based k-means clustering is adopted for text segmentation. Compared to the grayscale images used in most existing methods, color images are more suitable for clustering-based segmentation. The method treats corner points, edge points, and other points equally, which solves the problem of handling multilingual text. Experimental results demonstrate that the best performance is obtained when k is 3. Comparative experiments on a large number of images show that our method is accurate and robust under various conditions.
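The color-based k-means segmentation step mentioned in the abstract above can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `kmeans_colors`, the evenly spaced initialization, and the synthetic pixel values are all assumptions made for illustration. It only shows plain Lloyd's k-means applied to RGB triples with k = 3.

```python
import math

def kmeans_colors(pixels, k=3, iterations=20):
    """Cluster RGB pixels into k groups with plain Lloyd's k-means.

    pixels: list of (r, g, b) tuples. Returns (labels, centers).
    The initialization below is an assumption chosen to be deterministic.
    """
    # Pick k starting centers spread evenly over the input list.
    step = max(1, len(pixels) // k)
    centers = [pixels[min(i * step, len(pixels) - 1)] for i in range(k)]
    labels = [0] * len(pixels)
    for _ in range(iterations):
        # Assignment step: each pixel joins its nearest center.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centers[c]))
            for p in pixels
        ]
        # Update step: move each center to the mean color of its members.
        new_centers = []
        for c in range(k):
            members = [p for p, lab in zip(pixels, labels) if lab == c]
            if members:
                new_centers.append(
                    tuple(sum(ch) / len(members) for ch in zip(*members))
                )
            else:
                new_centers.append(centers[c])  # keep an empty cluster's center
        if new_centers == centers:
            break  # converged
        centers = new_centers
    return labels, centers

# Hypothetical pixels: two near-white, two near-black, two reddish.
demo_pixels = [
    (250, 250, 250), (245, 248, 252),
    (10, 10, 10), (12, 8, 15),
    (200, 30, 30), (210, 25, 35),
]
demo_labels, demo_centers = kmeans_colors(demo_pixels, k=3)
```

In a real pipeline the pixels would come from a located text region, and one of the resulting clusters would be selected as the text layer; how that selection is made is not described in the abstract.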
IEEE Transactions on Circuits and Systems for Video Technology | 2011
Yan Song; Yan-Tao Zheng; Sheng Tang; Xiangdong Zhou; Yongdong Zhang; Shouxun Lin; Tat-Seng Chua
Realistic human action recognition in videos has been a useful yet challenging task. Video shots of the same action may present huge intra-class variations in terms of visual appearance, kinetic patterns, video shooting, and editing styles. Heterogeneous feature representations of videos pose another challenge: how to effectively handle the redundancy, complementariness, and disagreement in these features. This paper proposes a localized multiple kernel learning (L-MKL) algorithm to tackle these issues. L-MKL integrates localized classifier ensemble learning and multiple kernel learning in a unified framework to leverage the strengths of both. The basis of L-MKL is to build multiple kernel classifiers on diverse features at subspace localities of heterogeneous representations. L-MKL integrates the discriminability of complementary features locally and enables each localized MKL classifier to deliver better performance in its own region of expertise. Specifically, L-MKL develops a locality gating model to partition the input space of heterogeneous representations into a set of localities with simpler data structure. Each locality then learns its localized optimal combination of Mercer kernels of heterogeneous features. Finally, the gating model coordinates the localized multiple kernel classifiers globally to perform action recognition. Experiments on two datasets show that the proposed approach delivers promising performance.
International Conference on Multimedia and Expo | 2009
Ke Gao; Shouxun Lin; Yongdong Zhang; Sheng Tang; Dongming Zhang
Logo detection is important for brand advertising and surveillance applications. The central issues of this technology are fast localization and accurate matching. Based on an analysis of the key traits of common logos, this paper presents a two-stage detection scheme based on spatial-spectral saliency (SSS) and partial spatial context (PSC). SSS speeds up logo localization and avoids the impact of cluttered backgrounds. PSC filters out false matches using the spatial consistency of local invariant points. The integration of SSS and PSC results in faster localization and increased accuracy. Experiments on a dataset of nearly 10,000 web images containing several popular logo types are presented. The results indicate that our method is applicable and precise for different logo detection scenarios.
Signal Processing | 2012
Jie Xu; Jianwei Ma; Dongming Zhang; Yongdong Zhang; Shouxun Lin
Total variation (TV) minimization algorithms are often used to recover sparse signals or images in compressive sensing (CS). However, TV solvers often suffer from an undesirable staircase effect. To reduce this effect, this paper presents an improved TV minimization method for block-based CS using intra-prediction. The new method conducts intra-prediction block by block in the CS reconstruction process and generates a residual for the image block being decoded in the CS measurement domain. The gradient of the residual is sparser than that of the image itself, which can lead to better reconstruction quality in CS with TV regularization. The staircase effect can also be eliminated through effective reconstruction of the residual. Furthermore, to suppress blocking artifacts caused by intra-prediction, an efficient adaptive in-loop deblocking filter is designed for post-processing during the CS reconstruction process. Experiments show that the proposed hybrid method performs competitively with state-of-the-art TV models for CS in terms of peak signal-to-noise ratio and subjective visual quality.
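The claim that a prediction residual has a sparser gradient than the image block itself can be illustrated with a small sketch. It is not the paper's solver and performs no CS reconstruction; the helper names, the 4x4 ramp block, and the near-perfect prediction are hypothetical values chosen so the numbers are easy to check by hand. It computes anisotropic total variation and the count of nonzero forward differences for a block and for its residual.

```python
def gradients(block):
    """Return all horizontal and vertical forward differences of a 2-D block."""
    rows, cols = len(block), len(block[0])
    diffs = []
    for r in range(rows):
        for c in range(cols):
            if c + 1 < cols:
                diffs.append(block[r][c + 1] - block[r][c])
            if r + 1 < rows:
                diffs.append(block[r + 1][c] - block[r][c])
    return diffs

def total_variation(block):
    """Anisotropic TV: sum of absolute forward differences."""
    return sum(abs(d) for d in gradients(block))

def nonzero_gradients(block):
    """Number of nonzero gradient entries (smaller means sparser)."""
    return sum(1 for d in gradients(block) if d != 0)

def residual(block, prediction):
    """Element-wise difference between a block and its prediction."""
    return [[b - p for b, p in zip(brow, prow)]
            for brow, prow in zip(block, prediction)]

# Hypothetical block: a horizontal intensity ramp repeated on every row.
block = [[10, 20, 30, 40] for _ in range(4)]
# Hypothetical prediction: correct everywhere except one pixel.
prediction = [[10, 20, 30, 40] for _ in range(3)] + [[10, 20, 30, 39]]
res = residual(block, prediction)
```

Here the ramp block has 12 nonzero gradient entries and a TV of 120, while the residual has 2 nonzero entries and a TV of 2, which is the kind of sparsity gain a TV regularizer can exploit.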
Annual ACIS International Conference on Computer and Information Science | 2008
Ke Gao; Shouxun Lin; Yongdong Zhang; Sheng Tang; Huamin Ren
Effective feature extraction is a fundamental component of content-based image retrieval. The Scale Invariant Feature Transform (SIFT) has been proven to be the most robust local invariant feature descriptor. However, the SIFT algorithm generates hundreds of thousands of keypoints per image, and most of them come from the background. This has seriously limited the application of SIFT in real-time image retrieval. This paper addresses this problem and proposes a novel method to filter SIFT keypoints using an attention model. Based on visual attention analysis, all keypoints in an image are ranked by their attention saliency, and only the most distinctive keypoints are retained. A bag-of-words model is then used to index these features efficiently. Experiments demonstrate that the attention-model-based SIFT keypoint filtering algorithm provides significant benefits in both retrieval accuracy and matching speed.
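The ranking-and-filtering idea in the abstract above can be sketched as follows. This is an illustrative assumption, not the authors' algorithm: the function name `filter_keypoints`, the `keep_ratio` parameter, and the toy saliency map are hypothetical, and the saliency map is taken as given rather than computed by an attention model.

```python
def filter_keypoints(keypoints, saliency, keep_ratio=0.3):
    """Keep only the most salient keypoints.

    keypoints: list of (x, y) integer pixel coordinates.
    saliency:  2-D list indexed as saliency[y][x], higher means more salient.
    keep_ratio: fraction of keypoints to keep (at least one is always kept).
    """
    # Rank keypoints by the saliency value at their location, highest first.
    ranked = sorted(keypoints,
                    key=lambda kp: saliency[kp[1]][kp[0]],
                    reverse=True)
    keep = max(1, int(len(ranked) * keep_ratio))
    return ranked[:keep]

# Hypothetical 3x3 saliency map and four keypoints.
saliency_map = [
    [0.10, 0.20, 0.90],
    [0.20, 0.80, 0.30],
    [0.10, 0.20, 0.05],
]
points = [(0, 0), (1, 1), (2, 2), (2, 0)]
kept = filter_keypoints(points, saliency_map, keep_ratio=0.5)
```

With these values the two retained keypoints are `(2, 0)` and `(1, 1)`. In practice the descriptors of the retained keypoints, not just their coordinates, would be passed on to the indexing stage.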
International Conference on Image Processing | 2011
Lei Huang; Tian Xia; Yongdong Zhang; Shouxun Lin
Human skin detection in images is desirable in many practical applications, e.g., adult-content filtering. However, existing methods are mainly pixel-based and ignore the fact that human skin is region-based. In this paper, we introduce a successful region detector, the maximally stable extremal region (MSER) detector, into skin detection by treating the skin region as an MSER. We extend the original MSER to both color and texture analysis to reduce skin-like regions. Furthermore, to adapt to varying illumination and chrominance, face detection is used to customize the skin color model for each image. The proposed method achieves promising performance on our dataset, a challenging set containing a large proportion of hard images. Our true positive rate is 81.2% at a false positive rate of 8.2%, which outperforms all eight state-of-the-art algorithms compared.
International Conference on Multimedia and Expo | 2008
Dan Zhao; Xiangdong Wang; Yueliang Qian; Qun Liu; Shouxun Lin
Automatic detection of commercials in digital multimedia material is a challenging task with many applications. This paper presents a novel approach to fast commercial detection based on audio retrieval. It is based on the idea of segmenting the energy envelope of the audio into units and using only the audio signal for matching against a commercial database. Fast searching and matching can be performed with high accuracy by searching on units and by using a novel unit-based similarity function. Experimental results show that a recall of 96.8% and a precision of 98.7% can be achieved at 0.125 times real time.
ACM Multimedia | 2007
Huanbo Luan; Shi-Yong Neo; Hai-Kiat Goh; Yongdong Zhang; Shouxun Lin; Tat-Seng Chua
Existing video research incorporates relevance feedback based on user-dependent interpretations to improve retrieval results. In this paper, we separate the process of relevance feedback into two distinct facets: (a) recall-directed feedback; and (b) precision-directed feedback. The recall-directed facet employs general features such as text and high-level features (HLFs) to maximize efficiency and recall during feedback, making it well suited to large corpora. The precision-directed facet, on the other hand, uses many other multimodal features in an active learning environment for improved accuracy. Combined with a performance-based adaptive sampling strategy, this process continuously re-ranks a subset of instances as the user annotates. Experiments on the TRECVID 2006 dataset show that our approach is efficient and effective.