Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Songzhi Su is active.

Publication


Featured research published by Songzhi Su.


Neurocomputing | 2015

Feature learning based on SAE–PCA network for human gesture recognition in RGBD images

Shaozi Li; Bin Yu; Wei Wu; Songzhi Su; Rongrong Ji

With the emergence of depth sensors like Microsoft Kinect, human hand gesture recognition has recently received increasing research interest. A successful gesture recognition system usually relies heavily on a good feature representation of the data, which is expected to be task-dependent and to cope with the challenges and opportunities introduced by the depth sensor. In this paper, a feature learning approach based on a sparse auto-encoder (SAE) and principal component analysis (PCA) is proposed for recognizing human gestures, i.e., finger-spelling in sign language, from RGB-D inputs. The proposed feature learning model consists of two components. First, features are learned separately from the RGB and depth channels, using a sparse auto-encoder with convolutional neural networks. Second, the learned features from both channels are concatenated and fed into a multi-layer PCA to obtain the final feature. Experimental results on an American Sign Language (ASL) dataset demonstrate that the proposed feature learning model is highly effective, improving the recognition rate from 75% to 99.05% and outperforming the state of the art.
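To make the two-stage pipeline concrete, here is a minimal Python sketch of the convolution-plus-pooling feature stage followed by PCA. The filter banks, patch sizes, and pooling factor are hypothetical stand-ins; in the paper the filters come from a trained sparse auto-encoder, which is omitted here for brevity.

import numpy as np
from scipy.signal import convolve2d
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)

# Hypothetical stand-ins for SAE-learned filter banks (one per channel);
# the paper learns these with a sparse auto-encoder on image patches.
rgb_filters = rng.standard_normal((8, 5, 5))
depth_filters = rng.standard_normal((8, 5, 5))

def conv_pool_features(img, filters, pool=4):
    # Convolve with each learned filter, then mean-pool the response map.
    feats = []
    for f in filters:
        m = convolve2d(img, f, mode="valid")
        h, w = m.shape
        m = m[:h - h % pool, :w - w % pool]
        m = m.reshape(h // pool, pool, w // pool, pool).mean(axis=(1, 3))
        feats.append(m.ravel())
    return np.concatenate(feats)

def rgbd_feature(rgb, depth):
    # Stage one: per-channel features, then concatenation.
    return np.concatenate([conv_pool_features(rgb, rgb_filters),
                           conv_pool_features(depth, depth_filters)])

# Toy RGB-D pairs (single-channel stand-in for the RGB image).
batch = np.stack([rgbd_feature(rng.standard_normal((32, 32)),
                               rng.standard_normal((32, 32)))
                  for _ in range(20)])

# Stage two: PCA over the concatenated features gives the final descriptor.
final = PCA(n_components=10).fit_transform(batch)
print(final.shape)  # (20, 10)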


Signal Processing | 2013

Perspective-SIFT: An efficient tool for low-altitude remote sensing image registration

Guorong Cai; Pierre-Marc Jodoin; Shaozi Li; Yundong Wu; Songzhi Su; Zhen-Kun Huang

This paper presents an automated image registration approach that is robust to perspective distortions. The state-of-the-art method affine-SIFT (ASIFT) uses affine transforms to simulate various viewpoints and thereby increase registration robustness. However, an affine transformation does not follow the process by which real-world images are formed. To solve this problem, we propose a perspective scale-invariant feature transform (PSIFT) that uses homographic transformations to simulate perspective distortion. Like ASIFT, PSIFT is based on the scale-invariant feature transform (SIFT) and has a two-resolution scheme, namely a low-resolution phase and a high-resolution phase. The low-resolution phase of PSIFT simulates several image views under a perspective transformation by varying two camera-axis orientation parameters. Given those simulated images, SIFT is then used to extract features and find matches among them. In the high-resolution phase, the perspective transformations that yield the largest number of matches in the low-resolution stage are selected to generate SIFT features on the original images. Experimental results obtained on three categories of low-altitude remote sensing images and on the Morel-Yu dataset show that PSIFT significantly outperforms the state-of-the-art ASIFT, SIFT, Random Ferns, Harris-Affine, MSER, and Hessian-Affine, especially when images suffer severe perspective distortion.
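The low-resolution view-simulation idea can be sketched with OpenCV as below. The tilt parameterization and the 0.75 ratio test are illustrative choices, not the paper's exact camera-axis sampling.

import cv2
import numpy as np

def perspective_views(img, tilts=(0.0, 0.15, 0.3)):
    # Warp the image with homographies that mimic out-of-plane tilt
    # (a simplified stand-in for PSIFT's two camera-axis parameters).
    h, w = img.shape[:2]
    src = np.float32([[0, 0], [w, 0], [w, h], [0, h]])
    for t in tilts:
        dst = np.float32([[t * w, 0], [w - t * w, 0], [w, h], [0, h]])
        H = cv2.getPerspectiveTransform(src, dst)
        yield cv2.warpPerspective(img, H, (w, h)), H

def best_simulated_view(img1, img2):
    # Low-resolution phase: match SIFT features between img2 and each
    # simulated view of img1; keep the homography with the most matches.
    sift = cv2.SIFT_create()
    k2, d2 = sift.detectAndCompute(img2, None)
    best_count, best_H = 0, None
    for view, H in perspective_views(img1):
        k1, d1 = sift.detectAndCompute(view, None)
        if d1 is None or d2 is None:
            continue
        pairs = cv2.BFMatcher().knnMatch(d1, d2, k=2)
        good = [p[0] for p in pairs
                if len(p) == 2 and p[0].distance < 0.75 * p[1].distance]
        if len(good) > best_count:
            best_count, best_H = len(good), H
    return best_count, best_H  # winner feeds the high-resolution phase

# Toy demo on a synthetic texture (uint8 grayscale, as SIFT expects).
img = (np.random.default_rng(0).random((200, 200)) * 255).astype(np.uint8)
img = cv2.GaussianBlur(img, (7, 7), 2)  # give SIFT some structure
print(best_simulated_view(img, img)[0])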


Signal Processing | 2015

Sparse auto-encoder based feature learning for human body detection in depth image

Songzhi Su; Zhi-Hui Liu; Su-Ping Xu; Shaozi Li; Rongrong Ji

Human body detection in depth images is an active research topic in computer vision, but depth feature extraction remains an open problem. In this paper, a novel feature learning method based on a sparse auto-encoder (SAE) is proposed for human body detection in depth images. The learned feature captures the intrinsic structure of the human body. To reduce the computational cost of the SAE, both a convolutional neural network and pooling are introduced to lower the training complexity. In addition, beyond learning the SAE-based depth feature, we further pursue detector efficiency. A beyond-sliding-window localization strategy is proposed, based on the fact that the depth values on an object's surface are almost the same. The proposed strategy first uses the histogram of depth to generate candidate detection-window centers, and then exploits the relationship between human body height and depth to determine the detection-window size. It thus avoids a time-consuming sliding-window search and enables fast human body localization. Experiments on the SZU Depth Pedestrian dataset verify the effectiveness of the proposed method. Highlights: a sparse auto-encoder is used to learn depth features for human detection; a beyond-sliding-window localization method is based on depth values.
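The beyond-sliding-window idea can be illustrated in a few lines: peaks of the depth histogram propose window centers, and the pinhole relation between physical height and depth sets the window size. The focal length, body height, and depth tolerance below are hypothetical constants.

import numpy as np

def candidate_windows(depth, focal=575.0, body_h=1.7, bins=48):
    # Histogram the valid depths; each dominant mode suggests an object
    # surface (depth values on a body are nearly constant).
    valid = depth[depth > 0]
    hist, edges = np.histogram(valid, bins=bins)
    for i in np.argsort(hist)[-3:]:              # top-3 depth modes
        d = 0.5 * (edges[i] + edges[i + 1])      # representative depth (m)
        ys, xs = np.nonzero(np.abs(depth - d) < 0.3)
        if xs.size == 0:
            continue
        cx, cy = int(xs.mean()), int(ys.mean())  # candidate window center
        h_pix = int(focal * body_h / d)          # expected height in pixels
        yield cx, cy, h_pix // 2, h_pix          # center plus window (w, h)

# Toy depth map (meters) with a person-like region at 3 m.
depth = np.full((240, 320), 8.0)
depth[60:200, 140:180] = 3.0
for cx, cy, w, h in candidate_windows(depth):
    print(cx, cy, w, h)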


Multimedia Tools and Applications | 2017

Stratified pooling based deep convolutional neural networks for human action recognition

Sheng Yu; Yun Cheng; Songzhi Su; Guorong Cai; Shaozi Li

Video-based human action recognition is an active and challenging topic in computer vision. Over the last few years, deep convolutional neural networks (CNNs) have become the most popular method and have achieved state-of-the-art performance on several datasets, such as HMDB-51 and UCF-101. Since each video yields a varying number of frame-level features, how to combine these features into a good video-level feature is a challenging task. This paper therefore proposes a novel action recognition method named stratified pooling, based on deep convolutional neural networks (SP-CNN). The process is composed of five parts: (i) fine-tuning a pre-trained CNN on the target dataset; (ii) extracting frame-level features; (iii) reducing feature dimensionality with principal component analysis (PCA); (iv) stratified pooling of the frame-level features to obtain a video-level feature; and (v) multiclass classification with an SVM. Experimental results on the HMDB-51 and UCF-101 datasets show that the proposed method outperforms the state of the art.
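Steps (iii)-(v) can be sketched as follows, with random vectors standing in for the fine-tuned CNN's frame-level features; the stratum count and dimensions are illustrative.

import numpy as np
from sklearn.decomposition import PCA
from sklearn.svm import SVC

def stratified_pool(frames, n_strata=3):
    # Split the frame sequence into temporal strata and average-pool each,
    # mapping a variable-length video to a fixed-length vector.
    return np.concatenate([c.mean(axis=0)
                           for c in np.array_split(frames, n_strata)])

rng = np.random.default_rng(0)
# Hypothetical frame-level CNN features: (n_frames, 512) per video.
videos = [rng.standard_normal((int(rng.integers(20, 60)), 512))
          for _ in range(40)]
labels = rng.integers(0, 5, size=40)

pca = PCA(n_components=64).fit(np.vstack(videos))                   # (iii)
X = np.stack([stratified_pool(pca.transform(v)) for v in videos])   # (iv)
clf = SVC(kernel="linear").fit(X, labels)                           # (v)
print(clf.predict(X[:3]))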


Neurocomputing | 2014

Online MIL tracking with instance-level semi-supervised learning

Si Chen; Shaozi Li; Songzhi Su; Qi Tian; Rongrong Ji

In this paper we propose an online multiple-instance boosting algorithm with instance-level semi-supervised learning, termed SemiMILBoost, to achieve robust object tracking. Our work revisits the multiple instance learning (MIL) formulation to alleviate the drifting problem in tracking, addressing two key issues in existing MIL-based tracking-by-detection methods: the unselective treatment of instances in the positive bag during weak classifier updating, and the lack of object prior knowledge in instance modeling. We tackle both issues in a principled way with the SemiMILBoost algorithm, which treats instances in the positive bag as unlabeled and those in the negative bag as negative. To improve the discriminability of the weak classifiers online, we iteratively update them with the pseudo-labels and importance weights of all instances in the positive bag, which are predicted during boosting by an instance-level semi-supervised learning technique that incorporates object prior knowledge. Experimental results demonstrate that the proposed algorithm outperforms state-of-the-art tracking methods on several challenging video sequences.
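A toy illustration of the core idea, treating positive-bag instances as unlabeled and iteratively re-estimating their pseudo-labels with prior-derived importance weights, follows. Logistic regression replaces the boosted weak classifiers, and the bags, features, and prior are entirely synthetic.

import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
pos_bag = rng.standard_normal((30, 16)) + 1.0   # patches near the target
neg_bag = rng.standard_normal((60, 16)) - 1.0   # background patches
# Toy object prior: instances closer to the bag center matter more.
prior = np.exp(-np.linalg.norm(pos_bag - pos_bag.mean(0), axis=1) / 4.0)

clf = LogisticRegression(max_iter=200)
pseudo = np.ones(len(pos_bag))                  # start: all assumed positive
for _ in range(5):                              # EM-style refinement
    X = np.vstack([pos_bag, neg_bag])
    y = np.concatenate([pseudo, np.zeros(len(neg_bag))])
    w = np.concatenate([prior, np.ones(len(neg_bag))])
    clf.fit(X, y, sample_weight=w)              # weighted classifier update
    pseudo = (clf.predict_proba(pos_bag)[:, 1] > 0.5).astype(float)
    if pseudo.sum() == 0:                       # keep both classes present
        break

print(pseudo.mean())  # fraction of positive-bag instances kept positive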


Multimedia Tools and Applications | 2014

Logo detection with extendibility and discrimination

Kuo-Wei Li; Shu-Yuan Chen; Songzhi Su; Der-Jyh Duh; Hong-Bo Zhang; Shaozi Li

Logos are specially designed marks that identify goods, services, and organizations using distinctive characters, graphics, signs, and colors. Identifying logos can facilitate scene understanding, intelligent navigation, and object recognition. Although numerous logo recognition methods have been proposed for printed logos, few methods have been specifically designed for logos in photos. Furthermore, most recognition methods for logos in photos use codebook-based approaches. A codebook-based method must generate visual words for all the logo models, so when new logos are added the codebook must be reconstructed if effectiveness is a crucial factor. Moreover, logo detection in natural scenes is difficult because of perspective tilt and non-rigid deformation. Therefore, this study develops an extensible yet discriminative model-based logo detection method. The proposed method detects logos with a support vector machine (SVM) using edge-based histograms of oriented gradients (HOGE) as features, through multi-scale sliding-window scanning. Thereafter, the anti-distortion affine scale-invariant feature transform (ASIFT) is used for logo verification, with constraints on the ASIFT matching pairs and their neighbors. Experimental results on the public Flickr-Logo database confirm that the proposed method achieves higher retrieval accuracy and precision than existing model-based methods.
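A compressed sketch of the detection stage is below, using plain HOG from scikit-image in place of the paper's edge-based HOGE and a linear SVM scored over multi-scale sliding windows; all sizes and the training data are synthetic, and the ASIFT verification step is only indicated by a comment.

import numpy as np
from skimage.feature import hog
from skimage.transform import resize
from sklearn.svm import LinearSVC

def sliding_windows(img, win=64, step=32, scales=(1.0, 0.75)):
    # Multi-scale scan: rescale the scene, then slide a fixed window.
    for s in scales:
        scaled = resize(img, (int(img.shape[0] * s), int(img.shape[1] * s)))
        for y in range(0, scaled.shape[0] - win + 1, step):
            for x in range(0, scaled.shape[1] - win + 1, step):
                yield scaled[y:y + win, x:x + win], (x, y, s)

rng = np.random.default_rng(0)
# Synthetic training crops: "logo" windows vs. background windows.
pos = [np.clip(rng.random((64, 64)) + 0.5, 0, 1) for _ in range(10)]
neg = [rng.random((64, 64)) * 0.5 for _ in range(10)]
svm = LinearSVC().fit([hog(p) for p in pos + neg], [1] * 10 + [0] * 10)

scene = rng.random((128, 160))
for patch, (x, y, s) in sliding_windows(scene):
    if svm.decision_function([hog(patch)])[0] > 0:
        # A real detector would now verify this window with ASIFT matching.
        print("candidate window at", x, y, "scale", s)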


Multimedia Tools and Applications | 2014

Adaptive photograph retrieval method

Hong-Bo Zhang; Shang-An Li; Shu-Yuan Chen; Songzhi Su; Der-Jyh Duh; Shaozi Li

Access to electronic books, electronic journals, and web portals, which may contain graphics (drawings or diagrams) and images, is now ubiquitous. Users may have photographs that contain graphics or images and want to query an electronic database to retrieve the corresponding information, so an effective photograph retrieval method is needed. Although many content-based retrieval methods have been developed for images and graphics, few are designed to retrieve graphics and images simultaneously. Moreover, existing graphics retrieval methods use contour-based rather than pixel-based approaches, and contour-based methods, which are concerned with lines or curves, are inappropriate for images. To retrieve graphics and images simultaneously, this work proposes an adaptive retrieval method that uses histograms of oriented gradients (HOG) as pixel-based features. Because the characteristics of graphics and images differ, which affects feature extraction and retrieval accuracy, the adaptive method selects different HOG-based features for retrieving graphics and for retrieving images. Experimental results demonstrate that the proposed method achieves high retrieval accuracy even under noisy conditions.
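The adaptive idea, choosing HOG granularity according to whether the query is a graphic or an image, can be sketched as follows. The gradient-sparsity test and the cell-size choices are invented for illustration and are not the paper's actual selection rule.

import numpy as np
from skimage.feature import hog

def is_graphic(img, thresh=0.1):
    # Crude graphics-vs-image test: line drawings have sparse strong
    # edges, photographs have dense texture (hypothetical rule).
    gy, gx = np.gradient(img.astype(float))
    mag = np.hypot(gx, gy)
    return (mag > 0.2 * mag.max()).mean() < thresh

def retrieve(query, database):
    # Pick HOG granularity from the query type, then use the same
    # parameters for every database entry so features are comparable.
    cells = (4, 4) if is_graphic(query) else (8, 8)
    feat = lambda im: hog(im, pixels_per_cell=cells, cells_per_block=(2, 2))
    qf = feat(query)
    return int(np.argmin([np.linalg.norm(qf - feat(d)) for d in database]))

rng = np.random.default_rng(0)
db = [rng.random((64, 64)) for _ in range(5)]   # same-size toy "documents"
print(retrieve(db[2], db))                      # 2: the query matches itself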


Pattern Recognition Letters | 2017

Improving pedestrian detection using motion-guided filtering

Yi Wang; Sébastien Pierard; Songzhi Su; Pierre-Marc Jodoin

In this letter, we show how a simple motion-guided nonlinear filter can drastically improve the accuracy of several pedestrian detectors. More specifically, we address the problem of how to pre-filter an image so that almost any pedestrian detector will see its false detection rate decrease. First, we roughly identify moving pixels by accumulating their temporal gradient into a motion history image (MHI). The MHI is then used in conjunction with a nonlinear filter to filter out background details while leaving foreground moving objects untouched. We also show how a feedback loop, as well as a procedure for merging the filtered and unfiltered frames, can further improve results. We tested our method on 26 videos from 6 categories. The results show that, for a given miss rate, filtering out background details reduces the false detection rate by a factor of up to 69.6. Our method is simple, computationally light, and can be implemented with any pedestrian detector. Code is made publicly available at: https://bitbucket.org/wany1601/pedestriandetection
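A simplified version of the MHI update and the motion-guided mixing it drives is sketched below; the decay constant, threshold, and linear blend are illustrative simplifications of the paper's nonlinear filter.

import numpy as np
from scipy.ndimage import gaussian_filter

def update_mhi(mhi, prev, curr, tau=30, thresh=15):
    # Pixels with a large temporal gradient are refreshed to tau;
    # everywhere else the motion history decays by one per frame.
    moving = np.abs(curr.astype(int) - prev.astype(int)) > thresh
    return np.where(moving, tau, np.maximum(mhi - 1, 0))

def motion_guided_filter(frame, mhi, tau=30):
    # Keep recently-moving pixels sharp; replace static pixels with a
    # blurred version so background detail is suppressed.
    alpha = np.clip(mhi / tau, 0.0, 1.0)
    return alpha * frame + (1.0 - alpha) * gaussian_filter(frame, sigma=3)

rng = np.random.default_rng(0)
prev = rng.integers(0, 200, (120, 160)).astype(float)
curr = prev.copy()
curr[40:80, 60:100] += 50.0                    # a "moving" region
mhi = update_mhi(np.zeros_like(prev), prev, curr)
out = motion_guided_filter(curr, mhi)
print(out.shape)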


Multimedia Tools and Applications | 2016

Multi-view fall detection based on spatio-temporal interest points

Songzhi Su; Sin-Sian Wu; Shu-Yuan Chen; Der-Jyh Duh; Shaozi Li

Many countries are experiencing a rapid increase in their elderly populations, increasing the demand for appropriate healthcare systems, including fall-detection systems. In recent years, many fall-detection systems have been developed, although most require wearable devices and function only when the subject is wearing the device. A vision-based system is a more convenient option. However, visual features typically depend on the camera view; a single fixed camera may not properly identify falls occurring in various directions. This study therefore presents a solution using multiple cameras, and offers two main contributions. First, in contrast to most vision-based systems, which analyze silhouettes to detect falls, the proposed system introduces a novel feature measuring the degree of impact shock, a quantity that is easy to detect with a wearable device but difficult with a computer vision system. The degree of impact shock is also less sensitive to camera view and can be extracted more robustly than a silhouette. Second, the proposed method uses a majority-voting strategy over multiple views to avoid the tedious camera calibration required by most multiple-camera approaches. Specifically, the method is based on spatio-temporal interest points (STIPs): the number of local STIP clusters indicates the degree of impact shock and body vibration. Sequences of these features are concatenated into feature vectors that are fed into a support vector machine to classify fall events, and a majority-voting strategy across views makes the final determination. Applied to a publicly available dataset, the proposed method outperforms existing methods given the same data input.
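The shock feature and the calibration-free vote can be mocked up as follows; the Poisson frame counts, window length, and three-camera setup are synthetic stand-ins for real STIP detections.

import numpy as np
from sklearn.svm import SVC

def shock_feature(stip_counts, win=8):
    # Sliding-window sum of per-frame STIP counts: a burst of interest
    # points stands in for the impact-shock / body-vibration measure.
    c = np.convolve(stip_counts, np.ones(win), mode="valid")
    return c / (c.max() + 1e-9)

def majority_vote(per_view_labels):
    # Fuse per-camera decisions without any camera calibration.
    return int(sum(per_view_labels) > len(per_view_labels) / 2)

rng = np.random.default_rng(0)
def simulate(fall):
    counts = rng.poisson(2, 40).astype(float)   # toy STIP counts per frame
    if fall:
        counts[20:24] += 15.0                   # burst at the impact frame
    return shock_feature(counts)

X = [simulate(f) for f in [True] * 20 + [False] * 20]
y = [1] * 20 + [0] * 20
clf = SVC().fit(X, y)
views = [int(clf.predict([simulate(True)])[0]) for _ in range(3)]  # 3 cameras
print(majority_vote(views))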


Visual Communications and Image Processing | 2014

Kinship classification based on discriminative facial patches

Jie Dong; Xiang Ao; Songzhi Su; Shaozi Li

Recently there has been explosive growth of image data on social networks, and how to use computer vision and machine learning technology to verify interpersonal relationships in this huge amount of human-centered image data remains a challenging issue. Remarkably, there have been few research attempts to analyze possible human relationships in images, especially kin relationships. In this paper, we tackle a challenging and relatively new issue in kinship classification: determining the family to which a query face image belongs. To address this challenge, we propose a kinship classification method with three steps: (1) discriminative patches are detected automatically in the facial landmark regions; (2) appearance features, namely histograms of oriented gradients (HOG), the scale-invariant feature transform (SIFT), and four-patch local binary patterns (FPLBP), are extracted from these patches and concatenated into a high-dimensional feature vector; and (3) a support vector machine (SVM) with a polynomial kernel performs the kinship classification. Experimental results on the Cornell Family 101 dataset demonstrate that our proposed method significantly outperforms state-of-the-art kinship classification approaches.
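A compact sketch of the per-patch feature concatenation and polynomial-kernel SVM follows. Plain uniform LBP histograms stand in for FPLBP, SIFT is omitted for brevity, and the fixed patch grid plus the synthetic "families" are illustrative only.

import numpy as np
from skimage.feature import hog, local_binary_pattern
from sklearn.svm import SVC

def patch_descriptor(patch):
    # Concatenate appearance features for one facial patch.
    h = hog(patch, pixels_per_cell=(8, 8), cells_per_block=(2, 2))
    lbp = local_binary_pattern(patch, P=8, R=1, method="uniform")
    lbp_hist, _ = np.histogram(lbp, bins=10, range=(0, 10), density=True)
    return np.concatenate([h, lbp_hist])

def face_feature(face):
    # Fixed 2x2 patch grid as a stand-in for landmark-driven patches.
    patches = [face[y:y + 32, x:x + 32] for y in (0, 32) for x in (0, 32)]
    return np.concatenate([patch_descriptor(p) for p in patches])

rng = np.random.default_rng(0)
X, y = [], []
for family in range(3):                        # each "family" gets a bias
    for _ in range(8):
        X.append(face_feature(rng.random((64, 64)) + 0.1 * family))
        y.append(family)

clf = SVC(kernel="poly", degree=2).fit(X, y)   # step (3) of the pipeline
print(clf.predict(X[:3]))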

Collaboration


Dive into Songzhi Su's collaboration.

Top Co-Authors

Avatar
Top Co-Authors

Avatar
Top Co-Authors

Avatar
Top Co-Authors

Avatar
Top Co-Authors

Avatar
Top Co-Authors

Avatar
Top Co-Authors

Avatar
Top Co-Authors

Avatar
Top Co-Authors

Avatar

Yun Cheng

Hunan University of Humanities


Der-Jyh Duh

Chien Hsin University of Science and Technology
