Publication


Featured research published by Pei Xu.


Engineering Applications of Artificial Intelligence | 2015

Fast crowd density estimation with convolutional neural networks

Min Fu; Pei Xu; Xudong Li; Qihe Liu; Mao Ye; Ce Zhu

Crowd density estimation is an important research topic in artificial intelligence applications and an effective tool for crowd control and management. Since existing methods struggle to satisfy the accuracy and speed requirements of engineering applications, we propose to estimate crowd density with an optimized convolutional neural network (ConvNet). The contributions are twofold: first, convolutional neural networks are introduced to crowd density estimation for the first time, and the estimation speed is significantly accelerated by removing some network connections, based on the observation that near-duplicate feature maps exist. Second, a cascade of two ConvNet classifiers is designed, which improves both accuracy and speed. The method is tested on three data sets: PETS_2009, a Subway image sequence, and a ground-truth image sequence. Experiments confirm that the method performs well on these data sets compared with state-of-the-art works.
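
For readers who want a concrete picture, here is a minimal PyTorch sketch (not the authors' code) of the cascade idea: a small, pruned ConvNet produces a cheap coarse density class, and a larger ConvNet is consulted only when the first is not confident. The layer sizes, the five density classes, and the confidence threshold are all illustrative assumptions.

```python
# Minimal sketch of a two-ConvNet cascade for crowd density classes.
# All architecture details below are assumptions, not the paper's nets.
import torch
import torch.nn as nn

class SmallConvNet(nn.Module):
    def __init__(self, channels, num_classes=5):
        super().__init__()
        # Fewer channels stand in for "removing connections" that would
        # otherwise produce near-duplicate feature maps.
        self.features = nn.Sequential(
            nn.Conv2d(1, channels, 5, padding=2), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(channels, channels * 2, 5, padding=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.classifier = nn.Linear(channels * 2, num_classes)

    def forward(self, x):
        return self.classifier(self.features(x).flatten(1))

def cascade_predict(fast_net, slow_net, patch, threshold=0.8):
    """Try the fast net; fall back to the slower net when unsure."""
    probs = torch.softmax(fast_net(patch), dim=1)
    conf, label = probs.max(dim=1)
    if conf.item() >= threshold:
        return label.item()
    return torch.softmax(slow_net(patch), dim=1).argmax(dim=1).item()

fast_net = SmallConvNet(channels=8)    # pruned, fast first stage
slow_net = SmallConvNet(channels=32)   # larger, more accurate second stage
patch = torch.randn(1, 1, 72, 72)      # one grayscale crowd patch
print(cascade_predict(fast_net, slow_net, patch))
```

The cascade pays the cost of the large network only on ambiguous patches, which is where the speed gain comes from.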


ACM Multimedia | 2014

Dynamic Background Learning through Deep Auto-encoder Networks

Pei Xu; Mao Ye; Xue Li; Qihe Liu; Yi Yang; Jian Ding

Background learning is a pre-processing step for motion detection, which in turn is a basic step of video analysis. For static backgrounds, many previous works have already achieved good performance; however, results on learning dynamic backgrounds still leave much room for improvement. To address this challenge, we propose a novel and practical method based on deep auto-encoder networks. First, dynamic background images are extracted by a deep auto-encoder network (called the Background Extraction Network) from video frames containing moving objects. Then, a dynamic background model is learned by another deep auto-encoder network (called the Background Learning Network) that takes the extracted background images as input. To be more flexible, our background model can be updated online to absorb more training samples. Our main contributions are: 1) a cascade of two deep auto-encoder networks that handles the separation of dynamic background and foreground very efficiently; 2) an online learning method that accelerates the training of the Background Extraction Network. Compared with previous algorithms, our approach obtains the best performance on six benchmark data sets. In particular, the experiments show that our algorithm handles backgrounds with large variations very well.
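
As a rough illustration of the background-model idea (the paper's Background Extraction and Background Learning network architectures are not reproduced here), the sketch below fits a small auto-encoder to frames, treats its reconstruction as the background estimate, and supports the online update mentioned above. The hidden size, learning rate, step count, and foreground threshold are all assumptions.

```python
# Sketch: an auto-encoder as an online background model.
import torch
import torch.nn as nn

class FrameAutoEncoder(nn.Module):
    def __init__(self, frame_pixels, hidden=256):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(frame_pixels, hidden), nn.Sigmoid())
        self.decoder = nn.Sequential(nn.Linear(hidden, frame_pixels), nn.Sigmoid())

    def forward(self, x):
        return self.decoder(self.encoder(x))

def online_update(model, optimizer, frame, steps=5):
    """Absorb one new frame into the background model (online learning)."""
    loss_fn = nn.MSELoss()
    for _ in range(steps):
        optimizer.zero_grad()
        loss_fn(model(frame), frame).backward()
        optimizer.step()

pixels = 32 * 32
model = FrameAutoEncoder(pixels)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
frame = torch.rand(1, pixels)              # one flattened grayscale frame
online_update(model, optimizer, frame)
background = model(frame).detach()
foreground_mask = (frame - background).abs() > 0.2   # threshold is an assumption
```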


Optical Engineering | 2013

Object detection using voting spaces trained by few samples

Pei Xu; Mao Ye; Xue Li; Lishen Pei; Pengwei Jiao

A method to detect generic objects by training with only a few image samples is proposed. A new feature, locally adaptive steering (LAS), is introduced to represent local principal gradient orientation information. A voting space is then constructed from cells that represent query image coordinates and ranges of feature values at the corresponding pixel positions. Cell sizes are trained in the voting spaces to estimate the tolerance of object appearance at each pixel location. After that, two detection steps locate instances of the object class in a given target image: in the first step, patches of objects are recognized by dense voting in the voting spaces; then, a hypothesis-refinement step accurately locates multiple instances of the object class. The novelty of the approach lies in training the voting spaces from a few samples of the object, which makes it more efficient than traditional template-matching approaches. Our experiments confirm that the proposed method outperforms state-of-the-art approaches in both efficiency and effectiveness.
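
The numpy sketch below illustrates the voting-space training and dense voting under stated assumptions: a plain per-pixel gradient orientation stands in for the LAS feature, and cell ranges come from a few samples plus a fixed tolerance rather than the paper's trained cell sizes.

```python
# Sketch: per-pixel "cells" store a feature range learned from few samples;
# a test patch votes once for every pixel whose feature falls in range.
import numpy as np

def orientation_feature(img):
    """Per-pixel principal gradient orientation (stand-in for LAS)."""
    gy, gx = np.gradient(img.astype(float))
    return np.arctan2(gy, gx)

def train_voting_space(samples, tolerance=0.2):
    feats = np.stack([orientation_feature(s) for s in samples])
    lo = feats.min(axis=0) - tolerance   # tolerance widens each cell
    hi = feats.max(axis=0) + tolerance
    return lo, hi

def vote(patch, lo, hi):
    f = orientation_feature(patch)
    return np.mean((f >= lo) & (f <= hi))   # fraction of agreeing cells

rng = np.random.default_rng(0)
samples = [rng.random((16, 16)) for _ in range(3)]   # a few training samples
lo, hi = train_voting_space(samples)
print(vote(samples[0], lo, hi))   # a training sample scores 1.0
```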


International Conference on Multimedia and Expo | 2014

Motion detection via a couple of auto-encoder networks

Pei Xu; Mao Ye; Qihe Liu; Xudong Li; Lishen Pei; Jian Ding

Motion detection is a basic step in video processing. Previous deep-learning-based motion detection works need clean foreground or background images, which often do not exist in practice. To address this challenge, we propose a novel and practical method based on auto-encoder neural networks. First, approximate background images are obtained from video frames via an auto-encoder network (called the Reconstruction Network). Then, a background model is learned from these images by another auto-encoder network (called the Background Network). To be more resilient, our background model can be updated online to absorb more training samples. Our main contributions are: 1) the architecture of the coupled auto-encoder networks, which models the background very efficiently; 2) an online learning algorithm that adopts a parameter-search scheme to accelerate the training of the Reconstruction Network. Our approach improves motion detection performance on three data sets.
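
A hedged sketch of the coupled networks follows: the first auto-encoder stands in for the Reconstruction Network and the second for the Background Network. Sizes, epochs, and the motion threshold are assumptions, and the paper's accelerated online parameter search is omitted.

```python
# Sketch: two coupled auto-encoders for motion detection.
import torch
import torch.nn as nn

def make_autoencoder(pixels, hidden):
    return nn.Sequential(
        nn.Linear(pixels, hidden), nn.Sigmoid(),   # encoder
        nn.Linear(hidden, pixels), nn.Sigmoid(),   # decoder
    )

def fit(model, data, epochs=20, lr=1e-3):
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(epochs):
        opt.zero_grad()
        nn.functional.mse_loss(model(data), data).backward()
        opt.step()

pixels = 24 * 24
frames = torch.rand(50, pixels)               # frames with moving objects

reconstruction_net = make_autoencoder(pixels, hidden=128)
fit(reconstruction_net, frames)               # learns approximate backgrounds
approx_backgrounds = reconstruction_net(frames).detach()

background_net = make_autoencoder(pixels, hidden=64)
fit(background_net, approx_backgrounds)       # models the background only

with torch.no_grad():
    motion = (frames - background_net(frames)).abs() > 0.25  # threshold assumed
```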


International Conference on Image Processing | 2013

Multi-class action recognition based on inverted index of action states

Lishen Pei; Mao Ye; Pei Xu; Xuezhuan Zhao; Tao Li

A fast inverted-index-based algorithm is introduced for multi-class action recognition. First, we compute the shape-motion features of the automatically localized actor. Second, a binary state tree is built by hierarchically clustering the extracted features; the action states are the cluster centers. Videos are then represented as sequences of states by searching the state binary tree. From the labeled state sequences, we create the inverted index tables. During testing, the state and state-transition scores are computed by querying the inverted index tables, and with the learned weight we compute an action recognition score vector. The recognized action class is the index of the maximum score element. Our key contribution is a fast inverted-index-based multi-class action recognition approach. Experiments on several challenging data sets confirm the performance of this approach.
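
The sketch below shows only the inverted-index part under simplifying assumptions: state sequences are taken as given (feature extraction and clustering are elided), and a single scalar weight w replaces the learned weight.

```python
# Sketch: inverted index tables over states and state transitions.
from collections import defaultdict
from itertools import pairwise   # Python 3.10+

def build_tables(labeled_sequences):
    state_table, transition_table = defaultdict(list), defaultdict(list)
    for states, label in labeled_sequences:
        for s in states:
            state_table[s].append(label)
        for a, b in pairwise(states):
            transition_table[(a, b)].append(label)
    return state_table, transition_table

def recognize(states, state_table, transition_table, num_classes, w=0.5):
    scores = [0.0] * num_classes
    for s in states:                       # state votes
        for label in state_table.get(s, []):
            scores[label] += 1.0
    for t in pairwise(states):             # weighted transition votes
        for label in transition_table.get(t, []):
            scores[label] += w
    return scores.index(max(scores))       # index of the maximum score

train = [([0, 1, 2, 1], 0), ([3, 4, 4, 3], 1)]   # (state sequence, class)
st, tt = build_tables(train)
print(recognize([0, 1, 2], st, tt, num_classes=2))  # -> 0
```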


Multimedia Tools and Applications | 2015

Fast multi-class action recognition by querying inverted index tables

Lishen Pei; Mao Ye; Pei Xu; Tao Li

A fast inverted-index-based algorithm is proposed for multi-class action recognition. This approach represents an action as a sequence of action states, where the action states are cluster centers of the extracted shape-motion features. First, we compute the shape-motion features of a tracked actor. Second, a state binary tree is built by hierarchically clustering the extracted features. The training videos are then represented as sequences of action states by searching the state binary tree. Based on the labeled state sequences, we create a state inverted index table and a state-transition inverted index table. During testing, after representing a new action video as a state sequence, the state and state-transition scores are computed by querying the inverted index tables. With the weight trained on the validation set, we obtain an action class score vector; the recognized action class label is the index of its maximum component. Our key contribution is a fast multi-class action recognition approach based on two inverted index tables. Experiments on several challenging data sets confirm the performance of this approach.
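
To complement the previous sketch, this one illustrates the state binary tree lookup: a feature descends toward the nearer child center until it reaches a leaf, whose index is the action state. The tree, its centers, and the one-dimensional stand-ins for shape-motion features are toy values; the hierarchical clustering that would produce them is elided.

```python
# Sketch: mapping features to action states via a state binary tree.
import numpy as np

class Node:
    def __init__(self, center, left=None, right=None, state=None):
        self.center = np.asarray(center, dtype=float)
        self.left, self.right, self.state = left, right, state

def to_state(node, feature):
    while node.left is not None:           # internal node: pick nearer child
        if np.linalg.norm(feature - node.left.center) <= \
           np.linalg.norm(feature - node.right.center):
            node = node.left
        else:
            node = node.right
    return node.state                      # leaf index is the action state

# A toy tree with four leaf states (centers are illustrative).
tree = Node([0.5],
            Node([0.25], Node([0.1], state=0), Node([0.4], state=1)),
            Node([0.75], Node([0.6], state=2), Node([0.9], state=3)))

features = np.array([[0.05], [0.42], [0.88]])    # shape-motion stand-ins
print([to_state(tree, f) for f in features])     # -> [0, 1, 3]
```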


Multimedia Tools and Applications | 2014

One example based action detection in Hough space

Lishen Pei; Mao Ye; Pei Xu; Xuezhuan Zhao; Guanjun Guo

Given a short action query video, detecting actions of the same category in a target video is an important research topic. We propose a fast action detection method motivated by the idea of the Hough transform. First, we extract HOG features at the corner points of the query video; these corner points are referred to as interest points. Then, video clips are formed by sliding a window over the query video. For each clip of T frames, the interest points in all frames are matched, in the displacement Hough space, against the interest points in the first frame. We count the matched pairs in the cells of the Hough space to form a 2D displacement histogram, so the query video is represented by a sequence of 2D displacement histograms. After that, we divide the motion-containing regions of the target video into video cubes, which are represented by displacement histogram sequences in the same way. The matrix cosine similarity is used to compute the similarities between the query video and the video cubes; this process is referred to as action matching. Finally, using the action matching results, we precisely localize the action from the locations of the matched interest points. Our key contribution is a very simple and fast algorithm that represents actions as displacement histogram sequences. Experiments on challenging datasets containing both simple and realistic backgrounds confirm the effectiveness and efficiency of our method.
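
Here is a small sketch of the displacement histograms and the matrix cosine similarity, with matched interest-point pairs assumed as given (corner detection and HOG matching are elided); the bin count and displacement range are assumptions.

```python
# Sketch: 2D displacement histograms and matrix cosine similarity.
import numpy as np

def displacement_histogram(matches, bins=8, max_disp=20):
    """matches: array of (dx, dy) displacements of matched interest points."""
    h, _, _ = np.histogram2d(
        matches[:, 0], matches[:, 1], bins=bins,
        range=[[-max_disp, max_disp], [-max_disp, max_disp]])
    return h

def matrix_cosine_similarity(seq_a, seq_b):
    """Mean cosine similarity between corresponding 2D histograms."""
    sims = []
    for a, b in zip(seq_a, seq_b):
        den = np.linalg.norm(a) * np.linalg.norm(b) + 1e-12
        sims.append(np.sum(a * b) / den)
    return float(np.mean(sims))

rng = np.random.default_rng(1)
query_seq = [displacement_histogram(rng.normal(0, 5, (30, 2))) for _ in range(4)]
cube_seq = [displacement_histogram(rng.normal(0, 5, (30, 2))) for _ in range(4)]
print(matrix_cosine_similarity(query_seq, cube_seq))  # action matching score
```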


Neural Computing and Applications | 2017

Adaptive pedestrian detection by predicting classifier

Song Tang; Mao Ye; Pei Xu; Xudong Li

Generally, the performance of a pedestrian detector decreases rapidly when it is trained on a fixed training set but applied to specific scenes, because only a few samples in the training set are useful for those scenes while the other samples may disturb accurate detection. Traditional methods address this problem with transfer learning, which requires either keeping the source samples or manually labeling a few samples in the detection phase. In this paper, we propose a new method that avoids these drawbacks by predicting a pedestrian classifier for each sample in the detection phase. A classifier regression model is trained in the source domain, in which each sample has a proprietary classifier. In the detection phase, a pedestrian classifier is predicted for each candidate window in an image, so the pedestrian classifiers differ across samples in the target domain. Our main contributions are: (1) a new adaptive detector that requires neither keeping source samples nor labeling new target samples; (2) a new dimensionality reduction method for classifier vectors that simultaneously preserves reconstruction and classification performance; (3) a two-stage regression neural model that handles the high-dimensional regression problem effectively. Experiments show that our method achieves state-of-the-art results on two pedestrian datasets.
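
The sketch below conveys the predict-a-classifier-per-window idea on synthetic data: a regressor maps a window's feature vector to a classifier weight vector, which then scores that same window. A single ridge regressor stands in for the paper's two-stage regression neural model, and the per-sample proprietary classifiers are random stand-ins.

```python
# Sketch: regress a per-window classifier, then apply it to the window.
import numpy as np

rng = np.random.default_rng(0)
n_samples, feat_dim = 200, 16

# Source domain: each sample comes with its own classifier vector
# (how those are obtained is part of the paper and elided here).
features = rng.normal(size=(n_samples, feat_dim))
classifiers = rng.normal(size=(n_samples, feat_dim))

# Fit the classifier-regression model, features -> classifier vectors,
# with ridge regression via the normal equations.
lam = 1e-2
A = features.T @ features + lam * np.eye(feat_dim)
W = np.linalg.solve(A, features.T @ classifiers)   # (feat_dim, feat_dim)

def detect(window_feature, threshold=0.0):
    """Predict a classifier for this window, then score the window with it."""
    predicted_classifier = window_feature @ W
    score = float(predicted_classifier @ window_feature)
    return score > threshold, score

print(detect(rng.normal(size=feat_dim)))
```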


Pattern Recognition and Image Analysis | 2015

Fast object detection based on several samples by training voting space

Pei Xu; Mao Ye; Lishen Pei; Yumin Dou; Hongyi Chen

In this paper, we propose a fast and novel detection method based on several samples to localize objects in target images or videos. First, we use several samples to train a voting space constructed from cells at corresponding positions, where each cell is described by a Gaussian distribution whose parameters are estimated by maximum likelihood. Then, we randomly choose one sample as a query image, and patches of the target image are recognized by dense voting in the trained voting space. Next, we use a mean-shift method to refine multiple instances of the object class. The high performance of our approach is demonstrated on several challenging data sets in terms of both efficiency and effectiveness.
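
A minimal sketch of the Gaussian voting space follows: each cell's mean and variance are the maximum likelihood estimates over the few samples, and a patch's vote is its average per-cell log-likelihood. Raw intensities stand in for the features, and the mean-shift refinement is omitted.

```python
# Sketch: per-cell Gaussians fit by maximum likelihood, then voting.
import numpy as np

def train_cells(samples, min_var=1e-3):
    stack = np.stack(samples).astype(float)
    mean = stack.mean(axis=0)                     # ML estimate of the mean
    var = np.maximum(stack.var(axis=0), min_var)  # ML variance, floored
    return mean, var

def vote(patch, mean, var):
    log_lik = -0.5 * (np.log(2 * np.pi * var) + (patch - mean) ** 2 / var)
    return log_lik.mean()

rng = np.random.default_rng(2)
base = rng.random((12, 12))                       # shared object appearance
samples = [base + rng.normal(0, 0.05, base.shape) for _ in range(5)]
mean, var = train_cells(samples)
print(vote(samples[0], mean, var))                # training patch: high score
print(vote(rng.random((12, 12)), mean, var))      # unrelated patch: much lower
```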


Journal of Real-Time Image Processing | 2015

Real-time multi-class object detection using two-dimensional index

Yumin Dou; Pei Xu; Mao Ye; Xue Li; Lishen Pei; Xudong Li

When only one sample exists for each category of objects, previous approaches to training multi-class classifiers are not applicable. In this paper, we propose a new template matching method for multi-class object detection that is both robust and real-time. First, object features are encoded as binary codes based on quantized gradient intensity and quantized gradient orientation mappings. Then, a two-dimensional index table is constructed; it effectively organizes the relationships between features from the multi-class templates and their corresponding locations in the templates. For a target image, the features are first encoded, and the object is then localized by voting based on feature queries against the index table. Our experiments on two public data sets demonstrate the high efficiency of our method and its superior performance compared with state-of-the-art methods.
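
Finally, a sketch of the index-table voting under assumptions: the two-dimensional (intensity by orientation) binary coding is collapsed into a single integer code per location, and toy feature grids replace real templates and target images.

```python
# Sketch: an index table from feature codes to template locations,
# queried by the target image to vote for object positions.
from collections import defaultdict

def build_index(templates):
    """templates: {template_id: {(x, y): feature_code}}"""
    index = defaultdict(list)   # feature_code -> [(template_id, x, y)]
    for tid, feats in templates.items():
        for (x, y), code in feats.items():
            index[code].append((tid, x, y))
    return index

def detect(target_feats, index):
    votes = defaultdict(int)    # (template_id, offset) -> vote count
    for (x, y), code in target_feats.items():
        for tid, tx, ty in index.get(code, []):
            votes[(tid, (x - tx, y - ty))] += 1
    return max(votes.items(), key=lambda kv: kv[1]) if votes else None

templates = {"cup": {(0, 0): 5, (1, 0): 9, (0, 1): 3}}
target = {(4, 4): 5, (5, 4): 9, (4, 5): 3, (7, 7): 1}
print(detect(target, build_index(templates)))
# -> (('cup', (4, 4)), 3): the cup template at offset (4, 4), 3 votes
```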

Collaboration


Dive into Pei Xu's collaborations.

Top Co-Authors

Mao Ye (University of Electronic Science and Technology of China)
Lishen Pei (University of Electronic Science and Technology of China)
Xudong Li (University of Electronic Science and Technology of China)
Qihe Liu (University of Electronic Science and Technology of China)
Tao Li (University of Electronic Science and Technology of China)
Yumin Dou (University of Electronic Science and Technology of China)
Xue Li (University of Queensland)
Jian Ding (University of Electronic Science and Technology of China)
Min Fu (University of Electronic Science and Technology of China)
Ce Zhu (University of Electronic Science and Technology of China)