Zhang
Chinese Academy of Sciences
Publication
Featured research published by Zhang.
IEEE Transactions on Pattern Analysis and Machine Intelligence | 2012
Zhang Zhang; Dacheng Tao
Slow Feature Analysis (SFA) extracts slowly varying features from a quickly varying input signal [1]. It has been successfully applied to modeling the visual receptive fields of cortical neurons. Ample experimental results in neuroscience suggest that the temporal slowness principle is a general learning principle in visual perception. In this paper, we introduce the SFA framework to the problem of human action recognition by incorporating discriminative information into SFA learning and considering the spatial relationship of body parts. In particular, we consider four kinds of SFA learning strategies, namely the original unsupervised SFA (U-SFA), the supervised SFA (S-SFA), the discriminative SFA (D-SFA), and the spatial discriminative SFA (SD-SFA), to extract slow feature functions from a large number of training cuboids obtained by random sampling in motion boundaries. Afterward, to represent action sequences, the squared first-order temporal derivatives are accumulated over all transformed cuboids into one feature vector, which is termed the Accumulated Squared Derivative (ASD) feature. The ASD feature encodes the statistical distribution of slow features in an action sequence. Finally, a linear support vector machine (SVM) is trained to classify actions represented by ASD features. We conduct extensive experiments, including two sets of control experiments, two sets of large-scale experiments on the KTH and Weizmann databases, and two sets of experiments on the CASIA and UT-interaction databases, to demonstrate the effectiveness of SFA for human action recognition. Experimental results suggest that the SFA-based approach (1) is able to extract useful motion patterns and improves recognition performance, (2) requires fewer intermediate processing steps but achieves comparable or even better performance, and (3) has good potential to recognize complex multiperson activities.
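The ASD accumulation described in this abstract can be illustrated with a minimal sketch. It assumes the slow feature functions have already been applied to each cuboid, so that every cuboid is a short time series of slow-feature outputs. The function name `asd_feature` and the nested-list data layout are hypothetical, and any normalization the authors may apply is omitted.

```python
def asd_feature(transformed_cuboids):
    """Accumulate squared first-order temporal differences per slow feature.

    transformed_cuboids: list of cuboids; each cuboid is a list of time
    steps; each time step is a list of K slow-feature outputs.
    Returns a list of K accumulated values (assumed layout, not the
    authors' exact data structure).
    """
    if not transformed_cuboids:
        return []
    k = len(transformed_cuboids[0][0])
    asd = [0.0] * k
    for cuboid in transformed_cuboids:
        # Pair each time step with its successor to form finite differences.
        for prev, curr in zip(cuboid, cuboid[1:]):
            for j in range(k):
                diff = curr[j] - prev[j]
                asd[j] += diff * diff
    return asd


# Two toy cuboids with K = 2 slow features each.
cuboids = [
    [[0.0, 1.0], [1.0, 1.0], [3.0, 1.0]],
    [[2.0, 0.0], [2.0, 2.0]],
]
print(asd_feature(cuboids))  # [5.0, 4.0]
```

In this toy input the first feature changes only in the first cuboid (differences 1 and 2, squared sum 5) and the second feature changes only in the second cuboid (difference 2, squared 4). The resulting vector is what a linear classifier would receive in such a pipeline.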
International Conference on Pattern Recognition | 2006
Zhang Zhang; Kaiqi Huang; Tieniu Tan
This paper compares different similarity measures used for trajectory clustering in outdoor surveillance scenes. Six similarity measures are presented, and their performance is evaluated by correct clustering rate (CCR) and time cost (TC). The experimental results demonstrate that in outdoor surveillance scenes, the simpler PCA+Euclidean distance is adequate for the clustering task even in the presence of noise, whereas more complex similarity measures such as DTW and LCSS are inefficient because of their high computational cost.
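To make the cost difference concrete, the following is a minimal sketch of two of the measure families named in this abstract: a plain Euclidean distance between equal-length 2D trajectories and the classic dynamic time warping (DTW) distance. The PCA step is omitted, and the function names and point-tuple layout are assumptions made for illustration, not the paper's implementation.

```python
import math


def euclidean_distance(a, b):
    """Euclidean distance between two equal-length 2D trajectories. O(n)."""
    if len(a) != len(b):
        raise ValueError("trajectories must have the same number of points")
    return math.sqrt(
        sum((ax - bx) ** 2 + (ay - by) ** 2 for (ax, ay), (bx, by) in zip(a, b))
    )


def dtw_distance(a, b):
    """Classic DTW with a full cost table. O(n * m) time and memory."""
    n, m = len(a), len(b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.hypot(a[i - 1][0] - b[j - 1][0], a[i - 1][1] - b[j - 1][1])
            # Extend the cheapest of the three admissible warping steps.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


straight = [(0, 0), (1, 0), (2, 0)]
stalled = [(0, 0), (1, 0), (1, 0), (2, 0)]  # same path, one repeated point
shifted = [(0, 1), (1, 1), (2, 1)]          # same path, offset by 1 in y

print(dtw_distance(straight, stalled))        # 0.0
print(dtw_distance(straight, shifted))        # 3.0
print(euclidean_distance(straight, shifted))  # sqrt(3)
```

DTW tolerates the repeated point at the price of a quadratic table, while the Euclidean distance needs a single pass but requires trajectories resampled to a common length.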
IEEE Transactions on Pattern Analysis and Machine Intelligence | 2011
Zhang Zhang; Tieniu Tan; Kaiqi Huang
For a grammar-based approach to the recognition of visual events, there are two major limitations that prevent its practical application. One is that the event rules are predefined by domain experts, which entails a huge manual cost. The other is that the commonly used grammar can only handle sequential relations between subevents, which is inadequate for recognizing more complex events involving parallel subevents. To solve these problems, we propose an extended grammar approach to modeling and recognizing complex visual events. First, motion trajectories as original features are transformed into a set of basic motion patterns of a single moving object, namely, primitives (terminals) in the grammar system. Then, a Minimum Description Length (MDL) based rule induction algorithm is performed to discover the hidden temporal structures in the primitive stream, where Stochastic Context-Free Grammar (SCFG) is extended by Allen's temporal logic to model the complex temporal relations between subevents. Finally, a Multithread Parsing (MTP) algorithm is adopted to recognize interesting complex events in a given primitive stream, where a Viterbi-like error recovery strategy is also proposed to handle large-scale errors, e.g., insertion and deletion errors. Extensive experiments, including gymnastic exercises, traffic light events, and multi-agent interactions, have been conducted to validate the effectiveness of the proposed approach.
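The temporal relations from Allen's interval logic that this abstract refers to can be sketched as a small classifier over two time intervals. This is only an illustration of the thirteen standard Allen relations, under the assumption that each subevent is a `(start, end)` pair with `start < end`; the function name `allen_relation` is hypothetical, and how the paper attaches these relations to grammar rules is not shown.

```python
def allen_relation(a, b):
    """Return the Allen relation that holds from interval a to interval b."""
    (s1, e1), (s2, e2) = a, b
    if s1 >= e1 or s2 >= e2:
        raise ValueError("intervals must satisfy start < end")
    if e1 < s2:
        return "before"
    if e2 < s1:
        return "after"
    if e1 == s2:
        return "meets"
    if e2 == s1:
        return "met-by"
    if s1 == s2 and e1 == e2:
        return "equals"
    if s1 == s2:
        return "starts" if e1 < e2 else "started-by"
    if e1 == e2:
        return "finishes" if s1 > s2 else "finished-by"
    if s2 < s1 and e1 < e2:
        return "during"
    if s1 < s2 and e2 < e1:
        return "contains"
    # Only the two partial-overlap cases remain at this point.
    return "overlaps" if s1 < s2 else "overlapped-by"


print(allen_relation((0, 2), (3, 5)))  # before
print(allen_relation((1, 2), (0, 5)))  # during
print(allen_relation((0, 3), (2, 5)))  # overlaps
```

A purely sequential grammar can express only relations such as "before" and "meets"; parallel subevents require relations such as "during" and "overlaps".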
Computer Vision and Pattern Recognition | 2007
Zhang Zhang; Kaiqi Huang; Tieniu Tan; Liangsheng Wang
In this paper, a generic rule induction framework based on trajectory series analysis is proposed to learn event rules. First, the trajectories acquired by a tracking system are mapped into a set of primitive events that represent basic motion patterns of moving objects. Then, a minimum description length (MDL) principle based grammar induction algorithm is adopted to infer meaningful rules from the primitive event series. Compared with previous grammar rule based work on event recognition, where the rules are all defined manually, our work aims to learn the event rules automatically. Experiments at a traffic crossroad have demonstrated the effectiveness of our methods. As shown in the experimental results, most of the grammar rules obtained by our algorithm are consistent with the actual traffic events at the crossroad. Furthermore, the traffic light rules at the crossroad can also be learned correctly once the irrelevant trajectories are eliminated.
European Conference on Computer Vision | 2008
Zhang Zhang; Kaiqi Huang; Tieniu Tan
This paper presents a probabilistic grammar approach to the recognition of complex events in videos. First, based on the original motion features, a rule induction algorithm is adopted to learn the event rules. Then, a multi-thread parsing (MTP) algorithm is adopted to recognize complex events involving parallel temporal relations between sub-events, whereas the commonly used parser can only handle sequential relations. Additionally, a Viterbi-like error recovery strategy is embedded in the parsing process to correct large time scale errors, such as insertion and deletion errors. Extensive experiments including indoor gymnastic exercises and outdoor traffic events are performed. As supported by the experimental results, the MTP algorithm can effectively recognize complex events owing to the strong discriminative representation and the error recovery strategy.
Asian Conference on Computer Vision | 2006
Zhang Zhang; Kaiqi Huang; Tieniu Tan
Stochastic grammar has been used in many video analysis and event recognition applications as an efficient model to represent large-scale video activity. However, in previous works, because of its limited ability to represent parallel temporal relations, traditional stochastic grammar cannot be used to model complex multi-agent activity that includes parallel temporal relations between sub-activities (such as the “during” relation). In this paper, we extend the traditional grammar by introducing Temporal Relation Events (TRE) to solve the problem. A corresponding grammar parser augmented with complex temporal inference is also proposed. A system that can recognize the cooperative action of two hands in a “telephone calling” activity is built to demonstrate the effectiveness of our methods. In the experiment, a simple method to model the explicit state duration probability distribution in the HMM detector is also proposed for accurate primitive event detection.
The Computer Journal | 2012
Zhang Zhang; Jun Cheng; Jun Li; Wei Bian; Dacheng Tao
In this paper, we propose an approach termed segment-based features (SBFs) to classify time series. The approach is inspired by the success of the component- or part-based methods of object recognition in computer vision, in which a visual object is described as a number of characteristic parts and the relations among the parts. Applying this idea to the problem of time series classification, a time series is represented as a set of segments and the corresponding temporal relations. First, a number of interest segments are extracted by interest point detection with automatic scale selection. Then, a number of feature prototypes are collected by random sampling from the segment set, where each feature prototype may include a single segment or multiple ordered segments. Subsequently, each time series is transformed to a standard feature vector, i.e. SBF, where each entry in the SBF is calculated as the maximum response (maximum similarity) of the corresponding feature prototype to the segment set of the time series. Based on the original SBF, an incremental feature selection algorithm is conducted to form a compact and discriminative feature representation. Finally, a multi-class support vector machine is trained to classify the test time series. Extensive experiments on different time series datasets, including one synthetic control dataset, two sign language datasets and one gait dynamics dataset, have been performed to evaluate the proposed SBF method. Compared with other state-of-the-art methods, our approach achieves superior classification performance, which validates the advantages of the proposed method.
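The maximum-response step in this abstract can be sketched as follows. This is a simplified illustration that handles only single-segment prototypes. The similarity function `exp(-Euclidean distance)` and the names `similarity` and `sbf_vector` are assumptions made here; the abstract does not specify the similarity measure the authors use.

```python
import math


def similarity(prototype, segment):
    """Hypothetical similarity in (0, 1]: exp(-Euclidean distance).

    Segments of a different length than the prototype score 0.0 in this
    simplified sketch.
    """
    if len(prototype) != len(segment):
        return 0.0
    dist = math.sqrt(sum((p - s) ** 2 for p, s in zip(prototype, segment)))
    return math.exp(-dist)


def sbf_vector(segments, prototypes):
    """One entry per prototype: its maximum similarity over all segments."""
    return [
        max((similarity(p, s) for s in segments), default=0.0)
        for p in prototypes
    ]


# Segments extracted from one time series, and two sampled prototypes.
segments = [[0.0, 1.0, 0.0], [1.0, 1.0, 1.0]]
prototypes = [[0.0, 1.0, 0.0], [2.0, 1.0, 1.0]]

vec = sbf_vector(segments, prototypes)
print(vec)  # first entry is 1.0 (exact match), second is exp(-1)
```

Because every time series is scored against the same prototype list, the output has a fixed length regardless of how many segments each series yields, which is what allows a standard classifier to be trained on it.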
Computer Vision and Pattern Recognition | 2016
Zhang Zhang; Kaiqi Huang; Tieniu Tan; Peipei Yang; Jun Li
For spectral embedding/clustering, how to construct a relation graph that reflects the intrinsic structures in data is still an open problem. In this paper, we propose an approach, named Relation Discovery based Slow Feature Analysis (ReD-SFA), for simultaneous feature learning and graph construction. Given an initial graph with only a few nearest but most reliable pairwise relations, new reliable relations are discovered under an assumption of reliability preservation, i.e., the reliable relations will preserve their reliabilities in the learnt projection subspace. We formulate the idea as a cross entropy (CE) minimization problem to reduce the discrepancy between two Bernoulli distributions parameterized by the updated distances and the existing relation graph, respectively. Furthermore, to overcome the imbalanced distribution of samples, a Boosting-like strategy is proposed to balance the discovered relations over all clusters. To evaluate the proposed method, extensive experiments are performed on various trajectory clustering tasks, including motion segmentation, time series clustering and crowd detection. The results demonstrate that ReD-SFA can discover reliable intra-cluster relations with high precision, and that it achieves clustering performance competitive with the state of the art.
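The cross entropy between two Bernoulli distributions mentioned in this abstract has a simple closed form, sketched below. How the paper maps distances to probabilities is not stated in the abstract, so the mapping `q = exp(-distance)` and the names `bernoulli_cross_entropy` and `graph_cross_entropy` are assumptions chosen purely for illustration.

```python
import math


def bernoulli_cross_entropy(p, q, eps=1e-12):
    """H(p, q) = -[p log q + (1 - p) log(1 - q)] for Bernoulli parameters."""
    q = min(max(q, eps), 1.0 - eps)  # clamp to keep the logarithms finite
    return -(p * math.log(q) + (1.0 - p) * math.log(1.0 - q))


def graph_cross_entropy(relations, distances):
    """Sum the cross entropy over all sample pairs.

    relations: dict mapping a pair to 1.0 (related) or 0.0 (not related).
    distances: dict mapping the same pairs to a non-negative distance.
    The mapping q = exp(-distance) is a hypothetical choice.
    """
    return sum(
        bernoulli_cross_entropy(relations[pair], math.exp(-distances[pair]))
        for pair in relations
    )


relations = {("a", "b"): 1.0, ("a", "c"): 0.0}
close_fit = {("a", "b"): 0.1, ("a", "c"): 3.0}  # related pair is near
poor_fit = {("a", "b"): 3.0, ("a", "c"): 0.1}   # related pair is far

print(graph_cross_entropy(relations, close_fit))
print(graph_cross_entropy(relations, poor_fit))
```

Under this assumed mapping, the cross entropy is lower when related pairs are close and unrelated pairs are far apart, which is the kind of agreement between distances and graph that a minimization of this quantity rewards.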
Archive | 2016
Zhang Zhang; Kaiqi Huang
Occlusion has long posed a critical challenge in computer vision. Camera array based synthetic aperture photography has been regarded as a promising way to address the problem of occluded object imaging. However, the application of this technique is limited by the building cost and the immobility of the camera array system. In order to build a more practical synthetic aperture photography system, in this paper, a novel multiple moving camera based collaborative synthetic aperture photography method is proposed. The main characteristics of our work include: (1) to the best of our knowledge, this is the first multiple moving camera based collaborative synthetic aperture photography system; (2) by building a sparse 3D map of the occluded scene using one camera, the information from the subsequent cameras can be incrementally utilized to estimate the warping induced by the focal plane; (3) the compatibility with different types of cameras, such as hand-held action cameras or quadrotor on-board cameras, shows the generality of the proposed framework. Extensive experiments have demonstrated the see-through-occlusion performance of the proposed approach in different scenarios.
Archive | 2010
Tieniu Tan; Zhang Zhang; Kaiqi Huang; Liangsheng Wang