Longfei Zhang
Beijing Institute of Technology
Publications
Featured research published by Longfei Zhang.
Multimedia Tools and Applications | 2014
Zan Gao; Longfei Zhang; Ming-yu Chen; Alexander G. Hauptmann; Hua Zhang; Anni Cai
Data imbalance is common in real-world datasets, especially in massive video datasets. Traditional machine learning algorithms, however, assume a balanced data distribution and equal misclassification costs, so they have difficulty describing the true data distribution accurately and tend to misclassify. In this paper, we study the data imbalance problem in semantic extraction over massive video datasets and propose an enhanced and hierarchical structure (EHS) algorithm. The algorithm tightly integrates data sampling, filtering, and model training through a hierarchical structure, so that model performance improves step by step and remains robust and stable as features and datasets change. Experiments on TRECVID 2010 Semantic Indexing demonstrate that the proposed algorithm clearly outperforms traditional machine learning algorithms and remains stable and robust across different kinds of features. Extended experiments on TRECVID 2010 Surveillance Event Detection further show that EHS is efficient and effective, reaching top performance in four of seven events.
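The abstract gives no pseudocode; the following minimal Python sketch shows one plausible reading of the integrated sample-filter-train loop. The staging, the logistic-regression base learner, and the thresholds are illustrative assumptions, not the authors' exact procedure.

```python
# A hedged sketch of a hierarchical sample-filter-train cascade for
# imbalanced data, in the spirit of EHS (details are assumptions).
import numpy as np
from sklearn.linear_model import LogisticRegression

def ehs_like_cascade(X, y, n_stages=3, keep_thresh=0.1, seed=0):
    """Each stage rebalances the data, trains a model, then filters out
    negatives the current model already rejects with high confidence."""
    rng = np.random.default_rng(seed)
    models = []
    for _ in range(n_stages):
        pos, neg = np.where(y == 1)[0], np.where(y == 0)[0]
        if len(pos) == 0 or len(neg) == 0:
            break
        # Stage-wise undersampling: match negatives to positives.
        neg_sample = rng.choice(neg, size=min(len(pos), len(neg)), replace=False)
        idx = np.concatenate([pos, neg_sample])
        model = LogisticRegression(max_iter=1000).fit(X[idx], y[idx])
        models.append(model)
        # Filtering: drop easy negatives so later stages concentrate
        # on the hard remainder.
        p = model.predict_proba(X)[:, 1]
        keep = (y == 1) | (p >= keep_thresh)
        X, y = X[keep], y[keep]
    return models

def cascade_score(models, X):
    # Average stage scores for a final ranking.
    return np.mean([m.predict_proba(X)[:, 1] for m in models], axis=0)
```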
IEEE Transactions on Circuits and Systems for Video Technology | 2014
Yue Gao; Rongrong Ji; Longfei Zhang; Alexander G. Hauptmann
Tracking people and objects is a fundamental stage of many video surveillance systems, and various trackers have been designed for it over the past decade. However, there is a consensus that no single tracker works sufficiently well under all circumstances. One potential solution is therefore to deploy multiple trackers and fuse their outputs to boost overall performance; an intelligent fusion design that is general and orthogonal to any specific tracker then plays a key role in successful tracking. In this paper, we propose a symbiotic tracker ensemble as a unified tracking framework that relies only on the output of each individual tracker, without knowledge of its specific mechanism. In our approach, all trackers run in parallel and are treated as black boxes, with no details of their internals required. The framework learns an optimal combination of their tracking results, capturing the relations among individual trackers robustly from two aspects: first, the consistency between two successive frames is computed for each tracker; then, the pairwise correlation among different trackers in the incoming frame is estimated by a graph-propagation process. Experimental results on the Caremedia and Caviar datasets demonstrate the effectiveness of the proposed method in comparison with several state-of-the-art methods.
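As a rough illustration of the fusion idea (not the paper's exact formulation), the sketch below scores each black-box tracker by its frame-to-frame consistency and propagates pairwise agreement over a graph before fusing the boxes. The IoU affinities, the restart constant, and the weighted-average fusion are assumptions.

```python
# Hedged sketch: fuse black-box tracker outputs via per-tracker temporal
# consistency plus graph-propagated pairwise agreement (IoU-based).
import numpy as np

def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter + 1e-9)

def fuse_trackers(prev_boxes, cur_boxes, alpha=0.85, n_iter=20):
    """prev_boxes, cur_boxes: one box per tracker in two successive frames."""
    n = len(cur_boxes)
    # Temporal consistency of each tracker across successive frames.
    c = np.array([iou(p, q) for p, q in zip(prev_boxes, cur_boxes)])
    # Pairwise agreement among trackers in the current frame.
    W = np.array([[iou(cur_boxes[i], cur_boxes[j]) for j in range(n)]
                  for i in range(n)])
    P = W / (W.sum(axis=1, keepdims=True) + 1e-9)
    s = c.copy()
    for _ in range(n_iter):          # graph propagation, restarting at c
        s = alpha * P @ s + (1 - alpha) * c
    w = s / (s.sum() + 1e-9)
    return w @ np.array(cur_boxes, dtype=float)   # fused box
```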
Neurocomputing | 2014
Ping Ji; Liujuan Cao; Xiguang Zhang; Longfei Zhang; Weimin Wu
In recent years, extensive research effort has been dedicated to automatic news content analysis. In this paper, we propose a novel algorithm for anchorperson detection in news video sequences. The raw news videos are first split into shots by a four-threshold method, and key frames are extracted from each shot. Anchorperson detection is then performed on these key frames using a clustering-based method built on a statistical distance derived from Pearson's correlation coefficient. To evaluate the effectiveness of the proposed method, we conducted experiments on 10 news sequences, in which the proposed scheme achieves a recall of 0.96 and a precision of 0.97 for anchorperson detection.
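A minimal sketch of the clustering step, assuming key frames arrive as flattened grayscale vectors. Turning Pearson's r into the distance 1 - r and taking the largest cluster as the anchorperson are illustrative choices, not necessarily the paper's exact design.

```python
# Hedged sketch: cluster key frames under a Pearson-correlation distance
# and pick the recurring (largest) cluster as anchorperson candidates.
import numpy as np
from sklearn.cluster import AgglomerativeClustering

def pearson_distance_matrix(frames):
    """frames: (n, d) array, one flattened key frame per row."""
    corr = np.corrcoef(frames)            # pairwise Pearson r
    return np.clip(1.0 - corr, 0.0, 2.0)  # distance in [0, 2]

def detect_anchor_frames(frames, dist_thresh=0.3):
    D = pearson_distance_matrix(frames)
    labels = AgglomerativeClustering(
        n_clusters=None, metric="precomputed",
        linkage="average", distance_threshold=dist_thresh,
    ).fit_predict(D)
    # Heuristic: anchorperson shots recur throughout a broadcast, so the
    # largest, most self-similar cluster is a natural candidate.
    biggest = np.bincount(labels).argmax()
    return np.where(labels == biggest)[0]
```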
Neurocomputing | 2013
Longfei Zhang; Ziyu Guan; Alexander G. Hauptmann
Automatic understanding of human activities is a major challenge in multimedia analysis. The challenge is especially acute for small-scale activities, such as finger motions, and for activities in complex scenes, where, for typical camera views, both global and local feature analysis methods are unsuitable. Many studies therefore use spatio-temporal features combined with feature selection to obtain a video representation. These spatio-temporal features are problematic for two reasons: first, it is unclear whether a given feature belongs to meaningful foreground or to noise; second, the features alone do not indicate where an activity will occur. A biologically inspired feature selection method is therefore needed to reorganize the spatio-temporal features and represent the video in a feature space. In this paper, we propose a graph-based Co-Attention model that selects more efficient features for activity analysis. Rather than reducing dimensionality, the Co-Attention model operates on the set of interest points. It is derived from correlations among individual tiny activities, whose salient regions are identified by combining an integrated top-down and bottom-up visual attention model with a motion attention model built from spatio-temporal features rather than directly from optical flow. Unlike typical attention models, the Co-Attention model allows multiple regions of interest to co-exist in a video for further analysis. Experimental results on the KTH dataset, the YouTube dataset, and a new tiny-activity dataset, the Pump dataset, which consists of visual observations of patients operating an infusion pump, validate that our activity analysis approach is more effective than state-of-the-art methods.
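Very schematically, the selection step could look like the following, where a visual attention map and a motion attention map are fused and interest points in any sufficiently salient region survive, allowing multiple regions to co-exist. The maps, fusion weight, and threshold are placeholders for the paper's actual attention models.

```python
# Hedged sketch of attention-driven interest-point selection; the fusion
# rule and threshold are assumptions, not the Co-Attention model itself.
import numpy as np

def combined_attention(visual_att, motion_att, w_visual=0.5):
    """visual_att, motion_att: (H, W) saliency maps scaled to [0, 1]."""
    return w_visual * visual_att + (1.0 - w_visual) * motion_att

def select_interest_points(points, visual_att, motion_att, thresh=0.5):
    """points: (n, 2) integer array of (row, col) interest points.
    Keeps every point whose fused attention clears the threshold, so
    several disjoint salient regions can survive at once."""
    att = combined_attention(visual_att, motion_att)
    scores = att[points[:, 0], points[:, 1]]
    keep = scores >= thresh
    return points[keep], scores[keep]
```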
International Conference on Information System and Artificial Intelligence | 2016
Shuo Tang; Longfei Zhang; Jia-Li Yan; Xiang-Wei Tan; Gangyi Ding
In this paper, we propose a novel framework for multi-object tracking that addresses two kinds of challenges: discriminating different targets with similar appearance, and identifying a single target whose appearance varies severely over time. The proposed framework extracts discriminative appearance information for the different objects from the historical recordings of all tracked targets using a label-consistent K-SVD (LC-KSVD) dictionary learning method. We validated the framework on three publicly available video sequences against several state-of-the-art approaches. The experimental results show that our method achieves competitive results, with a 7.7% improvement in MOTP.
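LC-KSVD itself is not shipped with common Python libraries; as a hedged stand-in, the sketch below learns a plain sparse dictionary per target with scikit-learn and assigns a detection to the target whose dictionary reconstructs it best. This captures the flavor of dictionary-based appearance discrimination but omits the label-consistency term that distinguishes LC-KSVD.

```python
# Stand-in for LC-KSVD: one plain dictionary per target, assignment by
# smallest sparse-reconstruction residual (simplifying assumption).
import numpy as np
from sklearn.decomposition import DictionaryLearning, sparse_encode

def learn_target_dictionaries(histories, n_atoms=16):
    """histories: {target_id: (n_i, d) array of past appearance vectors}."""
    dicts = {}
    for tid, X in histories.items():
        dl = DictionaryLearning(n_components=n_atoms, max_iter=50).fit(X)
        dicts[tid] = dl.components_          # (n_atoms, d)
    return dicts

def assign_detection(x, dicts, n_nonzero=4):
    """Pick the target whose dictionary best reconstructs detection x."""
    best_tid, best_err = None, np.inf
    for tid, D in dicts.items():
        code = sparse_encode(x[None, :], D, algorithm="omp",
                             n_nonzero_coefs=n_nonzero)
        err = np.linalg.norm(x - code @ D)
        if err < best_err:
            best_tid, best_err = tid, err
    return best_tid, best_err
```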
Pacific Rim Conference on Multimedia | 2015
Shuo Tang; Longfei Zhang; Jiapeng Chi; Zhufan Wang; Gangyi Ding
Tracking an object over the long term remains a great challenge in computer vision. Appearance modeling is one of the keys to building a good tracker, and much research attention has focused on building an appearance model with special features and learning methods, especially online learning. However, a single model is not enough to describe all historical appearances of the target during a long-term tracking task, owing to viewpoint changes, illumination variation, camera switching, and so on. We propose the Adaptive Multiple Appearance Model (AMAM) framework, which maintains not one model but a set of appearance models. Different appearance representations of the tracking target are grouped in an unsupervised manner and modeled automatically by a Dirichlet Process Mixture Model (DPMM). The tracking result is then selected, via voting and a confidence map, from the candidate targets predicted by trackers based on those appearance models. Experimental results on multiple public datasets demonstrate better performance compared with state-of-the-art methods.
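A truncated Dirichlet-process mixture is available in scikit-learn as BayesianGaussianMixture, and the sketch below uses it to group a target's appearance vectors into a model set, in the spirit of AMAM's DPMM step. The diagonal covariance, truncation level, and per-group mean 'model' are simplifying assumptions; the voting and confidence-map selection would then run over trackers seeded from each group.

```python
# Hedged sketch: group appearance vectors with a DP mixture so each
# non-empty component yields one appearance model for the set.
import numpy as np
from sklearn.mixture import BayesianGaussianMixture

def group_appearances(appearances, max_components=10, seed=0):
    """appearances: (n, d) array of appearance feature vectors."""
    dpmm = BayesianGaussianMixture(
        n_components=max_components,          # truncation level
        weight_concentration_prior_type="dirichlet_process",
        covariance_type="diag", random_state=seed,
    ).fit(appearances)
    labels = dpmm.predict(appearances)        # unsupervised grouping
    # One appearance model per non-empty group (here: just the mean).
    models = {int(k): appearances[labels == k].mean(axis=0)
              for k in np.unique(labels)}
    return dpmm, models
```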
KSII Transactions on Internet and Information Systems | 2017
Shuo Yan; Gangyi Ding; Hongsong Li; Ningxiao Sun; Zheng Guan; Yufeng Wu; Longfei Zhang; Tianyu Huang
Audience response is an important indicator of the quality of performing arts. Psychophysiological measurement enables researchers to perceive and understand audience response by collecting bio-signals during a live performance. However, capturing how the audience responds, and adapting the performance to those responses, is essential yet hard to implement. To address this issue, we designed a brain-computer interactive system called Brain-Adaptive Digital Performance (BADP), which measures and analyzes audience engagement in an interactive three-dimensional virtual theater. The BADP system monitors audience engagement in real time using electroencephalography (EEG) and tries to improve it by applying content-related performing cues when the engagement level decreases. In this article, we compute an EEG-based engagement level and build thresholds to detect the moments of disengagement and re-engagement. In the experiment, we used the BADP system to simulate two types of theater performance, providing participants with a high-fidelity virtual environment, and created content-related performing cues for each performance under three different conditions. The evaluation results show that our algorithm accurately detects engagement status and that the performing cues have a positive impact on regaining audience engagement across different performance types. Our findings open new perspectives for audience-based theater performance design.
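The article's exact engagement definition is not reproduced here; as a stand-in, the sketch below uses the classic beta/(alpha+theta) band-power ratio, with hysteresis thresholds marking the moments to issue a performing cue and to declare re-engagement. The band limits and thresholds are assumptions.

```python
# Hedged sketch of an EEG engagement level plus threshold logic.
import numpy as np
from scipy.signal import welch

def band_power(sig, fs, lo, hi):
    f, pxx = welch(sig, fs=fs, nperseg=fs * 2)
    return pxx[(f >= lo) & (f < hi)].sum()

def engagement_index(sig, fs=256):
    """Classic beta / (alpha + theta) ratio (an assumed stand-in)."""
    theta = band_power(sig, fs, 4, 8)
    alpha = band_power(sig, fs, 8, 13)
    beta = band_power(sig, fs, 13, 30)
    return beta / (alpha + theta + 1e-9)

def detect_cue_moments(levels, low_thresh, high_thresh):
    """Hysteresis: cue when the level drops below low_thresh; consider
    the audience re-engaged once it climbs back above high_thresh."""
    engaged, events = True, []
    for t, x in enumerate(levels):
        if engaged and x < low_thresh:
            engaged = False
            events.append((t, "cue"))
        elif not engaged and x > high_thresh:
            engaged = True
            events.append((t, "re-engaged"))
    return events
```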
International Conference on Information System and Artificial Intelligence | 2016
Shuo Tang; Longfei Zhang; Jia-Li Yan; Xiang-Wei Tan; Gangyi Ding
Long-term target tracking remains a major challenge in computer vision. Much recent research focuses on updating the current appearance of the tracking target to build a single online appearance model. However, one appearance model is rarely enough to capture the target's appearance history, especially in a long-term tracking task. In this paper, we propose an online multiple-appearance model based on the Dirichlet Process Mixture Model (DPMM), which groups different appearance representations of the tracking target dynamically and in an unsupervised way. Because the DPMM's appealing properties come with inference procedures based on Gibbs sampling, which is too costly for tracking, we propose an online Bayesian learning algorithm that reliably and efficiently learns a DPMM from scratch through sequential approximation in a streaming fashion, adapting to new tracking targets. Experiments on multiple challenging public benchmark datasets demonstrate that the proposed tracking algorithm performs favorably against the state of the art.
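The paper's sequential approximation is more principled than the toy rule below, which only conveys the streaming flavor: each new appearance vector joins its nearest existing component or, beyond a distance threshold, spawns a new one, so no Gibbs sweeps over past data are needed. The Euclidean rule and the threshold are assumptions.

```python
# Hedged streaming sketch: CRP-like sequential component assignment with
# online mean updates, standing in for the paper's sequential approximation.
import numpy as np

class StreamingDPMM:
    def __init__(self, new_comp_dist=1.0):
        self.means, self.counts = [], []
        self.new_comp_dist = new_comp_dist  # assumed threshold

    def update(self, x):
        """Assign x to a component (possibly a new one); update its mean."""
        x = np.asarray(x, dtype=float)
        if self.means:
            d = [np.linalg.norm(x - m) for m in self.means]
            k = int(np.argmin(d))
            if d[k] < self.new_comp_dist:
                self.counts[k] += 1
                # Online mean update: no revisiting of past samples.
                self.means[k] += (x - self.means[k]) / self.counts[k]
                return k
        self.means.append(x.copy())
        self.counts.append(1)
        return len(self.means) - 1
```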
Chinese Conference on Pattern Recognition | 2016
Shuo Tang; Longfei Zhang; Xiang-Wei Tan; Jia-Li Yan; Gangyi Ding
Building a good appearance descriptor for the tracking target is a basic challenge in long-term robust tracking. Much recent work focuses on building and updating a single online appearance model with special visual features and learning methods. However, one appearance model is not enough to describe the target's appearance with historical information in a long-term tracking task. In this paper, we propose an online adaptive multiple-appearance model to improve performance. Building a set of appearance models based on the Dirichlet Process Mixture Model (DPMM) allows different appearance representations of the tracking target to be grouped dynamically and in an unsupervised way. Despite the DPMM's appealing properties, it requires computationally intensive inference procedures, often based on Gibbs samplers, which are unsuitable for tracking because of their high time cost. We therefore propose an online Bayesian learning algorithm that reliably and efficiently learns a DPMM from scratch through sequential approximation in a streaming fashion, adapting to new tracking targets. Experiments on multiple challenging public benchmark datasets demonstrate that the proposed tracking algorithm performs 22% better than the state of the art.
Conference on Multimedia Modeling | 2014
Longfei Zhang; Shuo Tang; Shikha Singhal; Gangyi Ding
This paper addresses temporal synchronization of human actions captured from multiple views. Many researchers have focused on frame-by-frame alignment to synchronize multi-view videos, exploiting features such as interest-point trajectories or 3D human motion features to detect events individually. However, because real-world backgrounds are complex and dynamic, traditional image-based features are poorly suited to video representation. We explore an approach that uses robust spatio-temporal features and self-similarity matrices to represent actions across views. Multiple sequences are aligned over temporal patches (sliding windows) using the Dynamic Time Warping algorithm applied hierarchically, with alignment quality measured by meta-action classifiers. Two datasets, the Pump dataset and the Olympic dataset, are used as test cases. Experiments show that the method is effective and well suited to general video event datasets.
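A compact sketch of the alignment core under stated assumptions: per-frame descriptors become a self-similarity matrix (SSM), fixed-size diagonal SSM patches serve as view-stable local descriptors, and plain DTW aligns two sequences. The hierarchical sliding-window alignment and the meta-action classifiers are omitted.

```python
# Hedged sketch: SSM-patch descriptors plus Dynamic Time Warping.
import numpy as np

def self_similarity(features):
    """features: (T, d) per-frame descriptors -> (T, T) SSM."""
    diff = features[:, None, :] - features[None, :, :]
    return np.linalg.norm(diff, axis=-1)

def ssm_descriptors(features, w=5):
    """Per-frame descriptor: flattened (2w+1)x(2w+1) SSM patch centered
    on the diagonal, comparable across views and sequence lengths."""
    S = self_similarity(features)
    pad = np.pad(S, w, mode="edge")
    return np.stack([pad[t:t + 2 * w + 1, t:t + 2 * w + 1].ravel()
                     for t in range(len(S))])

def dtw_align(A, B):
    """DTW over two descriptor sequences; returns cost and the warping
    path as (i, j) frame correspondences."""
    n, m = len(A), len(B)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            c = np.linalg.norm(A[i - 1] - B[j - 1])
            D[i, j] = c + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    path, i, j = [], n, m
    while i > 0 and j > 0:               # backtrack the optimal path
        path.append((i - 1, j - 1))
        step = int(np.argmin([D[i - 1, j - 1], D[i - 1, j], D[i, j - 1]]))
        i, j = (i - 1, j - 1) if step == 0 else \
               ((i - 1, j) if step == 1 else (i, j - 1))
    return D[n, m], path[::-1]
```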