Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Junsong Yuan is active.

Publication


Featured research published by Junsong Yuan.


Computer Vision and Pattern Recognition | 2012

Mining actionlet ensemble for action recognition with depth cameras

Jiang Wang; Zicheng Liu; Ying Wu; Junsong Yuan

Human action recognition is an important yet challenging task. The recently developed commodity depth sensors open up new possibilities of dealing with this problem but also present some unique challenges. The depth maps captured by the depth cameras are very noisy, and the 3D positions of the tracked joints may be completely wrong if serious occlusions occur, which increases the intra-class variations in the actions. In this paper, an actionlet ensemble model is learnt to represent each action and to capture the intra-class variance. In addition, novel features that are suitable for depth data are proposed. They are robust to noise, invariant to translational and temporal misalignments, and capable of characterizing both the human motion and the human-object interactions. The proposed approach is evaluated on two challenging action recognition datasets captured by commodity depth cameras, and another dataset captured by a MoCap system. The experimental evaluations show that the proposed approach achieves superior performance to state-of-the-art algorithms.
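
The translation invariance claimed for the features can be illustrated with a small sketch. The following is a minimal, hypothetical example (not the paper's exact feature pipeline): a pose descriptor built from pairwise relative 3D joint positions, which is unchanged when the whole skeleton is translated.

    import numpy as np

    def relative_joint_features(joints):
        """joints: (J, 3) array of 3D joint positions for one frame.
        Returns the flattened pairwise differences joints[i] - joints[j], i < j."""
        J = joints.shape[0]
        idx_i, idx_j = np.triu_indices(J, k=1)
        return (joints[idx_i] - joints[idx_j]).reshape(-1)

    # Toy usage: 20 joints; translating the whole skeleton by a constant
    # vector leaves the descriptor unchanged.
    joints = np.random.rand(20, 3)
    f1 = relative_joint_features(joints)
    f2 = relative_joint_features(joints + np.array([1.0, -0.5, 2.0]))
    assert np.allclose(f1, f2)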


Computer Vision and Pattern Recognition | 2011

Sparse reconstruction cost for abnormal event detection

Yang Cong; Junsong Yuan; Ji Liu

We propose to detect abnormal events via a sparse reconstruction over the normal bases. Given an over-complete normal basis set (e.g., an image sequence or a collection of local spatio-temporal patches), we introduce the sparse reconstruction cost (SRC) over the normal dictionary to measure the normalness of the testing sample. To condense the size of the dictionary, a novel dictionary selection method is designed with a sparsity consistency constraint. By introducing the prior weight of each basis during sparse reconstruction, the proposed SRC is more robust than other outlier detection criteria. Our method provides a unified solution to detect both local abnormal events (LAE) and global abnormal events (GAE). We further extend it to support online abnormal event detection by updating the dictionary incrementally. Experiments on three benchmark datasets and comparisons to state-of-the-art methods validate the advantages of our algorithm.
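
The test phase can be made concrete with a minimal sketch. This assumes a fixed normal dictionary D (columns = normal bases) and uses scikit-learn's plain L1 solver as a stand-in for the paper's weighted formulation; the function name, lam, and the threshold tau are illustrative, not from the paper.

    import numpy as np
    from sklearn.linear_model import Lasso

    def sparse_reconstruction_cost(x, D, lam=0.1):
        # Solve min_w 0.5*||x - D w||^2 + lam*||w||_1 (sklearn's Lasso
        # scales the data-fit term by 1/n, hence alpha = lam / n below).
        lasso = Lasso(alpha=lam / len(x), fit_intercept=False, max_iter=10000)
        lasso.fit(D, x)
        w = lasso.coef_
        r = x - D @ w
        return 0.5 * r @ r + lam * np.abs(w).sum()

    # Toy usage: normal samples live in the span of a few atoms of D,
    # so an unrelated sample tends to get a visibly larger SRC.
    rng = np.random.default_rng(0)
    D = rng.normal(size=(50, 200))            # over-complete normal basis set
    x_normal = D[:, :5] @ rng.normal(size=5)
    x_normal /= np.linalg.norm(x_normal)
    x_abnormal = rng.normal(size=50)
    x_abnormal /= np.linalg.norm(x_abnormal)
    tau = 0.1                                 # hypothetical decision threshold
    for name, x in [("normal", x_normal), ("abnormal", x_abnormal)]:
        src = sparse_reconstruction_cost(x, D)
        print(name, round(src, 4), "-> abnormal" if src > tau else "-> normal")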


IEEE Transactions on Multimedia | 2013

Robust Part-Based Hand Gesture Recognition Using Kinect Sensor

Zhou Ren; Junsong Yuan; Jingjing Meng; Zhengyou Zhang

The recently developed depth sensors, e.g., the Kinect sensor, have provided new opportunities for human-computer interaction (HCI). Although great progress has been made by leveraging the Kinect sensor, e.g., in human body tracking, face recognition, and human action recognition, robust hand gesture recognition remains an open problem. Compared to the entire human body, the hand is a smaller object with more complex articulations and is more easily affected by segmentation errors, which makes recognizing hand gestures very challenging. This paper focuses on building a robust part-based hand gesture recognition system using the Kinect sensor. To handle the noisy hand shapes obtained from the Kinect sensor, we propose a novel distance metric, the Finger-Earth Mover's Distance (FEMD), to measure the dissimilarity between hand shapes. As it matches only the finger parts rather than the whole hand, it can better distinguish hand gestures with only slight differences. Extensive experiments demonstrate that our hand gesture recognition system is accurate (a 93.2% mean accuracy on a challenging 10-gesture dataset), efficient (0.0750 s per frame on average), robust to hand articulations, distortions, and orientation or scale changes, and able to work in uncontrolled environments (cluttered backgrounds and varying lighting conditions). The superiority of our system is further demonstrated in two real-life HCI applications.
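
The partial-matching idea behind FEMD can be sketched as a generic earth mover's distance with a penalty on unmatched weight; this is in the spirit of FEMD rather than its exact definition, and the signatures, names, and toy data below are illustrative.

    import numpy as np
    from scipy.optimize import linprog

    def partial_emd(w, u, pw, pu, penalty=1.0):
        # Minimum-cost flow between two weighted 1D signatures; only
        # min(sum w, sum u) mass must move, and leftover weight is penalized.
        m, n = len(w), len(u)
        cost = np.abs(pw[:, None] - pu[None, :]).ravel()
        A_ub = np.zeros((m + n, m * n))
        for i in range(m):
            A_ub[i, i * n:(i + 1) * n] = 1.0   # flow out of finger i <= w[i]
        for j in range(n):
            A_ub[m + j, j::n] = 1.0            # flow into finger j <= u[j]
        b_ub = np.concatenate([w, u])
        A_eq = np.ones((1, m * n))             # total flow = smaller total weight
        b_eq = [min(w.sum(), u.sum())]
        res = linprog(cost, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq)
        return res.fun + penalty * abs(w.sum() - u.sum())

    # Toy usage: a hand is a signature of detected fingers
    # (position = normalized angle around the palm, weight = finger length).
    templates = {"open_hand": (np.ones(5), np.linspace(0.1, 0.9, 5)),
                 "two_fingers": (np.array([1.0, 1.0]), np.array([0.35, 0.55]))}
    test_w, test_p = np.array([1.0, 0.9]), np.array([0.34, 0.57])
    for name, (tw, tp) in templates.items():
        print(name, round(partial_emd(test_w, tw, test_p, tp), 3))

The test hand is assigned to the template with the smallest distance; here the unmatched-weight penalty is what separates a two-finger gesture from an open hand.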


ACM Multimedia | 2011

Robust hand gesture recognition based on finger-earth mover's distance with a commodity depth camera

Zhou Ren; Junsong Yuan; Zhengyou Zhang

The recently developed depth sensors, e.g., the Kinect sensor, have provided new opportunities for human-computer interaction (HCI). Although great progress has been made by leveraging the Kinect sensor, e.g., in human body tracking and body gesture recognition, robust hand gesture recognition remains an open problem. Compared to the entire human body, the hand is a smaller object with more complex articulations and is more easily affected by segmentation errors, which makes recognizing hand gestures very challenging. This paper focuses on building a robust hand gesture recognition system using the Kinect sensor. To handle the noisy hand shape obtained from the Kinect sensor, we propose a novel distance metric for measuring hand dissimilarity, the Finger-Earth Mover's Distance (FEMD). As it matches only the fingers rather than the whole hand shape, it can better distinguish hand gestures with only slight differences. Extensive experiments demonstrate the accuracy, efficiency, and robustness of our hand gesture recognition system.


Computer Vision and Pattern Recognition | 2009

Discriminative subvolume search for efficient action detection

Junsong Yuan; Zicheng Liu; Ying Wu

Actions are spatio-temporal patterns which can be characterized by collections of spatio-temporal invariant features, and detecting actions means finding the re-occurrences (e.g., through pattern matching) of such patterns. This paper addresses two critical issues in pattern-matching-based action detection: (1) the efficiency of pattern search in 3D videos and (2) tolerance of intra-pattern variations of actions. Our contributions are two-fold. First, we propose a discriminative pattern matching method called naive-Bayes-based mutual information maximization (NBMIM) for multi-class action categorization. It improves the state-of-the-art results on the standard KTH dataset. Second, a novel search algorithm is proposed to locate the optimal subvolume in the 3D video space for efficient action detection. Our method is purely data-driven and does not rely on object detection, tracking, or background subtraction. It handles intra-pattern variations of actions, such as scale and speed variations, well, and is insensitive to dynamic and cluttered backgrounds and even partial occlusions. Experiments on several datasets, including the KTH and CMU action datasets, demonstrate the effectiveness and efficiency of our method.
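
The search step can be made concrete: each spatio-temporal feature casts a positive or negative vote into a score volume, and detection reduces to finding the axis-aligned subvolume with the maximum total score. Below is a brute-force sketch of that objective on a toy scale; the paper's algorithm finds the same optimum far more efficiently, so this is only an illustration of what is being maximized.

    import numpy as np

    def max_sum_interval(a):
        # Kadane's algorithm: best contiguous interval sum of a 1D array.
        best = cur = a[0]
        for v in a[1:]:
            cur = max(v, cur + v)
            best = max(best, cur)
        return best

    def max_subvolume_score(V):
        # Exhaustive max-sum axis-aligned subvolume of a score volume
        # V[t, y, x]. O(T^2 * H^2 * W): fine only for toy sizes.
        T, H, W = V.shape
        best = -np.inf
        for t0 in range(T):
            slab = np.zeros((H, W))
            for t1 in range(t0, T):
                slab += V[t1]                  # scores summed over [t0, t1]
                for y0 in range(H):            # 2D max-sum subrectangle
                    col = np.zeros(W)
                    for y1 in range(y0, H):
                        col += slab[y1]
                        best = max(best, max_sum_interval(col))
        return best

    # Toy usage: mostly-negative votes with an embedded positive block.
    rng = np.random.default_rng(2)
    V = rng.normal(loc=-0.2, scale=0.1, size=(8, 12, 12))
    V[2:5, 3:7, 4:9] += 1.0
    print(round(max_subvolume_score(V), 2))    # ~ the embedded block's total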


ACM Multimedia | 2011

Robust hand gesture recognition with Kinect sensor

Zhou Ren; Jingjing Meng; Junsong Yuan; Zhengyou Zhang

Hand-gesture-based human-computer interaction (HCI) is one of the most natural and intuitive ways for people and machines to communicate, since it closely mimics how humans interact with each other. In this demo, we present a hand gesture recognition system built on the Kinect sensor which operates robustly in uncontrolled environments and is insensitive to hand variations and distortions. Our system consists of two major modules: hand detection and gesture recognition. Unlike traditional vision-based hand gesture recognition methods that use color markers for hand detection, our system uses both the depth and color information from the Kinect sensor to detect the hand shape, which ensures robustness in cluttered environments. In addition, to guarantee robustness to input variations and the distortions caused by the low resolution of the Kinect sensor, we apply a novel shape distance metric called the Finger-Earth Mover's Distance (FEMD) for hand gesture recognition. Consequently, our system operates accurately and efficiently. In this demo, we demonstrate its performance in two real-life applications: arithmetic computation and a rock-paper-scissors game.
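
For the hand detection module, a common depth-based heuristic is to keep the pixels closest to the camera; the sketch below shows that idea only, not necessarily the system's exact detector (which also exploits color). The band width and toy data are assumptions.

    import numpy as np

    def segment_hand(depth_mm, band_mm=120):
        # Keep pixels within band_mm of the nearest valid depth reading,
        # assuming the hand is the object closest to the camera.
        valid = depth_mm > 0                   # Kinect reports 0 where depth is unknown
        nearest = depth_mm[valid].min()
        return valid & (depth_mm <= nearest + band_mm)

    # Toy usage: a synthetic frame with a "hand" at 600 mm in front of
    # a background at 2000 mm.
    depth = np.full((480, 640), 2000, dtype=np.int32)
    depth[200:280, 300:360] = 600
    print(segment_hand(depth).sum(), "hand pixels")   # 80 * 60 = 4800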


IEEE Transactions on Pattern Analysis and Machine Intelligence | 2014

Learning Actionlet Ensemble for 3D Human Action Recognition

Jiang Wang; Zicheng Liu; Ying Wu; Junsong Yuan

Human action recognition is an important yet challenging task. Human actions usually involve human-object interactions, highly articulated motions, high intra-class variations, and complicated temporal structures. The recently developed commodity depth sensors open up new possibilities of dealing with this problem by providing 3D depth data of the scene. This information not only facilitates a rather powerful human motion capturing technique, but also makes it possible to efficiently model human-object interactions and intra-class variations. In this paper, we propose to characterize human actions with a novel actionlet ensemble model, where each actionlet represents the interaction of a subset of human joints. The proposed model is robust to noise, invariant to translational and temporal misalignment, and capable of characterizing both the human motion and the human-object interactions. We evaluate the proposed approach on three challenging action recognition datasets captured by Kinect devices, a multiview action recognition dataset captured with a Kinect device, and a dataset captured by a motion capture system. The experimental evaluations show that the proposed approach achieves superior performance to state-of-the-art algorithms.
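
The ensemble structure can be sketched in a few lines: the final score is a weighted combination of base scores, each computed on the features of one joint subset (an actionlet). Everything below, including the subsets, weights, and linear stand-in scorers, is illustrative; the paper learns these jointly rather than fixing them by hand.

    import numpy as np
    rng = np.random.default_rng(1)

    J, D = 20, 3                               # joints, feature dims per joint
    actionlets = [np.array([4, 5, 6]),         # e.g., one arm's joints
                  np.array([10, 11]),          # e.g., hand joints
                  np.array([0, 1, 2, 3])]      # e.g., torso joints
    weights = np.array([0.5, 0.3, 0.2])        # combination weights (learned in the paper)
    scorers = [rng.normal(size=len(a) * D) for a in actionlets]  # stand-in linear scorers

    def ensemble_score(pose_feats):
        # pose_feats: (J, D) per-joint features aggregated over a sequence.
        return sum(w * (s @ pose_feats[a].ravel())
                   for a, w, s in zip(actionlets, weights, scorers))

    print(round(ensemble_score(rng.normal(size=(J, D))), 3))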


IEEE Transactions on Multimedia | 2012

Towards Scalable Summarization of Consumer Videos Via Sparse Dictionary Selection

Yang Cong; Junsong Yuan; Jiebo Luo

The rapid growth of consumer videos requires an effective and efficient content summarization method to provide a user-friendly way to manage and browse the huge amount of video data. Compared with most previous methods that focus on sports and news videos, summarizing personal videos is more challenging because of their unconstrained content and the lack of any pre-imposed structure. We formulate video summarization as a novel dictionary selection problem using sparsity consistency, where a dictionary of key frames is selected such that the original video can be best reconstructed from this representative dictionary. An efficient global optimization algorithm is introduced to solve the dictionary selection model with a convergence rate of O(1/K²) (where K is the iteration counter), in contrast to the O(1/√K) rate of traditional sub-gradient descent methods. Our method provides a scalable solution for both key frame extraction and video skim generation, because one can select an arbitrary number of key frames to represent the original video. Experiments on a human-labeled benchmark dataset and comparisons to state-of-the-art methods demonstrate the advantages of our algorithm.
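
A minimal sketch of this kind of selection, under assumptions: frames are columns of X, the video reconstructs itself through a row-sparse coefficient matrix, and frames whose coefficient rows carry the most energy are returned as key frames. Plain proximal gradient is used here for clarity; the paper's accelerated solver is what achieves the O(1/K²) rate quoted above. Names, lam, and the toy data are illustrative.

    import numpy as np

    def select_key_frames(X, lam=0.5, iters=300, k=3):
        # min_S 0.5*||X - X S||_F^2 + lam * sum_i ||S[i, :]||_2
        d, n = X.shape
        eta = 1.0 / np.linalg.norm(X.T @ X, 2)    # step size <= 1/Lipschitz
        S = np.zeros((n, n))
        for _ in range(iters):
            S -= eta * (X.T @ (X @ S - X))        # gradient step on the fit term
            norms = np.linalg.norm(S, axis=1, keepdims=True)
            S *= np.maximum(0.0, 1.0 - eta * lam / np.maximum(norms, 1e-12))
        row_energy = np.linalg.norm(S, axis=1)    # frames that explain the video
        return np.argsort(row_energy)[-k:]

    # Toy usage: 16-dimensional "frames" drawn from three distinct scenes;
    # the top-k frames by row energy serve as the summary.
    rng = np.random.default_rng(3)
    scenes = rng.normal(size=(3, 16)) * 5.0
    X = np.hstack([scenes[i][:, None] + rng.normal(size=(16, 10)) * 0.1
                   for i in range(3)])
    print(sorted(select_key_frames(X, k=3)))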


IEEE Transactions on Pattern Analysis and Machine Intelligence | 2011

Discriminative Video Pattern Search for Efficient Action Detection

Junsong Yuan; Zicheng Liu; Ying Wu

Actions are spatiotemporal patterns. Similar to sliding-window-based object detection, action detection finds the reoccurrences of such spatiotemporal patterns through pattern matching, while handling cluttered and dynamic backgrounds and other types of action variations. We address two critical issues in pattern-matching-based action detection: 1) the intrapattern variations in actions, and 2) the computational efficiency of action pattern search in cluttered scenes. First, we propose a discriminative pattern matching criterion for action classification, called naive Bayes mutual information maximization (NBMIM). Each action is characterized by a collection of spatiotemporal invariant features, and we match it with an action class by measuring the mutual information between them. Based on this matching criterion, action detection amounts to localizing the subvolume in the volumetric video space that has the maximum mutual information toward a specific action class. A novel spatiotemporal branch-and-bound (STBB) search algorithm is designed to efficiently find the optimal solution. Our proposed action detection method does not rely on the results of human detection, tracking, or background subtraction. It handles action variations, such as performing-speed and style variations as well as scale changes, well, and is insensitive to dynamic and cluttered backgrounds and even partial occlusions. Cross-dataset experiments on action detection, including the KTH and CMU action datasets and a new MSR action dataset, demonstrate the effectiveness and efficiency of the proposed multiclass multiple-instance action detection method.
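
The per-feature voting can be sketched with a nearest-neighbor approximation: each feature's score is a log-likelihood ratio between the positive and negative classes, with each class-conditional likelihood approximated by a Gaussian kernel on the nearest neighbor in that class's feature bank. This is a simplification of the paper's formulation; names, sigma, and the toy data are illustrative. Summing these votes inside a candidate subvolume gives the score that the STBB search maximizes.

    import numpy as np
    from scipy.spatial import cKDTree

    def nbmim_scores(feats, pos_bank, neg_bank, sigma=1.0):
        # log P(d|c) - log P(d|not c) under the nearest-neighbor
        # Gaussian-kernel approximation of each class likelihood.
        dp, _ = cKDTree(pos_bank).query(feats)
        dn, _ = cKDTree(neg_bank).query(feats)
        return (dn ** 2 - dp ** 2) / (2 * sigma ** 2)

    # Toy usage: features near the positive bank vote positively.
    rng = np.random.default_rng(4)
    pos_bank = rng.normal(loc=+2.0, size=(100, 8))
    neg_bank = rng.normal(loc=-2.0, size=(100, 8))
    test = rng.normal(loc=+2.0, size=(5, 8))
    print(np.round(nbmim_scores(test, pos_bank, neg_bank), 2))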


Pattern Recognition | 2013

Abnormal event detection in crowded scenes using sparse representation

Yang Cong; Junsong Yuan; Ji Liu

We propose to detect abnormal events via a sparse reconstruction over the normal bases. Given a collection of normal training examples, e.g., an image sequence or a collection of local spatio-temporal patches, we propose the sparse reconstruction cost (SRC) over the normal dictionary to measure the normalness of the testing sample. By introducing the prior weight of each basis during sparse reconstruction, the proposed SRC is more robust than other outlier detection criteria. To condense the over-complete normal bases into a compact dictionary, a novel dictionary selection method with a group sparsity constraint is designed, which can be solved by standard convex optimization. Observing that the group sparsity also implies a low-rank structure, we reformulate the problem using matrix decomposition, which can handle large-scale training sets by reducing the memory requirement at each iteration from O(k²) to O(k), where k is the number of samples. We use columnwise coordinate descent to solve the matrix-decomposition formulation, which empirically leads to a solution similar to that of the group sparsity formulation. By designing different types of spatio-temporal bases, our method can detect both local and global abnormal events. Moreover, as it does not rely on object detection and tracking, it can be applied to crowded video scenes. By updating the dictionary incrementally, our method can easily be extended to online event detection. Experiments on three benchmark datasets and comparisons to state-of-the-art methods validate the advantages of our method.
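
The online extension can be sketched with a simple append rule: absorb a normal sample as a new basis when the current dictionary reconstructs it poorly. This is only illustrative; the paper's incremental update is more principled, and the least-squares residual below is a cheap stand-in for the sparse reconstruction cost. Names and thresholds are assumptions.

    import numpy as np

    def recon_cost(x, D):
        # Least-squares residual as a stand-in for the sparse cost.
        w, *_ = np.linalg.lstsq(D, x, rcond=None)
        return np.linalg.norm(x - D @ w)

    def update_dictionary(D, x_normal, tau_add=0.3, max_atoms=500):
        # Append a normal sample as a new basis when it is poorly
        # reconstructed, keeping the dictionary size bounded.
        if recon_cost(x_normal, D) > tau_add and D.shape[1] < max_atoms:
            D = np.column_stack([D, x_normal / np.linalg.norm(x_normal)])
        return D

    # Toy usage: a new normal pattern outside the dictionary's span is absorbed.
    rng = np.random.default_rng(5)
    D = rng.normal(size=(30, 10))
    D = update_dictionary(D, rng.normal(size=30))
    print(D.shape)   # (30, 11) if the new sample was poorly reconstructed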

Collaboration


Dive into Junsong Yuan's collaborations.

Top Co-Authors

Ying Wu
Northwestern University

Jingjing Meng
Nanyang Technological University

Daniel Thalmann
École Polytechnique Fédérale de Lausanne

Gang Yu
Nanyang Technological University

Hui Liang
Nanyang Technological University

Yap-Peng Tan
Nanyang Technological University

Jianfeng Ren
Nanyang Technological University

Xudong Jiang
Nanyang Technological University

Gangqiang Zhao
Nanyang Technological University