Weilong Yang
Simon Fraser University
Network
Latest external collaboration on country level. Dive into details by clicking on the dots.
Publication
Featured researches published by Weilong Yang.
computer vision and pattern recognition | 2010
Weilong Yang; Yang Wang; Greg Mori
We consider the problem of recognizing human actions from still images. We propose a novel approach that treats the pose of the person in the image as latent variables that will help with recognition. Different from other work that learns separate systems for pose estimation and action recognition, then combines them in an ad-hoc fashion, our system is trained in an integrated fashion that jointly considers poses and actions. Our learning objective is designed to directly exploit the pose information for action recognition. Our experimental results demonstrate that by inferring the latent poses, we can improve the final action recognition results.
IEEE Transactions on Pattern Analysis and Machine Intelligence | 2012
Tian Lan; Yang Wang; Weilong Yang; Stephen N. Robinovitch; Greg Mori
In this paper, we go beyond recognizing the actions of individuals and focus on group activities. This is motivated from the observation that human actions are rarely performed in isolation; the contextual information of what other people in the scene are doing provides a useful cue for understanding high-level activities. We propose a novel framework for recognizing group activities which jointly captures the group activity, the individual person actions, and the interactions among them. Two types of contextual information, group-person interaction and person-person interaction, are explored in a latent variable framework. In particular, we propose three different approaches to model the person-person interaction. One approach is to explore the structures of person-person interaction. Differently from most of the previous latent structured models, which assume a predefined structure for the hidden layer, e.g., a tree structure, we treat the structure of the hidden layer as a latent variable and implicitly infer it during learning and inference. The second approach explores person-person interaction in the feature level. We introduce a new feature representation called the action context (AC) descriptor. The AC descriptor encodes information about not only the action of an individual person in the video, but also the behavior of other people nearby. The third approach combines the above two. Our experimental results demonstrate the benefit of using contextual information for disambiguating group activities.
computer vision and pattern recognition | 2011
Weilong Yang; George Toderici
We consider the problem of content-based automated tag learning. In particular, we address semantic variations (sub-tags) of the tag. Each video in the training set is assumed to be associated with a sub-tag label, and we treat this sub-tag label as latent information. A latent learning framework based on LogitBoost is proposed, which jointly considers both the tag label and the latent sub-tag label. The latent sub-tag information is exploited in our framework to assist the learning of our end goal, i.e., tag prediction. We use the cowatch information to initialize the learning process. In experiments, we show that the proposed method achieves significantly better results over baselines on a large-scale testing video set which contains about 50 million YouTube videos.
international conference on computer vision | 2009
Weilong Yang; Yang Wang; Greg Mori
Learning-based approaches for human action recognition often rely on large training sets. Most of these approaches do not perform well when only a few training samples are available. In this paper, we consider the problem of human action recognition from a single clip per action. Each clip contains at most 25 frames. Using a patch based motion descriptor and matching scheme, we can achieve promising results on three different action datasets with a single clip as the template. Our results are comparable to previously published results using much larger training sets. We also present a method for learning a transferable distance function for these patches. The transferable distance function learning extracts generic knowledge of patch weighting from previous training sets, and can be applied to videos of new actions without further learning. Our experimental results show that the transferable distance function learning not only improves the recognition accuracy of the single clip action recognition, but also significantly enhances the efficiency of the matching scheme.
asian conference on computer vision | 2009
Weilong Yang; Yang Wang; Greg Mori
In this paper, we address the problem of efficient human action detection with only one template. We choose the standard sliding-window approach to scan the template video against test videos, and the template video is represented by patch-based motion features. Using generic knowledge learnt from previous training sets, we weight the patches on the template video, by a transferable distance function. Based on the patch weighting, we propose a cascade structure which can efficiently scan the template video over test videos. Our method is evaluated on a human action dataset with cluttered background, and a ballet video with complex human actions. The experimental results show that our cascade structure not only achieves very reliable detection, but also can significantly improve the efficiency of patch-based human action detection, with an order of magnitude improvement in efficiency.
computer vision and pattern recognition | 2013
Guang-Tong Zhou; Tian Lan; Weilong Yang; Greg Mori
We conduct image classification by learning a class-to-image distance function that matches objects. The set of objects in training images for an image class are treated as a collage. When presented with a test image, the best matching between this collage of training image objects and those in the test image is found. We validate the efficacy of the proposed model on the PASCAL 07 and SUN 09 datasets, showing that our model is effective for object classification and scene classification tasks. State-of-the-art image classification results are obtained, and qualitative results demonstrate that objects can be accurately matched.
Archive | 2011
Weilong Yang; Yang Wang; Greg Mori
Learning-based approaches for human action recognition often rely on large training sets. Most of these approaches do not perform well when only a few training samples are available. In this chapter, we consider the problem of human action recognition from a single clip per action. Each clip contains at most 25 frames. Using a patch based motion descriptor and matching scheme, we can achieve promising results on three different action datasets with a single clip as the template. Our results are comparable to previously published results using much larger training sets. We also present a method for learning a transferable distance function for these patches. The transferable distance function learning extracts generic knowledge of patch weighting from previous training sets, and can be applied to videos of new actions without further learning. Our experimental results show that the transferable distance function learning not only improves the recognition accuracy of the single clip action recognition, but also significantly enhances the efficiency of the matching scheme.
british machine vision conference | 2011
Zhi Feng Huang; Weilong Yang; Yang Wang; Greg Mori
In this paper we present LatentBoost, a novel learning algorithm for training models with latent variables in a boosting framework. This algorithm allows for training of structured latent variable models with boosting. The popular latent SVM framework allows for training of models with structured latent variables in a max-margin framework. LatentBoost provides an analogous capability for boosting algorithms. The effectiveness of this framework is highlighted by an application to human action recognition. We show that LatentBoost can be used to train an action recognition model in which the trajectory of a person is a latent variable. This model outperforms baselines on a variety of datasets.
neural information processing systems | 2010
Tian Lan; Yang Wang; Weilong Yang; Greg Mori
european conference on computer vision | 2012
Tian Lan; Weilong Yang; Yang Wang; Greg Mori