Fengjun Lv
University of Southern California
Publications
Featured research published by Fengjun Lv.
computer vision and pattern recognition | 2010
Jinjun Wang; Jianchao Yang; Kai Yu; Fengjun Lv; Thomas S. Huang; Yihong Gong
The traditional SPM approach based on bag-of-features (BoF) requires nonlinear classifiers to achieve good image classification performance. This paper presents a simple but effective coding scheme called Locality-constrained Linear Coding (LLC) in place of the VQ coding in traditional SPM. LLC utilizes locality constraints to project each descriptor into its local coordinate system, and the projected coordinates are integrated by max pooling to generate the final representation. With a linear classifier, the proposed approach performs remarkably better than the traditional nonlinear SPM, achieving state-of-the-art performance on several benchmarks. Compared with the sparse coding strategy [22], the objective function used by LLC has an analytical solution. In addition, the paper proposes a fast approximated LLC method that first performs a K-nearest-neighbor search and then solves a constrained least-squares fitting problem, with a computational complexity of O(M + K²). Hence even with very large codebooks, our system can still process multiple frames per second. This efficiency significantly adds to the practical value of LLC for real applications.
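For concreteness, here is a minimal NumPy sketch of the approximated LLC coding of a single descriptor, assuming a precomputed codebook; the function name and the regularization constant are illustrative, not taken from the authors' implementation.

```python
import numpy as np

def approx_llc(x, B, k=5, beta=1e-4):
    """Approximated LLC code for one descriptor x (D,) given codebook B (M, D).

    Restricts coding to the k nearest codewords, then solves the small
    sum-to-one constrained least-squares problem in closed form.
    """
    # 1. K-nearest-neighbor search in the codebook
    dists = np.sum((B - x) ** 2, axis=1)
    idx = np.argsort(dists)[:k]
    Bk = B[idx]                          # (k, D) local bases

    # 2. Constrained least squares on the k local bases
    z = Bk - x                           # shift bases to the descriptor
    C = z @ z.T                          # local covariance (k, k)
    C += beta * np.trace(C) * np.eye(k)  # regularize for numerical stability
    w = np.linalg.solve(C, np.ones(k))
    w /= w.sum()                         # enforce the sum-to-one constraint

    code = np.zeros(B.shape[0])
    code[idx] = w                        # sparse code over the full codebook
    return code
```

Codes for all descriptors falling in one spatial cell would then be combined by max pooling, e.g. np.max(codes, axis=0), to form that cell's part of the final representation.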
computer vision and pattern recognition | 2007
Fengjun Lv; Ramakant Nevatia
3D human pose recovery is considered a fundamental step in view-invariant human action recognition. However, inferring 3D poses from a single view is usually slow due to the large number of parameters that need to be estimated, and the recovered poses are often ambiguous due to perspective projection. We present an approach that does not explicitly infer 3D pose at each frame. Instead, from existing action models we search for a series of actions that best matches the input sequence. In our approach, each action is modeled as a series of synthetic 2D human poses rendered from a wide range of viewpoints. The constraints on transitions between the synthetic poses are represented by a graph model called the Action Net. Given the input, silhouette matching between the input frames and the key poses is performed first using an enhanced Pyramid Match Kernel algorithm. The best matched sequence of actions is then tracked using the Viterbi algorithm. We demonstrate this approach on a challenging video set consisting of 15 complex action classes.
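A minimal sketch of the Viterbi decoding step over a pose-transition graph such as the Action Net, assuming silhouette-matching log-likelihoods have already been computed; names and shapes are illustrative.

```python
import numpy as np

def viterbi_decode(match_scores, trans, init):
    """Most likely key-pose sequence through an action graph.

    match_scores: (T, S) log-likelihoods of each input frame against each
                  synthetic key pose (e.g. from silhouette matching).
    trans:        (S, S) log transition scores between key poses.
    init:         (S,)   log prior over starting poses.
    """
    T, S = match_scores.shape
    delta = init + match_scores[0]
    back = np.zeros((T, S), dtype=int)
    for t in range(1, T):
        cand = delta[:, None] + trans            # scores of all moves (S, S)
        back[t] = np.argmax(cand, axis=0)        # best predecessor per pose
        delta = cand[back[t], np.arange(S)] + match_scores[t]
    # Backtrack the best path from the final frame
    path = [int(np.argmax(delta))]
    for t in range(T - 1, 0, -1):
        path.append(back[t, path[-1]])
    return path[::-1]
```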
european conference on computer vision | 2006
Fengjun Lv; Ramakant Nevatia
Our goal is to automatically segment and recognize basic human actions, such as stand, walk, and wave hands, from a sequence of joint positions or pose angles. Such recognition is difficult due to the high dimensionality of the data and large spatial and temporal variations within the same action. We decompose the high-dimensional 3D joint space into a set of feature spaces where each feature corresponds to the motion of a single joint or a combination of related joints. For each feature, the dynamics of each action class are learned with one HMM. Given a sequence, the observation probability is computed in each HMM, and a weak classifier for that feature is formed based on those probabilities. The weak classifiers with strong discriminative power are then combined by the Multi-Class AdaBoost (AdaBoost.M2) algorithm. A dynamic programming algorithm is applied to segment and recognize actions simultaneously. Results of recognizing 22 actions on a large number of motion capture sequences, as well as several annotated and automatically tracked sequences, show the effectiveness of the proposed algorithms.
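A minimal sketch of how per-feature HMM likelihoods could be turned into weak classifiers and combined by a boosted weighted vote; the full AdaBoost.M2 training loop is omitted, and all names are hypothetical.

```python
import numpy as np

def weak_predict(loglik):
    """Turn per-class HMM log-likelihoods (C,) for one joint feature into a
    normalized confidence vector over the C action classes."""
    p = np.exp(loglik - loglik.max())   # numerically stabilized softmax
    return p / p.sum()

def strong_predict(logliks, alphas):
    """Combine F weak classifiers (one per joint feature) by the weighted
    vote that boosting produces.

    logliks: (F, C) per-feature, per-class HMM log-likelihoods of a sequence.
    alphas:  (F,)   weights learned by boosting.
    """
    votes = np.stack([weak_predict(l) for l in logliks])  # (F, C)
    return int(np.argmax(alphas @ votes))                 # weighted vote
```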
computer vision and pattern recognition | 2011
Yuanqing Lin; Fengjun Lv; Shenghuo Zhu; Ming Yang; Timothee Cour; Kai Yu; Liangliang Cao; Thomas S. Huang
Most research efforts on image classification so far have focused on medium-scale datasets, often defined as datasets that can fit into the memory of a desktop machine (typically 4 GB to 48 GB). There are two main reasons for the limited effort on large-scale image classification. First, until the emergence of the ImageNet dataset, there was almost no publicly available large-scale benchmark data for image classification, mostly because class labels are expensive to obtain. Second, large-scale classification is hard because it poses more challenges than its medium-scale counterpart. A key challenge is how to achieve efficiency in both feature extraction and classifier training without compromising performance. This paper shows how we address this challenge using the ImageNet dataset as an example. For feature extraction, we develop a Hadoop scheme that performs feature extraction in parallel using hundreds of mappers. This allows us to extract fairly sophisticated features (with dimensions in the hundreds of thousands) from 1.2 million images within one day. For SVM training, we develop a parallel averaging stochastic gradient descent (ASGD) algorithm for training one-against-all 1000-class SVM classifiers. The ASGD algorithm is capable of dealing with terabytes of training data and converges very fast; typically 5 epochs are sufficient. As a result, we achieve state-of-the-art performance on ImageNet 1000-class classification: 52.9% classification accuracy and 71.8% top-5 hit rate.
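A minimal single-machine sketch of averaged SGD for one one-against-all hinge-loss SVM; the paper's parallel, terabyte-scale version is considerably more involved, and the step-size schedule here is a common Pegasos-style assumption.

```python
import numpy as np

def asgd_svm(X, y, epochs=5, lam=1e-5):
    """Averaged stochastic gradient descent for a hinge-loss linear SVM.

    X: (N, D) features, y: (N,) labels in {-1, +1}.
    Returns the averaged weight vector, which typically converges in far
    fewer epochs than the last SGD iterate.
    """
    N, D = X.shape
    w = np.zeros(D)
    w_avg = np.zeros(D)
    t = 0
    for _ in range(epochs):
        for i in np.random.permutation(N):
            t += 1
            eta = 1.0 / (lam * t)            # decaying step size
            margin = y[i] * (X[i] @ w)
            w *= (1.0 - eta * lam)           # gradient of the L2 regularizer
            if margin < 1:                   # hinge-loss subgradient step
                w += eta * y[i] * X[i]
            w_avg += (w - w_avg) / t         # running average of iterates
    return w_avg
```

One such classifier would be trained per class; the 1000 one-against-all problems are independent and thus trivially parallelizable.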
IEEE Transactions on Pattern Analysis and Machine Intelligence | 2006
Fengjun Lv; Tao Zhao; Ramakant Nevatia
A self-calibration method to estimate a camera's intrinsic and extrinsic parameters from vertical line segments of the same height is presented. An algorithm to obtain the needed line segments by detecting the head and feet positions of a walking human during the leg-crossing phases is described. Experimental results show that the method is accurate and robust with respect to various viewing angles and subjects.
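One ingredient of such a method, sketched minimally: estimating the vertical vanishing point as the least-squares intersection of the head-to-feet image lines. This is an illustrative fragment under the stated setup, not the authors' full calibration pipeline.

```python
import numpy as np

def vertical_vanishing_point(heads, feet):
    """Least-squares intersection of head-feet lines in homogeneous coords.

    heads, feet: (N, 2) image positions of the head and feet of a standing
    person at N instants. Each pair defines a line l = h x f; the vertical
    vanishing point v minimizes sum_i (l_i . v)^2, i.e. it is the smallest
    right singular vector of the stacked line matrix.
    """
    h = np.hstack([heads, np.ones((len(heads), 1))])
    f = np.hstack([feet, np.ones((len(feet), 1))])
    lines = np.cross(h, f)                 # (N, 3) homogeneous image lines
    _, _, Vt = np.linalg.svd(lines)
    v = Vt[-1]
    return v[:2] / v[2]                    # back to inhomogeneous coords
```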
international conference on pattern recognition | 2002
Fengjun Lv; Tao Zhao; Ramakant Nevatia
Analysis of human activity from a video camera is simplified by knowledge of the camera's intrinsic and extrinsic parameters. We describe a technique to estimate such parameters from image observations without requiring measurements of scene objects. We first develop a general technique for calibration using vanishing points and a vanishing line. We then describe a method for estimating the needed points and line by observing the motion of a human in the scene. Experimental results, including error estimates, are presented.
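A minimal sketch of the classic focal-length constraint from two vanishing points of orthogonal directions, assuming zero skew, unit aspect ratio, and a known principal point; it illustrates the kind of equation such a calibration uses rather than the paper's exact derivation.

```python
import numpy as np

def focal_from_vps(v1, v2, principal_point):
    """Focal length from two vanishing points of orthogonal 3D directions.

    Under the stated assumptions, orthogonality gives
    (v1 - c) . (v2 - c) + f^2 = 0, so the dot product must be negative
    for a geometrically valid configuration.
    """
    c = np.asarray(principal_point, dtype=float)
    d = np.dot(np.asarray(v1, dtype=float) - c,
               np.asarray(v2, dtype=float) - c)
    if d >= 0:
        raise ValueError("vanishing points inconsistent with orthogonality")
    return float(np.sqrt(-d))
```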
computer vision and pattern recognition | 2001
Tao Zhao; Ramakant Nevatia; Fengjun Lv
Segmenting and tracking multiple humans is a challenging problem in complex situations in which extended occlusion, shadow, and/or reflection exist. We tackle this problem with a 3D model-based approach. Our method includes two stages: segmentation (detection) and tracking. Human hypotheses are generated by shape analysis of the foreground blobs using a human shape model. The segmented human hypotheses are tracked with a Kalman filter with explicit handling of occlusion. Hypotheses are verified while being tracked for the first second or so. The verification is done by walking recognition using an articulated human walking model. We propose a new method to recognize walking using a motion template and temporal integration. Experiments show that our approach works robustly on very challenging sequences.
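A minimal sketch of the kind of constant-velocity Kalman filter used for such tracking, with the simple occlusion behavior of coasting on prediction when no measurement is associated; all noise parameters are illustrative.

```python
import numpy as np

class Track2D:
    """Constant-velocity Kalman filter over the state [x, y, vx, vy]."""

    def __init__(self, xy, q=1.0, r=4.0):
        self.x = np.array([xy[0], xy[1], 0.0, 0.0])
        self.P = np.eye(4) * 100.0                 # large initial uncertainty
        self.F = np.eye(4)
        self.F[0, 2] = self.F[1, 3] = 1.0          # dt = 1 frame
        self.H = np.eye(2, 4)                      # observe position only
        self.Q = np.eye(4) * q                     # process noise
        self.R = np.eye(2) * r                     # measurement noise

    def predict(self):
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + self.Q
        return self.x[:2]

    def update(self, z):
        # Skip this call while the target is occluded; prediction coasts.
        y = np.asarray(z, dtype=float) - self.H @ self.x
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)
        self.x = self.x + K @ y
        self.P = (np.eye(4) - K @ self.H) @ self.P
```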
international conference on computer vision | 2009
Ming Yang; Fengjun Lv; Wei Xu; Yihong Gong
In video surveillance scenarios, the appearances of both humans and their nearby scenes may experience large variations due to scale and view-angle changes, partial occlusions, or interactions within a crowd. These challenges may weaken the effectiveness of a dedicated target observation model, even one based on multiple cues, which demands an agile framework that adjusts target observation models dynamically to maintain their discriminative power. Towards this end, we propose a new adaptive way to integrate multiple cues in tracking multiple humans, driven by human detections. When a human detection can be reliably associated with an existing trajectory, we adapt how the cue-specific models in this tracker are combined, so as to enhance the discriminative power of the integrated observation model in its local neighborhood. This is achieved by efficiently solving a regression problem. Specifically, we employ three observation models for a single-person tracker, based on color models of torso subregions, an elliptical head model, and bags of local features, respectively. Extensive experiments on three challenging surveillance datasets demonstrate the long-term reliable tracking performance of this method.
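A minimal sketch of the weight-adaptation idea: given per-cue scores at samples on the associated detection and at nearby distractors, a small ridge regression yields combination weights that separate the two. The sampling scheme and regression targets are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def adapt_cue_weights(pos_scores, neg_scores, lam=1e-2):
    """Learn linear combination weights over cues by ridge regression.

    pos_scores: (P, C) per-cue scores at samples on the detected target.
    neg_scores: (N, C) per-cue scores at nearby distractor samples.
    Targets: 1 for target samples, 0 for distractors.
    """
    A = np.vstack([pos_scores, neg_scores])
    b = np.concatenate([np.ones(len(pos_scores)), np.zeros(len(neg_scores))])
    C = A.shape[1]
    # Closed-form ridge solution: (A^T A + lam I) w = A^T b
    return np.linalg.solve(A.T @ A + lam * np.eye(C), A.T @ b)

def combined_score(cue_scores, w):
    """Integrated observation score for one candidate location."""
    return float(cue_scores @ w)
```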
IEEE Transactions on Pattern Analysis and Machine Intelligence | 2011
Yuchi Huang; Qingshan Liu; Fengjun Lv; Yihong Gong; Dimitris N. Metaxas
We present a framework for unsupervised image categorization in which images containing specific objects are taken as vertices in a hypergraph and the task of image clustering is formulated as a hypergraph partition problem. First, a novel method is proposed to select the region of interest (ROI) of each image; hyperedges are then constructed based on shape and appearance features extracted from the ROIs. Each vertex (image) and its k-nearest neighbors (based on shape or appearance descriptors) form two kinds of hyperedges. The weight of a hyperedge is computed as the sum of the pairwise affinities within it. Through the hyperedges, not only are the local grouping relationships among the images described, but the merits of the shape and appearance characteristics are also integrated to enhance clustering performance. Finally, a generalized spectral clustering technique is used to solve the hypergraph partition problem. We compare the proposed method against several alternatives, and its effectiveness is demonstrated by extensive experiments on three image databases.
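A minimal sketch of the hypergraph partition step using the standard normalized hypergraph Laplacian of Zhou et al., assuming the incidence matrix and hyperedge weights have already been built from the k-nearest-neighbor construction described above; the exact spectral variant in the paper may differ.

```python
import numpy as np
from scipy.cluster.vq import kmeans2

def hypergraph_spectral_clustering(H, w, n_clusters):
    """Partition a hypergraph by spectral clustering.

    H: (V, E) incidence matrix, H[v, e] = 1 if vertex v is in hyperedge e.
    w: (E,) hyperedge weights (e.g. sums of pairwise affinities).
    Uses L = I - Dv^-1/2 H W De^-1 H^T Dv^-1/2 (normalized hypergraph
    Laplacian), then k-means on the smallest eigenvectors.
    """
    dv = H @ w                              # vertex degrees
    de = H.sum(axis=0)                      # hyperedge degrees
    Dv = np.diag(1.0 / np.sqrt(dv))
    Theta = Dv @ H @ np.diag(w / de) @ H.T @ Dv
    L = np.eye(H.shape[0]) - Theta
    _, vecs = np.linalg.eigh(L)             # ascending eigenvalues
    U = vecs[:, :n_clusters]                # spectral embedding of vertices
    _, labels = kmeans2(U, n_clusters, minit="++")
    return labels
```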
computer vision and pattern recognition | 2011
Ming Yang; Shenghuo Zhu; Fengjun Lv; Kai Yu
Visual recognition systems for videos based on statistical learning models often show degraded performance when deployed in a real-world environment, primarily because training data can hardly cover sufficient variations in reality. To alleviate this issue, we propose to utilize object correspondences in successive frames as weak supervision to adapt visual recognition models, which is particularly suitable for human profile recognition. Specifically, we instantiate this new strategy on an advanced convolutional neural network (CNN) based system that estimates human gender, age, and race. We train the system to output consistent and stable results on face images from the same trajectories in videos by using incremental stochastic training. Our baseline system already achieves competitive performance on gender and age estimation compared to state-of-the-art algorithms on the FG-NET database. Further, on two new video datasets containing about 900 persons, the proposed supervision from correspondences improves estimation accuracy by a large margin over the baseline.
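A minimal sketch of the consistency signal itself: a penalty on disagreement among per-frame predictions along one face trajectory. How this term enters the paper's incremental stochastic training is not reproduced here; the formulation below is an illustrative assumption.

```python
import numpy as np

def trajectory_consistency_loss(preds):
    """Penalty for inconsistent predictions along one face trajectory.

    preds: (T, C) per-frame class probabilities (e.g. gender or race) for
    T detections of the same person. The loss is the mean squared deviation
    from the trajectory's mean prediction; driving it toward zero enforces
    the stable, consistent outputs used as weak supervision.
    """
    mean = preds.mean(axis=0, keepdims=True)
    return float(np.mean((preds - mean) ** 2))
```

During adaptation, a term like this would be added to the supervised loss on labeled frames, so unlabeled trajectories still contribute a training signal.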