Publication


Featured research published by Jingjing Zheng.


British Machine Vision Conference | 2012

Cross-View Action Recognition via a Transferable Dictionary Pair

Jingjing Zheng; Zhuolin Jiang; P. Jonathon Phillips; Rama Chellappa

Discriminative appearance features are effective for recognizing actions in a fixed view, but generalize poorly to changes in viewpoint. We present a method for view-invariant action recognition based on sparse representations using a transferable dictionary pair. A transferable dictionary pair consists of two dictionaries that correspond to the source and target views, respectively. The two dictionaries are learned simultaneously from pairs of videos taken from different views and aim to encourage each video in the pair to have the same sparse representation. Thus, the transferable dictionary pair links features between the two views that are useful for action recognition. Both unsupervised and supervised algorithms are presented for learning transferable dictionary pairs. Using the sparse representation as features, a classifier built in the source view can be directly transferred to the target view. We extend our approach to transferring an action model learned from multiple source views to one target view. We demonstrate the effectiveness of our approach on the multi-view IXMAS data set. Our results compare favorably to the state of the art.
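As a compact sketch of the unsupervised formulation (the notation below is ours, assumed for illustration rather than quoted from the paper), corresponding source- and target-view videos are forced to share a single sparse code over the dictionary pair:

\min_{D_s,\,D_t,\,X}\ \|Y_s - D_s X\|_F^2 + \|Y_t - D_t X\|_F^2 \quad \text{s.t. } \|x_i\|_0 \le T \ \ \forall i,

where $Y_s$ and $Y_t$ stack the features of the paired videos in the source and target views, $x_i$ is the code shared by the $i$-th pair, and $T$ is the sparsity level. Because the code matrix $X$ is common to both views, a classifier trained on source-view codes can be applied unchanged to target-view codes.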


International Conference on Computer Vision | 2013

Learning View-Invariant Sparse Representations for Cross-View Action Recognition

Jingjing Zheng; Zhuolin Jiang

We present an approach to jointly learn a set of view-specific dictionaries and a common dictionary for cross-view action recognition. The set of view-specific dictionaries is learned for specific views while the common dictionary is shared across different views. Our approach represents videos in each view using both the corresponding view-specific dictionary and the common dictionary. More importantly, it encourages the set of videos taken from different views of the same action to have similar sparse representations. In this way, we can align view-specific features in the sparse feature spaces spanned by the view-specific dictionary set and transfer the view-shared features in the sparse feature space spanned by the common dictionary. Meanwhile, the incoherence between the common dictionary and the view-specific dictionary set enables us to exploit the discrimination information encoded in view-specific features and view-shared features separately. In addition, the learned common dictionary not only has the capability to represent actions from unseen views, but also makes our approach effective in a semi-supervised setting where no correspondence videos exist and only a few labels exist in the target view. Extensive experiments using the multi-view IXMAS dataset demonstrate that our approach outperforms many recent approaches for cross-view action recognition.
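To make the structure concrete, an illustrative objective (our notation and trade-off weights, not necessarily the paper's exact formulation) combining the ingredients named above is

\min_{D,\,\{D_v\},\,\{Z_v,\,X_v\}}\ \sum_{v}\Big(\|Y_v - D Z_v - D_v X_v\|_F^2 + \eta\,\|D^\top D_v\|_F^2\Big) + \lambda \sum_{v \ne v'} \big\|[Z_v; X_v] - [Z_{v'}; X_{v'}]\big\|_F^2,

subject to sparsity constraints on the codes, where $D$ is the common dictionary, $D_v$ the view-specific dictionaries, and $Z_v$, $X_v$ the codes of the correspondence videos in view $v$. The incoherence term $\|D^\top D_v\|_F^2$ keeps the shared and view-specific atoms complementary, and the last term pushes corresponding videos from different views toward similar sparse representations.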


IEEE Transactions on Image Processing | 2016

Cross-View Action Recognition via Transferable Dictionary Learning

Jingjing Zheng; Zhuolin Jiang; Rama Chellappa

Discriminative appearance features are effective for recognizing actions in a fixed view, but may not generalize well to a new view. In this paper, we present two effective approaches to learn dictionaries for robust action recognition across views. In the first approach, we learn a set of view-specific dictionaries where each dictionary corresponds to one camera view. These dictionaries are learned simultaneously from sets of correspondence videos taken at different views with the aim of encouraging each video in the set to have the same sparse representation. In the second approach, we additionally learn a common dictionary shared by different views to model view-shared features. This approach represents videos in each view using a view-specific dictionary and the common dictionary. More importantly, it encourages the set of videos taken from different views of the same action to have similar sparse representations. The learned common dictionary not only has the capability to represent actions from unseen views, but also makes our approach effective in a semi-supervised setting where no correspondence videos exist and only a few labeled videos exist in the target view. Extensive experiments using three public datasets demonstrate that the proposed approach outperforms recently developed approaches for cross-view action recognition.


British Machine Vision Conference | 2015

Bridging the Domain Shift by Domain Adaptive Dictionary Learning

Hongyu Xu; Jingjing Zheng; Rama Chellappa

Domain adaptation (DA) tackles the problem where data from the training set (source domain) and the test set (target domain) have different underlying distributions. For instance, training and testing images may be acquired under different environments, viewpoints, and illumination conditions. In this paper, we focus on the more challenging unsupervised DA problem where the samples in the target domain are unlabeled. Dictionary learning has gained popularity because images of interest can be reconstructed sparsely in an appropriately learned dictionary [1]. Specifically, we propose a novel domain-adaptive dictionary learning approach that generates a set of intermediate domains which bridge the gap between the source and target domains. Our approach defines two types of dictionaries: a common dictionary and a domain-specific dictionary. The overall learning process illustrated in Figure 1 consists of three steps: (1) At the beginning, we learn the common dictionary $D_C$ and the domain-specific dictionaries $D_0$ and $D_t$ for the source and target domains. (2) At the $k$-th step, we enforce the recovered feature representations of the target data in all available domains to have the same sparse codes, while adapting the most recently obtained dictionary $D_k$ to better represent the target domain. We then multiply the dictionaries in the $k$-th domain with the corresponding sparse codes to recover the feature representations $X_k^t$ of the target data in this domain. (3) We update $D_k$ to find the next domain-specific dictionary $D_{k+1}$ by further minimizing the reconstruction error in representing the target data. We then alternate between sparse coding and dictionary updating until the stopping criterion is satisfied. Notation: let $X^s \in \mathbb{R}^{d \times N_s}$ and $X^t \in \mathbb{R}^{d \times N_t}$ be the feature representations of the source and target data, respectively, where $d$ is the feature dimension and $N_s$ and $N_t$ are the numbers of samples in the two domains. The feature representations of the recovered source and target data in the $k$-th intermediate domain are denoted $X_k^s \in \mathbb{R}^{d \times N_s}$ and $X_k^t \in \mathbb{R}^{d \times N_t}$, respectively. The common dictionary is denoted $D_C$, whereas the source-specific and target-specific dictionaries are denoted $D_0$ and $D_t$, respectively. Similarly, we use $D_k$, $k = 1, \dots, N$, to denote the domain-specific dictionary of the $k$-th intermediate domain, where $N$ is the number of intermediate domains. All dictionaries have the same size, in $\mathbb{R}^{d \times n}$. At the beginning, we learn the common dictionary $D_C$ by minimizing the reconstruction error of both the source and target data as follows:
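The abstract breaks off at this objective. A plausible form of the initial step, consistent with the sentence above but written with symbols $A^s$, $A^t$, and $T$ that we introduce here for illustration, is

\min_{D_C,\,A^s,\,A^t}\ \|X^s - D_C A^s\|_F^2 + \|X^t - D_C A^t\|_F^2 \quad \text{s.t. } \|\alpha_i\|_0 \le T \ \ \forall i,

where $A^s$ and $A^t$ collect the sparse codes of the source and target samples over the common dictionary $D_C$, $\alpha_i$ is any column of $A^s$ or $A^t$, and $T$ bounds the number of nonzero coefficients per code.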


Workshop on Applications of Computer Vision | 2016

Learning a structured dictionary for video-based face recognition

Hongyu Xu; Jingjing Zheng; Azadeh Alavi; Rama Chellappa

In this paper, we propose a structured dictionary learning framework for video-based face recognition. We discover the invariant structural information from different videos of each subject. Specifically, we employ dictionary learning and low-rank approximation to preserve the invariant structure of face images in videos. The learned dictionary is both discriminative and reconstructive. Thus, we not only minimize the reconstruction error of all the face images but also encourage a sub-dictionary to represent the corresponding subject from different videos. Moreover, by introducing the low-rank approximation, the proposed method is able to discover invariant structured information from different videos of the same subject. To this end, an efficient alternating algorithm is employed to learn our structured dictionary. Extensive experiments on three video-based face recognition databases show that our approach outperforms several state-of-the-art methods.
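One way to write the ingredients named above as a single objective (an illustrative sketch with our own symbols and weights, not the paper's exact formulation) is

\min_{D,\,\{Z_c\}}\ \sum_{c}\Big(\|Y_c - D Z_c\|_F^2 + \alpha\,\|Y_c - D_c Z_c^{(c)}\|_F^2 + \beta\,\|Z_c\|_*\Big),

where $Y_c$ stacks the face images of subject $c$ from all of that subject's videos, $D = [D_1, \dots, D_C]$ is the structured dictionary with sub-dictionary $D_c$ assigned to subject $c$, $Z_c^{(c)}$ is the block of $Z_c$ associated with $D_c$, and the nuclear norm $\|Z_c\|_*$ is the low-rank term that pulls the representations of the same subject across videos toward a common structure.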


Computer Vision and Pattern Recognition | 2013

Tag Taxonomy Aware Dictionary Learning for Region Tagging

Jingjing Zheng; Zhuolin Jiang

Tags of image regions are often arranged in a hierarchical taxonomy based on their semantic meanings. In this paper, using the given tag taxonomy, we propose to jointly learn multi-layer hierarchical dictionaries and corresponding linear classifiers for region tagging. Specifically, we generate a node-specific dictionary for each tag node in the taxonomy, and then concatenate the node-specific dictionaries from each level to construct a level-specific dictionary. The hierarchical semantic structure among tags is preserved in the relationship among node-dictionaries. Simultaneously, the sparse codes obtained using the level-specific dictionaries are summed up as the final feature representation to design a linear classifier. Our approach not only makes use of sparse codes obtained from higher levels to help learn the classifiers for lower levels, but also encourages the tag nodes from lower levels that have the same parent tag node to implicitly share sparse codes obtained from higher levels. Experimental results using three benchmark datasets show that the proposed approach yields the best performance over recently proposed methods.
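A small sketch of the encoding path described above, using hypothetical node dictionaries and scikit-learn's SparseCoder; for simplicity it concatenates the per-level codes into the classifier feature, whereas the paper sums them. Shapes, the OMP solver, and the aggregation are illustrative assumptions, not the authors' code.

import numpy as np
from sklearn.decomposition import SparseCoder

def encode_region(x, taxonomy_levels, n_nonzero=5):
    """x: (d,) region feature.
    taxonomy_levels: list over taxonomy levels; each entry is a list of
    node-specific dictionaries of shape (n_atoms_node, d)."""
    per_level_codes = []
    for node_dicts in taxonomy_levels:
        # Concatenate node dictionaries into the level-specific dictionary
        # and normalize its atoms before sparse coding.
        level_dict = np.vstack(node_dicts).astype(float)
        level_dict /= np.linalg.norm(level_dict, axis=1, keepdims=True) + 1e-12
        coder = SparseCoder(dictionary=level_dict,
                            transform_algorithm="omp",
                            transform_n_nonzero_coefs=n_nonzero)
        per_level_codes.append(coder.transform(x[None, :])[0])
    # Aggregated code used as the input of a linear classifier.
    return np.concatenate(per_level_codes)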


IEEE Transactions on Pattern Analysis and Machine Intelligence | 2017

Submodular Attribute Selection for Visual Recognition

Jingjing Zheng; Zhuolin Jiang; Rama Chellappa

In real-world visual recognition problems, low-level features cannot adequately characterize the semantic content in images or the spatio-temporal structure in videos. In this work, we encode objects or actions based on attributes that describe them as high-level concepts. We consider two types of attributes. One type is generated by humans, while the second type consists of data-driven attributes extracted using dictionary learning methods. Attribute-based representations may exhibit variations due to noisy and redundant attributes. We propose a discriminative and compact attribute-based representation by selecting a subset of discriminative attributes from a large attribute set. Three attribute selection criteria are proposed and formulated as a submodular optimization problem. A greedy optimization algorithm is presented, and its solution is guaranteed to be at least a (1-1/e)-approximation to the optimum. Experimental results on four public datasets demonstrate that the proposed attribute-based representation significantly boosts the performance of visual recognition and outperforms most recently proposed recognition approaches.
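The (1-1/e) guarantee mentioned above is the standard one for greedy maximization of a monotone submodular set function under a cardinality budget. The sketch below shows that generic greedy loop with a placeholder scoring function; it is not one of the three selection criteria proposed in the paper.

# Greedy selection of a fixed-size attribute subset.  `score` is assumed to be
# a monotone submodular set function mapping a set of attribute ids to a float.
def greedy_select(candidates, score, budget):
    selected = set()
    for _ in range(budget):
        best, best_gain = None, 0.0
        for a in candidates:
            if a in selected:
                continue
            gain = score(selected | {a}) - score(selected)   # marginal gain
            if best is None or gain > best_gain:
                best, best_gain = a, gain
        if best is None:
            break
        selected.add(best)
    return selected

For example, score could be a coverage- or discriminability-style measure over the training data; as long as it is monotone and submodular, the greedy solution is within a factor 1 - 1/e of the optimum.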


International Conference on Pattern Recognition | 2016

Template regularized sparse coding for face verification

Hongyu Xu; Jingjing Zheng; Azadeh Alavi; Rama Chellappa

In this paper, we propose a novel regularized sparse coding approach for template-based unconstrained face verification. Unlike traditional verification tasks, which require evaluation on image-to-image or video-to-video pairs, template-based face verification/recognition methods can exploit training and/or gallery data containing a mixture of images and videos of the person of interest. The proposed regularized sparse coding approach adapts to the training and gallery data in three steps. First, we construct a reference dictionary that represents the training set. Then we learn the discriminative sparse codes of the templates for verification through the proposed template regularized sparse coding approach. Finally, we measure the similarity between templates. An efficient algorithm is employed to learn the template regularized sparse codes. Extensive experiments on the template-based verification benchmark dataset show that the proposed approach outperforms several state-of-the-art methods.
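A rough pipeline sketch of the three steps (reference dictionary, template coding, similarity). Plain Lasso coding over a sampled reference dictionary and mean pooling stand in here for the template-regularized sparse coding the paper actually learns, so every function below is an illustrative assumption rather than the authors' method.

import numpy as np
from sklearn.decomposition import SparseCoder

def build_reference_dictionary(train_feats, n_atoms=256, seed=0):
    """train_feats: (N, d) training face features; a random, L2-normalized
    subset of rows serves as a simple stand-in for the learned reference
    dictionary."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(train_feats), size=min(n_atoms, len(train_feats)),
                     replace=False)
    D = train_feats[idx].astype(float)
    return D / (np.linalg.norm(D, axis=1, keepdims=True) + 1e-12)

def template_code(D, template_feats, alpha=0.1):
    """Sparse-code every image/frame of a template over D and mean-pool."""
    coder = SparseCoder(dictionary=D, transform_algorithm="lasso_lars",
                        transform_alpha=alpha)
    return coder.transform(template_feats).mean(axis=0)

def verify(code_a, code_b):
    """Cosine similarity between two pooled template codes."""
    denom = np.linalg.norm(code_a) * np.linalg.norm(code_b) + 1e-12
    return float(code_a @ code_b / denom)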


International Conference on Pattern Recognition | 2012

A Grassmann manifold-based domain adaptation approach

Jingjing Zheng; Ming-Yu Liu; Rama Chellappa; P. Jonathon Phillips


Neural Information Processing Systems | 2014

Submodular Attribute Selection for Action Recognition in Video

Jingjing Zheng; Zhuolin Jiang; Rama Chellappa; P. Jonathon Phillips

Collaboration


Dive into Jingjing Zheng's collaborations.

Top Co-Authors

P. Jonathon Phillips
National Institute of Standards and Technology

Ming-Yu Liu
Mitsubishi Electric Research Laboratories