Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Yunde Jia is active.

Publication


Featured research published by Yunde Jia.


International Conference on Computer Vision | 2013

Go-ICP: Solving 3D Registration Efficiently and Globally Optimally

Jiaolong Yang; Hongdong Li; Yunde Jia

Registration is a fundamental task in computer vision. The Iterative Closest Point (ICP) algorithm is one of the most widely used methods for solving the registration problem. Being based on local iteration, however, ICP is well known to suffer from local minima. Its performance critically relies on the quality of initialization, and only local optimality is guaranteed. This paper provides the very first globally optimal solution to Euclidean registration of two 3D point sets or two 3D surfaces under the L2 error. Our method is built upon ICP, but combines it with a branch-and-bound (BnB) scheme which searches the 3D motion space SE(3) efficiently. By exploiting the special structure of the underlying geometry, we derive novel upper and lower bounds for the ICP error function. The integration of local ICP and global BnB enables the new method to run efficiently in practice, and its optimality is exactly guaranteed. We also discuss extensions, addressing the issue of outlier robustness.
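
As a concrete illustration of the branch-and-bound side, here is a minimal Python sketch of a certified rotation-only search using a cube-based uncertainty bound in the spirit of the abstract. The nested translation search, trimming, and the local ICP refinement of full Go-ICP are omitted, and all names and parameters are illustrative.

```python
import heapq
import numpy as np
from scipy.spatial import cKDTree
from scipy.spatial.transform import Rotation

def bounds(src, tree, r0, hw):
    """Lower/upper bounds of the L2 error over a rotation cube centered
    at angle-axis r0 with half-width hw."""
    R = Rotation.from_rotvec(r0).as_matrix()
    e, _ = tree.query(src @ R.T)              # closest-point residuals
    # any rotation inside the cube moves point x by at most gamma(x)
    gamma = 2.0 * np.sin(min(np.sqrt(3.0) * hw / 2.0, np.pi / 2.0)) \
        * np.linalg.norm(src, axis=1)
    return np.sum(np.maximum(e - gamma, 0.0) ** 2), np.sum(e ** 2)

def bnb_rotation_search(src, dst, eps=1e-3):
    """Best-first BnB over the angle-axis cube enclosing SO(3)."""
    tree = cKDTree(dst)
    best_err, best_r = np.inf, np.zeros(3)
    heap, tick = [(0.0, 0, np.zeros(3), np.pi)], 0
    while heap:
        lo, _, r0, hw = heapq.heappop(heap)
        if lo >= best_err - eps:              # optimality certified
            break
        _, up = bounds(src, tree, r0, hw)
        if up < best_err:                     # full Go-ICP would also run
            best_err, best_r = up, r0         # local ICP here to tighten
        for corner in np.ndindex(2, 2, 2):    # split into 8 sub-cubes
            c = r0 + (np.asarray(corner) - 0.5) * hw
            c_lo, _ = bounds(src, tree, c, hw / 2.0)
            tick += 1
            if c_lo < best_err - eps:
                heapq.heappush(heap, (c_lo, tick, c, hw / 2.0))
    return Rotation.from_rotvec(best_r).as_matrix(), best_err
```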


International Conference on Computer Vision | 2011

Parsing video events with goal inference and intent prediction

Mingtao Pei; Yunde Jia; Song-Chun Zhu

In this paper, we present an event parsing algorithm based on Stochastic Context Sensitive Grammar (SCSG) for understanding events, inferring the goals of agents, and predicting their plausible intended actions. The SCSG represents the hierarchical compositions of events and the temporal relations between sub-events. The alphabet of the SCSG consists of atomic actions, which are defined by the poses of agents and their interactions with objects in the scene. The temporal relations are used to distinguish events with similar structures and to interpolate missing portions of events, and are learned from the training data. In comparison with existing methods, our paper makes the following contributions: (i) we define atomic actions by a set of relations based on the fluents of agents and their interactions with objects in the scene; (ii) our algorithm handles event insertion and multi-agent events, keeps all possible interpretations of the video to preserve ambiguities, and achieves the globally optimal parsing solution in a Bayesian framework; (iii) the algorithm infers the goals of agents and predicts their intents by a top-down process; and (iv) the algorithm improves the detection of atomic actions by using event contexts. We show satisfactory results of event recognition and atomic action detection on a dataset we captured, which contains 12 event categories in both indoor and outdoor videos.
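
As a toy illustration of grammar-style parsing, the sketch below scores one hand-written event template (an And-node's ordered sub-events) against noisy atomic-action detections with dynamic programming. The actual SCSG machinery (learned temporal relations, event insertion, goal inference) is far richer; the action names and probabilities here are invented.

```python
import numpy as np

ACTIONS = ["approach", "bend", "pick_up", "leave"]   # atomic action alphabet
TEMPLATE = ["approach", "pick_up", "leave"]          # one And-node's children

def parse_event(detections, template, actions=ACTIONS):
    """Best log-probability of matching the template as an ordered
    subsequence of per-frame atomic-action detections."""
    idx = [actions.index(a) for a in template]
    dp = np.full(len(template) + 1, -np.inf)
    dp[0] = 0.0                                      # nothing matched yet
    for t in range(len(detections)):
        for k in range(len(template) - 1, -1, -1):   # backwards: <=1 match/frame
            p = detections[t][idx[k]]
            if np.isfinite(dp[k]) and p > 0:
                dp[k + 1] = max(dp[k + 1], dp[k] + np.log(p))
    return dp[-1]

rng = np.random.default_rng(0)
probs = rng.dirichlet(np.ones(len(ACTIONS)), size=20)  # fake detector output
print("best parse log-prob:", parse_event(probs, TEMPLATE))
```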


Computer Vision and Pattern Recognition | 2011

Intrinsic images using optimization

Jianbing Shen; Xiaoshan Yang; Yunde Jia; Xuelong Li

In this paper, we present a novel intrinsic image recovery approach using optimization. Our approach is based on an assumption about the color characteristics of a local window in natural images: neighboring pixels in a local window of a single image that have similar intensity values should have similar reflectance values. The intrinsic image decomposition is thus formulated as the optimization of an energy function with a weighting constraint added to the local image properties. To improve the intrinsic image extraction results, we specify local constraint cues by integrating user strokes into our energy formulation, including constant-reflectance, constant-illumination, and fixed-illumination brushes. Our experimental results demonstrate that our approach achieves a better recovery of the intrinsic reflectance and illumination components than previous approaches.
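
A minimal sketch of this optimization flavor, under simplifying assumptions: log-reflectance is encouraged to match between neighbors of similar intensity, and one user stroke enters as a soft constraint, yielding a sparse linear system. The paper's exact energy, weights, and brush set differ; names and parameters below are illustrative.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import spsolve

def recover_log_reflectance(log_img, stroke_mask, stroke_vals,
                            sigma=0.1, lam=10.0):
    """Solve (L + lam*S) r = lam*S*stroke_vals for log-reflectance r,
    where L is a similarity-weighted graph Laplacian over 4-neighbors."""
    h, w = log_img.shape
    n = h * w
    i = log_img.ravel()
    pid = np.arange(n).reshape(h, w)
    pairs = np.vstack([                       # horizontal + vertical edges
        np.column_stack([pid[:, :-1].ravel(), pid[:, 1:].ravel()]),
        np.column_stack([pid[:-1, :].ravel(), pid[1:, :].ravel()]),
    ])
    # high weight when neighboring intensities are similar
    wgt = np.exp(-(i[pairs[:, 0]] - i[pairs[:, 1]]) ** 2 / (2 * sigma ** 2))
    W = sp.coo_matrix((wgt, (pairs[:, 0], pairs[:, 1])), shape=(n, n))
    W = W + W.T
    L = sp.diags(np.asarray(W.sum(axis=1)).ravel()) - W
    S = sp.diags(stroke_mask.ravel().astype(float))   # soft stroke constraint
    r = spsolve((L + lam * S).tocsc(), lam * (S @ stroke_vals.ravel()))
    return r.reshape(h, w)

rng = np.random.default_rng(1)
img = rng.random((32, 32)) + 0.1
mask = np.zeros((32, 32), bool)
mask[10:12, 10:20] = True                     # one constant-reflectance stroke
print(recover_log_reflectance(np.log(img), mask,
                              np.full((32, 32), -0.5)).shape)
```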


European Conference on Computer Vision | 2012

View-Invariant Action Recognition Using Latent Kernelized Structural SVM

Xinxiao Wu; Yunde Jia

This paper goes beyond recognizing human actions from a fixed view and focuses on action recognition from an arbitrary view. A novel learning algorithm, called latent kernelized structural SVM, is proposed for view-invariant action recognition; it extends the kernelized structural SVM framework to include latent variables. Due to the changing and frequently unknown position of the camera, we regard the view label of an action as a latent variable and implicitly infer it during both learning and inference. Motivated by the geometric correlation between different views and the semantic correlation between different action classes, we additionally propose a mid-level correlation feature which describes an action video by a set of decision values from the pre-learned classifiers of all action classes in all views. Each decision value captures both the geometric and semantic correlations between the action video and the corresponding action class in the corresponding view. We then combine the low-level visual cue, mid-level correlation description, and high-level label information into a novel nonlinear kernel under the latent kernelized structural SVM framework. Extensive experiments on the multi-view IXMAS and MuHAVi action datasets demonstrate that our method generally achieves higher recognition accuracy than other state-of-the-art methods.
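
The latent-variable treatment can be sketched in a few lines: prediction maximizes the score jointly over the class label and the unobserved view. The kernelization and the training loop are omitted; the linear feature map and dimensions below are invented for illustration.

```python
import numpy as np

N_CLASSES, N_VIEWS, D = 5, 4, 32
rng = np.random.default_rng(2)
w = rng.normal(size=(N_CLASSES, N_VIEWS, D))   # one weight block per (y, v)

def predict(x, w):
    """score(x, y) = max_v <w[y, v], x>; the view v is inferred implicitly."""
    scores = np.einsum('cvd,d->cv', w, x)      # all (class, view) scores
    y = int(scores.max(axis=1).argmax())       # best class under its best view
    v = int(scores[y].argmax())                # the inferred latent view
    return y, v

x = rng.normal(size=D)                         # a video's feature vector
print(predict(x, w))
```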


Computer Vision and Pattern Recognition | 2013

Discriminatively Trained And-Or Tree Models for Object Detection

Xi Song; Tianfu Wu; Yunde Jia; Song-Chun Zhu

This paper presents a method of learning reconfigurable And-Or Tree (AOT) models discriminatively from weakly annotated data for object detection. To explore the appearance and geometry space of latent structures effectively, we first quantize the image lattice using an overcomplete set of shape primitives, and then organize them into a directed acyclic And-Or Graph (AOG) by exploiting their compositional relations. We allow overlaps between child nodes when combining them into a parent node, which is equivalent to implicitly introducing an appearance Or-node for the overlapped portion. The learning of an AOT model consists of three components: (i) unsupervised sub-category learning (i.e., branches of an object Or-node) with the latent structures in the AOG integrated out; (ii) weakly supervised part configuration learning (i.e., seeking the globally optimal parse trees in the AOG for each sub-category), for which we propose an efficient dynamic programming (DP) algorithm; and (iii) joint training of appearance and structural parameters under the latent structural SVM framework. In experiments, our method is tested on the PASCAL VOC 2007 and 2010 detection benchmarks of 20 object classes and outperforms comparable state-of-the-art methods.
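
The dynamic program over an And-Or graph is easy to sketch: Or-nodes take the best-scoring child and And-nodes sum their children, memoized over the acyclic graph. The tiny graph and scores below are invented; real AOT learning attaches appearance templates and deformation terms to these nodes.

```python
from functools import lru_cache

# node -> ("AND"|"OR"|"TERM", children or terminal score); invented example
AOG = {
    "object":   ("OR",  ["config_a", "config_b"]),
    "config_a": ("AND", ["part1", "part2"]),
    "config_b": ("AND", ["part1", "part3"]),
    "part1":    ("TERM", 0.9),
    "part2":    ("TERM", 0.4),
    "part3":    ("TERM", 0.7),
}

@lru_cache(maxsize=None)
def best_parse(node):
    """Return (score, parse tree) of the best parse rooted at node."""
    kind, payload = AOG[node]
    if kind == "TERM":
        return payload, node
    results = [best_parse(c) for c in payload]
    if kind == "AND":                          # And-node: compose all children
        return sum(s for s, _ in results), (node, [t for _, t in results])
    best = max(results, key=lambda r: r[0])    # Or-node: best single branch
    return best[0], (node, [best[1]])

print(best_parse("object"))                    # picks config_b (0.9 + 0.7)
```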


IEEE Transactions on Circuits and Systems for Video Technology | 2013

Action Recognition Using Multilevel Features and Latent Structural SVM

Xinxiao Wu; Dong Xu; Lixin Duan; Jiebo Luo; Yunde Jia

We first propose a new low-level visual feature, called spatio-temporal context distribution feature of interest points, to describe human actions. Each action video is expressed as a set of relative XYT coordinates between pairwise interest points in a local region. We learn a global Gaussian mixture model (GMM) (referred to as a universal background model) using the relative coordinate features from all the training videos, and then we represent each video as the normalized parameters of a video-specific GMM adapted from the global GMM. In order to capture the spatio-temporal relationships at different levels, multiple GMMs are utilized to describe the context distributions of interest points over multiscale local regions. Motivated by the observation that some actions share similar motion patterns, we additionally propose a novel mid-level class correlation feature to capture the semantic correlations between different action classes. Each input action video is represented by a set of decision values obtained from the pre-learned classifiers of all the action classes, with each decision value measuring the likelihood that the input video belongs to the corresponding action class. Moreover, human actions are often associated with some specific natural environments and also exhibit high correlation with particular scene classes. It is therefore beneficial to utilize the contextual scene information for action recognition. In this paper, we build the high-level co-occurrence relationship between action classes and scene classes to discover the mutual contextual constraints between action and scene. By treating the scene class label as a latent variable, we propose to use the latent structural SVM (LSSVM) model to jointly capture the compatibility between multilevel action features (e.g., low-level visual context distribution feature and the corresponding mid-level class correlation feature) and action classes, the compatibility between multilevel scene features (i.e., SIFT feature and the corresponding class correlation feature) and scene classes, and the contextual relationship between action classes and scene classes. Extensive experiments on UCF Sports, YouTube and UCF50 datasets demonstrate the effectiveness of the proposed multilevel features and action-scene interaction based LSSVM model for human action recognition. Moreover, our method generally achieves higher recognition accuracy than other state-of-the-art methods on these datasets.
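
A sketch of the GMM-supervector representation described here, assuming generic relative-coordinate features: fit a global GMM (the universal background model), then MAP-adapt its means to each video and flatten them into the video descriptor. The relevance factor and dimensions are illustrative choices, not the paper's settings.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def fit_ubm(pooled_coords, n_components=16):
    """Global GMM over coordinates pooled from all training videos."""
    return GaussianMixture(n_components, covariance_type='diag',
                           random_state=0).fit(pooled_coords)

def video_supervector(ubm, coords, relevance=16.0):
    """MAP-adapt the UBM means to one video; flatten into its descriptor."""
    resp = ubm.predict_proba(coords)            # (n_points, K)
    n_k = resp.sum(axis=0)                      # soft count per component
    ex_k = (resp.T @ coords) / np.maximum(n_k[:, None], 1e-8)
    alpha = (n_k / (n_k + relevance))[:, None]  # adaptation strength
    return (alpha * ex_k + (1 - alpha) * ubm.means_).ravel()

rng = np.random.default_rng(3)
pooled = rng.normal(size=(5000, 3))             # stand-in relative XYT coords
ubm = fit_ubm(pooled)
print(video_supervector(ubm, rng.normal(size=(200, 3))).shape)  # (48,)
```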


International Conference on Computer Vision | 2013

Cross-View Action Recognition over Heterogeneous Feature Spaces

Xinxiao Wu; Han Wang; Cuiwei Liu; Yunde Jia

In cross-view action recognition, what is seen in one view differs from what is recognized in another view. The data distribution, and even the feature space, can change from one view to another because the appearance and motion of actions vary drastically across views. In this paper, we address the problem of transferring action models learned in one view (the source view) to a different view (the target view), where action instances from the two views are represented by heterogeneous features. A novel learning method, called Heterogeneous Transfer Discriminant-analysis of Canonical Correlations (HTDCC), is proposed to learn a discriminative common feature space that links the source and target views in order to transfer knowledge between them. Two projection matrices, which respectively map data from the source and target views into the common space, are optimized by simultaneously minimizing the canonical correlations of inter-class samples and maximizing the intra-class canonical correlations. Our model is neither restricted to corresponding action instances in the two views nor to the same type of feature, and it can handle only a few or even no labeled samples in the target view. To reduce the data distribution mismatch between the source and target views in the common feature space, a nonparametric criterion is included in the objective function. We additionally propose a joint weight learning method to fuse multiple source-view action classifiers for recognition in the target view. Different combination weights are assigned to different source views, with each weight representing how much the corresponding source view contributes to the target view. The proposed method is evaluated on the IXMAS multi-view dataset and achieves promising results.
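
The core quantity in HTDCC is the canonical correlation between two sample sets; a standard way to compute it (whitened cross-covariance plus SVD) is sketched below. The alternating optimization of the two projection matrices and the nonparametric distribution-matching term are omitted; the toy data are invented.

```python
import numpy as np

def canonical_correlations(X, Y, reg=1e-6):
    """Canonical correlations of paired samples X (n, dx) and Y (n, dy)."""
    Xc, Yc = X - X.mean(axis=0), Y - Y.mean(axis=0)
    Cxx = Xc.T @ Xc / len(X) + reg * np.eye(X.shape[1])
    Cyy = Yc.T @ Yc / len(Y) + reg * np.eye(Y.shape[1])
    Cxy = Xc.T @ Yc / len(X)
    Wx = np.linalg.inv(np.linalg.cholesky(Cxx))  # whitening transforms
    Wy = np.linalg.inv(np.linalg.cholesky(Cyy))
    s = np.linalg.svd(Wx @ Cxy @ Wy.T, compute_uv=False)
    return np.clip(s, 0.0, 1.0)

rng = np.random.default_rng(4)
z = rng.normal(size=(100, 2))                    # shared latent signal
X = np.hstack([z, rng.normal(size=(100, 3))])    # "source view" features
Y = np.hstack([z @ rng.normal(size=(2, 2)), rng.normal(size=(100, 4))])
print(canonical_correlations(X, Y))              # leading values near 1
```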


International Conference on Pattern Recognition | 2014

Vehicle Type Classification Using Unsupervised Convolutional Neural Network

Zhen Dong; Mingtao Pei; Yang He; Ting Liu; Yanmei Dong; Yunde Jia

In this paper, we propose an appearance-based vehicle type classification method that works on vehicle frontal-view images. Unlike other methods that use hand-crafted visual features, our method automatically learns good features for vehicle type classification by using a convolutional neural network. To capture rich and discriminative information about vehicles, the network is pre-trained with sparse filtering, an unsupervised learning method. In addition, the network uses layer skipping to ensure that the final features contain both high-level global and low-level local information. Once the final features are obtained, softmax regression is used to classify vehicle types. We build a challenging vehicle dataset, called the BIT-Vehicle dataset, to evaluate the performance of our method. Experimental results on a public dataset and our own dataset demonstrate that our method is quite effective in classifying vehicle types.
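
Sparse filtering, the unsupervised pre-training criterion mentioned above, has a compact objective: soft-absolute responses are normalized per feature and then per example, and their L1 norm is minimized. The toy sketch below optimizes it with a generic solver on random data; the paper's actual network setup is not reproduced.

```python
import numpy as np
from scipy.optimize import minimize

def sparse_filtering_loss(w_flat, X, k):
    """Soft-absolute responses, normalized per feature then per example;
    the objective is simply their L1 norm."""
    W = w_flat.reshape(k, X.shape[1])
    F = np.sqrt((W @ X.T) ** 2 + 1e-8)                 # (k, n_examples)
    F = F / np.linalg.norm(F, axis=1, keepdims=True)   # per-feature
    F = F / np.linalg.norm(F, axis=0, keepdims=True)   # per-example
    return F.sum()

rng = np.random.default_rng(5)
X = rng.normal(size=(100, 16))                 # toy patches, 16-dim
k = 8                                          # number of learned filters
res = minimize(sparse_filtering_loss, rng.normal(size=k * 16),
               args=(X, k), method='L-BFGS-B', options={'maxiter': 50})
W = res.x.reshape(k, 16)                       # filters, e.g. to initialize a net
print("sparsity objective:", round(res.fun, 3))
```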


Pattern Recognition Letters | 2011

Adaptive learning codebook for action recognition

Yu Kong; Xiaoqin Zhang; Weiming Hu; Yunde Jia

Learning a compact and yet discriminative codebook is an important procedure for local feature-based action recognition. A common procedure involves two independent phases: reducing the dimensionality of local features and then performing clustering. Since the two phases are disconnected, dimensionality reduction does not necessarily capture the dimensions that are most helpful for codebook creation. What's more, some dimensionality reduction techniques, such as principal component analysis, do not take class separability into account and thus may not help build an effective codebook. In this paper, we propose weighted adaptive metric learning (WAML), which integrates the two independent phases into a unified optimization framework. This framework makes it possible to select the dimensions that are indispensable for building a discriminative codebook. The dimensionality reduction phase in WAML is optimized for class separability and adaptively adjusts the distance metric to improve the separability of the data. In addition, video word weighting is smoothly incorporated into WAML to accurately generate video words. Experimental results demonstrate that our approach builds a highly discriminative codebook and achieves results comparable to other state-of-the-art approaches.
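
The abstract contrasts WAML with the disconnected two-phase pipeline; to make that contrast concrete, here is a sketch of the baseline itself (unsupervised PCA followed by k-means codebook construction). WAML's joint, class-separability-aware optimization and its word weighting are not reproduced here; sizes below are arbitrary.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

rng = np.random.default_rng(6)
descriptors = rng.normal(size=(2000, 64))     # pooled local features

# phase 1: unsupervised reduction (may discard class-separating dimensions)
pca = PCA(n_components=16).fit(descriptors)

# phase 2: clustering the reduced features into video words
codebook = KMeans(n_clusters=100, n_init=4, random_state=0) \
    .fit(pca.transform(descriptors))

def bag_of_words(video_descs):
    """Normalized histogram of video-word assignments for one video."""
    words = codebook.predict(pca.transform(video_descs))
    return np.bincount(words, minlength=100) / len(words)

print(bag_of_words(rng.normal(size=(300, 64))).shape)   # (100,)
```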


Pattern Recognition Letters | 2006

Face recognition with local steerable phase feature

Xiaoxun Zhang; Yunde Jia

In this paper, we propose a novel local steerable phase (LSP) feature, extracted from the face image using steerable filters, for face recognition. This new type of local feature is semi-invariant under common image deformations and distinctive enough to provide useful identity information. The phase information provided by steerable filters is locally stable with respect to scale changes, noise, and brightness changes. Phase features from multiple scales and orientations are concatenated into an augmented feature vector which is used to evaluate the similarity between face images. We use a nearest-neighbor classifier based on local weighted phase correlation for the final classification. Experimental results on the FERET dataset show encouraging recognition performance.
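
A sketch of a local phase feature in this spirit: convolve the image with an even/odd quadrature filter pair at several orientations and take the local phase angle, which is insensitive to brightness and contrast changes. The paper's exact steerable basis, multi-scale concatenation, and matching scheme are omitted; filter parameters below are illustrative.

```python
import numpy as np
from scipy.ndimage import convolve

def quadrature_pair(size=9, sigma=2.0, theta=0.0, freq=0.25):
    """Even/odd (cosine/sine) oriented filter pair under a Gaussian window."""
    ax = np.arange(size) - size // 2
    xx, yy = np.meshgrid(ax, ax)
    u = xx * np.cos(theta) + yy * np.sin(theta)
    env = np.exp(-(xx ** 2 + yy ** 2) / (2 * sigma ** 2))
    return env * np.cos(2 * np.pi * freq * u), env * np.sin(2 * np.pi * freq * u)

def local_phase_features(img, n_orient=4):
    feats = []
    for k in range(n_orient):
        even, odd = quadrature_pair(theta=k * np.pi / n_orient)
        re = convolve(img, even, mode='reflect')
        im = convolve(img, odd, mode='reflect')
        feats.append(np.arctan2(im, re))       # phase: stable to brightness
    return np.stack(feats, axis=-1)            # (H, W, n_orient)

rng = np.random.default_rng(7)
face = rng.normal(size=(64, 64))               # stand-in for an aligned face
print(local_phase_features(face).shape)        # concatenate scales in practice
```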

Collaboration


Dive into Yunde Jia's collaborations.

Top Co-Authors

Yu Kong, Beijing Institute of Technology
Weiming Hu, Chinese Academy of Sciences
Xinxiao Wu, Beijing Institute of Technology
Jianbing Shen, Beijing Institute of Technology
Jiaolong Yang, Beijing Institute of Technology
Xuelong Li, Chinese Academy of Sciences
Hongdong Li, Australian National University
Cuiwei Liu, Beijing Institute of Technology
Han Wang, Beijing Institute of Technology