Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Menglong Zhu is active.

Publications


Featured research published by Menglong Zhu.


Computer Vision and Pattern Recognition | 2016

Sparseness Meets Deepness: 3D Human Pose Estimation from Monocular Video

Xiaowei Zhou; Menglong Zhu; Spyridon Leonardos; Konstantinos G. Derpanis; Kostas Daniilidis

This paper addresses the challenge of 3D full-body human pose estimation from a monocular image sequence. Two cases are considered: (i) the image locations of the human joints are provided, and (ii) the image locations of the joints are unknown. In the former case, a novel approach is introduced that integrates a sparsity-driven 3D geometric prior and temporal smoothness. In the latter case, the approach is extended by treating the image locations of the joints as latent variables, to account for the considerable uncertainty in 2D joint locations. A deep fully convolutional network is trained to predict the uncertainty maps of the 2D joint locations. The 3D pose estimates are obtained via an Expectation-Maximization algorithm over the entire sequence, where it is shown that the 2D joint location uncertainties can be conveniently marginalized out during inference. Empirical evaluation on the Human3.6M dataset shows that the proposed approaches achieve greater 3D pose estimation accuracy than state-of-the-art baselines. Further, the proposed approach outperforms a publicly available 2D pose estimation baseline on the challenging PennAction dataset.
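
The core model is compact enough to sketch. Below is a minimal illustration, not the authors' code, of the three ingredients the abstract names: a sparse linear combination of 3D pose bases, a weak-perspective projection, and a score under the CNN's 2D uncertainty maps. All array shapes and names are assumptions.

```python
# Minimal sketch of the generative model behind the EM approach.
# Assumptions: `bases` is a bank of K basis poses, shape (K, 3, J);
# `heatmaps` are per-joint 2D uncertainty maps from the CNN, shape (J, H, W).
import numpy as np

def reconstruct_pose(coeffs, bases):
    """3D pose as a sparse linear combination of basis poses: S = sum_k c_k B_k."""
    return np.tensordot(coeffs, bases, axes=1)        # (K,) x (K, 3, J) -> (3, J)

def weak_perspective_project(S, R, s, t):
    """Project joints with a weak-perspective camera: W = s * (R S)_{1:2} + t."""
    return s * (R @ S)[:2] + t[:, None]               # (2, J)

def heatmap_log_likelihood(W, heatmaps):
    """Score projected joints under the 2D uncertainty maps (nearest-pixel
    lookup here; the paper marginalizes this uncertainty in closed form)."""
    H, Wd = heatmaps.shape[1:]
    ll = 0.0
    for j in range(heatmaps.shape[0]):
        x = int(np.clip(round(W[0, j]), 0, Wd - 1))
        y = int(np.clip(round(W[1, j]), 0, H - 1))
        ll += np.log(heatmaps[j, y, x] + 1e-12)
    return ll
```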


International Conference on Computer Vision | 2013

From Actemes to Action: A Strongly-Supervised Representation for Detailed Action Understanding

Weiyu Zhang; Menglong Zhu; Konstantinos G. Derpanis

This paper presents a novel approach for analyzing human actions in non-scripted, unconstrained video settings, based on volumetric (x-y-t) patch classifiers termed actemes. Unlike previous action-related work, the discovery of patch classifiers is posed as a strongly supervised process. Specifically, keypoint labels (e.g., position) across space-time are used in a data-driven training process to discover patches that are highly clustered in the space-time keypoint configuration space. To support this process, a new human action dataset consisting of challenging consumer videos is introduced, where notably the action label, the 2D positions of a set of keypoints, and their visibilities are provided for each video frame. On a novel input video, each acteme is used in a sliding-volume scheme to yield a set of sparse, non-overlapping detections. These detections provide the intermediate substrate for segmenting out the action. For action classification, the proposed representation shows significant improvement over state-of-the-art low-level features, while providing spatiotemporal localization as additional output, which sheds further light on detailed action understanding.
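
As a rough sketch of the strongly supervised discovery step, clustering space-time keypoint configurations might look like the following. The data layout, normalization, and use of k-means are illustrative assumptions, not the paper's exact procedure.

```python
# Sketch of acteme discovery: cluster short space-time keypoint snippets so
# that each cluster seeds one volumetric patch classifier.
# Assumption: keypoint_tracks is an (N, T, J, 2) array of 2D joint positions.
import numpy as np
from sklearn.cluster import KMeans

def discover_actemes(keypoint_tracks, n_actemes=50, window=5):
    """Returns a cluster label for every length-`window` space-time snippet."""
    N, T, J, _ = keypoint_tracks.shape
    snippets = []
    for n in range(N):
        for t in range(T - window + 1):
            snip = keypoint_tracks[n, t:t + window]   # (window, J, 2)
            snip = snip - snip[0, 0]                  # translation-normalize
            snippets.append(snip.ravel())
    X = np.asarray(snippets)
    return KMeans(n_clusters=n_actemes, n_init=10).fit_predict(X)
```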


International Conference on Computer Vision | 2015

Multi-image Matching via Fast Alternating Minimization

Xiaowei Zhou; Menglong Zhu; Kostas Daniilidis

In this paper we propose a global optimization-based approach to jointly matching a set of images. The estimated correspondences simultaneously maximize pairwise feature affinities and cycle consistency across multiple images. Unlike previous convex methods relying on semidefinite programming, we formulate the problem as a low-rank matrix recovery problem and show that the desired semidefiniteness of a solution can be spontaneously fulfilled. The low-rank formulation enables us to derive a fast alternating minimization algorithm to handle practical problems with thousands of features. Both simulated and real experiments demonstrate that the proposed algorithm achieves competitive performance with an order-of-magnitude speedup over the state-of-the-art algorithm. Finally, we demonstrate the applicability of the proposed method to matching images of different object instances and, as a result, its potential to reconstruct category-specific object models from such images.
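
A stripped-down sketch of the low-rank idea, under the assumption that the noisy pairwise matches are stacked into one symmetric matrix W: alternating closed-form least-squares updates recover a low-rank, hence cycle-consistent, matching matrix. The real algorithm adds row/column-sum constraints and blockwise structure, so this is only the skeleton.

```python
# Sketch: recover a cycle-consistent matching matrix X as a low-rank product
# X = A @ B.T close to the noisy pairwise matches W, by alternating
# ridge-regularized least squares in A and B (dimensions are assumptions).
import numpy as np

def low_rank_match(W, rank, lam=0.1, iters=100):
    m = W.shape[0]
    rng = np.random.default_rng(0)
    A = rng.standard_normal((m, rank))
    B = rng.standard_normal((m, rank))
    I = np.eye(rank)
    for _ in range(iters):
        # minimize ||W - A B^T||_F^2 + lam (||A||^2 + ||B||^2), one factor at a time
        A = W @ B @ np.linalg.inv(B.T @ B + lam * I)
        B = W.T @ A @ np.linalg.inv(A.T @ A + lam * I)
    X = A @ B.T
    return np.clip(X, 0.0, 1.0)   # entries of a partial permutation lie in [0, 1]
```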


International Conference on Robotics and Automation | 2014

Single image 3D object detection and pose estimation for grasping

Menglong Zhu; Konstantinos G. Derpanis; Yinfei Yang; Samarth Brahmbhatt; Mabel M. Zhang; Cody J. Phillips; Matthieu Lecce; Kostas Daniilidis

We present a novel approach for detecting objects and estimating their 3D pose in single images of cluttered scenes. Objects are given in terms of 3D models without accompanying texture cues. A deformable parts-based model is trained on clusters of silhouettes of similar poses and produces hypotheses about possible object locations at test time. Objects are simultaneously segmented and verified inside each hypothesized bounding region by selecting the set of superpixels whose collective shape matches the model silhouette. A final iteration on the 6-DOF object pose minimizes the distance between the selected image contours and the actual projection of the 3D model. We demonstrate successful grasps with a PR2 robot using our detections and pose estimates. Extensive evaluation on a novel ground-truth dataset shows the considerable benefit of using shape-driven cues for detecting objects in heavily cluttered scenes.
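
The final refinement step can be sketched as a small nonlinear least-squares problem: adjust the 6-DOF pose so that projected 3D model points land near the selected image contour. The pinhole projection, small-angle rotation, and nearest-point correspondences below are simplifying assumptions, not the paper's exact formulation.

```python
# Sketch of contour-based pose refinement with scipy.
# Assumptions: model_pts (N, 3) are 3D model contour points, contour_px (M, 2)
# are selected image contour pixels, K is the 3x3 camera intrinsics matrix.
import numpy as np
from scipy.optimize import least_squares
from scipy.spatial import cKDTree

def refine_pose(pose0, model_pts, contour_px, K):
    tree = cKDTree(contour_px)

    def project(pose, P):
        rx, ry, rz, tx, ty, tz = pose
        # small-angle rotation approximation, adequate for a refinement step
        R = np.array([[1, -rz, ry], [rz, 1, -rx], [-ry, rx, 1]])
        cam = P @ R.T + np.array([tx, ty, tz])
        uvw = cam @ K.T
        return uvw[:, :2] / uvw[:, 2:3]

    def residuals(pose):
        uv = project(pose, model_pts)
        d, _ = tree.query(uv)          # distance to nearest contour pixel
        return d

    return least_squares(residuals, pose0).x
```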


Robotics: Science and Systems | 2014

Semantic Localization Via the Matrix Permanent

Nikolay Atanasov; Menglong Zhu; Kostas Daniilidis; George J. Pappas

Most approaches to robot localization rely on low-level geometric features such as points, lines, and planes. In this paper, we use object recognition to obtain semantic information from the robot’s sensors and consider the task of localizing the robot within a prior map of landmarks, which are annotated with semantic labels. As object recognition algorithms miss detections and produce false alarms, correct data association between the detections and the landmarks on the map is central to the semantic localization problem. Instead of the traditional vector-based representations, we use random finite sets to represent the object detections. This allows us to explicitly incorporate missed detections, false alarms, and data association in the sensor model. Our second contribution is to reduce the problem of computing the likelihood of a set-valued observation to the problem of computing a matrix permanent. It is this crucial transformation that enables us to solve the semantic localization problem with a polynomial-time approximation to the set-based Bayes filter. The performance of our approach is demonstrated in simulation and in a real environment using a deformable-part-model-based object detector. Comparisons are made with traditional lidar-based geometric Monte Carlo localization.
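
The reduction to a matrix permanent is the technical heart of the paper. For small matrices the permanent can be computed exactly with Ryser's formula, sketched below; the paper itself relies on polynomial-time approximations for realistic sizes, so this exact O(2^n n) version is only a small-n illustration.

```python
# Exact matrix permanent via Ryser's formula:
# perm(A) = (-1)^n * sum over nonempty column subsets S of
#           (-1)^|S| * prod_i sum_{j in S} a_ij
import numpy as np
from itertools import combinations

def permanent(A):
    n = A.shape[0]
    total = 0.0
    for k in range(1, n + 1):
        for cols in combinations(range(n), k):
            total += (-1) ** k * np.prod(A[:, cols].sum(axis=1))
    return (-1) ** n * total
```

For a 2x2 matrix [[a, b], [c, d]] this returns ad + bc, as expected (the permanent is the determinant without the sign alternation).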


IEEE Transactions on Pattern Analysis and Machine Intelligence | 2017

Sparse Representation for 3D Shape Estimation: A Convex Relaxation Approach

Xiaowei Zhou; Menglong Zhu; Spyridon Leonardos; Kostas Daniilidis

We investigate the problem of estimating the 3D shape of an object defined by a set of 3D landmarks, given their 2D correspondences in a single image. A successful approach to alleviating the reconstruction ambiguity is the 3D deformable shape model, and a sparse representation is often used to capture complex shape variability. However, model inference remains challenging due to the nonconvexity of the joint optimization over shape and viewpoint. In contrast to prior work that relies on an alternating scheme whose solution depends on initialization, we propose a convex approach to addressing this challenge and develop an efficient algorithm to solve the proposed convex program. We further propose a robust model to handle gross errors in the 2D correspondences. We demonstrate the exact recovery property of the proposed method, its advantage over several nonconvex baselines, and its applicability to recovering 3D human poses and car models from single images.
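
One way to see why sparsity helps: if the viewpoint were known, fitting the sparse shape coefficients would reduce to a lasso, as in the ISTA sketch below. The paper's actual contribution is a convex relaxation of the joint shape-and-viewpoint problem, which this simplification deliberately sidesteps; all shapes and names here are assumptions.

```python
# Sketch: with the camera rows R (2, 3) fixed, sparse shape fitting is a lasso
# solved by ISTA. W: (2, J) observed landmarks; bases: (K, 3, J) basis shapes.
import numpy as np

def fit_shape_coeffs(W, bases, R, lam=0.1, iters=500):
    K = bases.shape[0]
    # each basis shape projects to one (2J,) column of the design matrix
    Phi = np.stack([(R @ B).ravel() for B in bases], axis=1)   # (2J, K)
    y = W.ravel()
    c = np.zeros(K)
    step = 1.0 / np.linalg.norm(Phi, 2) ** 2                   # 1 / Lipschitz const
    for _ in range(iters):
        g = Phi.T @ (Phi @ c - y)                              # gradient step
        z = c - step * g
        c = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0) # soft threshold
    return c
```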


European Conference on Computer Vision | 2014

Active Deformable Part Models Inference

Menglong Zhu; Nikolay Atanasov; George J. Pappas; Kostas Daniilidis

This paper presents an active approach for part-based object detection, which optimizes the order of part filter evaluations and the time at which to stop and make a prediction. Statistics describing the part responses are learned from training data and are used to formalize the part scheduling problem as an offline optimization. Dynamic programming is applied to obtain a policy that balances the number of part evaluations with the classification accuracy. During inference, the policy is used as a look-up table to choose the part order and the stopping time based on the observed filter responses. Evaluated on the PASCAL VOC 2007 and 2010 datasets, the method is faster than cascade detection with deformable part models (which does not optimize the part order), with negligible loss in accuracy.
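
A toy version of the offline scheduling optimization: backward dynamic programming over the number of parts evaluated so far, trading each part's evaluation cost against the expected accuracy gain. The real policy also conditions on the observed filter responses; `acc` and `cost` below are assumed inputs, not quantities from the paper.

```python
# Sketch of the stop-or-continue trade-off as backward induction.
# acc[k]: assumed expected accuracy after evaluating k part filters;
# cost: assumed per-part evaluation cost.
import numpy as np

def stopping_policy(acc, cost):
    """Returns stop[k] = True when halting after k part evaluations is optimal."""
    P = len(acc) - 1
    value = np.empty(P + 1)
    stop = np.zeros(P + 1, dtype=bool)
    value[P], stop[P] = acc[P], True          # no parts left: must stop
    for k in range(P - 1, -1, -1):
        cont = value[k + 1] - cost            # pay to evaluate one more part
        stop[k] = acc[k] >= cont
        value[k] = acc[k] if stop[k] else cont
    return stop
```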


The International Journal of Robotics Research | 2016

Localization from semantic observations via the matrix permanent

Nikolay Atanasov; Menglong Zhu; Kostas Daniilidis; George J. Pappas

Most approaches to robot localization rely on low-level geometric features such as points, lines, and planes. In this paper, we use object recognition to obtain semantic information from the robot’s sensors and consider the task of localizing the robot within a prior map of landmarks, which are annotated with semantic labels. As object recognition algorithms miss detections and produce false alarms, correct data association between the detections and the landmarks on the map is central to the semantic localization problem. Instead of the traditional vector-based representation, we propose a sensor model, which encodes the semantic observations via random finite sets and enables a unified treatment of missed detections, false alarms, and data association. Our second contribution is to reduce the problem of computing the likelihood of a set-valued observation to the problem of computing a matrix permanent. It is this crucial transformation that allows us to solve the semantic localization problem with a polynomial-time approximation to the set-based Bayes filter. Finally, we address the active semantic localization problem, in which the observer’s trajectory is planned in order to improve the accuracy and efficiency of the localization process. The performance of our approach is demonstrated in simulation and in real environments using deformable-part-model-based object detectors. Robust global localization from semantic observations is demonstrated for a mobile robot, for the Project Tango phone, and on the KITTI visual odometry dataset. Comparisons are made with the traditional lidar-based geometric Monte Carlo localization.
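
The new active-localization component can be sketched as greedy action selection over a particle belief: pick the motion whose predicted belief has the lowest expected entropy. The `predict` and `expected_entropy` helpers below are hypothetical placeholders for the motion model and the semantic sensor model, not functions from the paper.

```python
# Sketch of greedy active localization over a particle filter belief.
# predict(particles, a): hypothetical motion model applied per particle.
# expected_entropy(pred, weights): hypothetical expected posterior entropy
# after a semantic observation from the predicted poses.
import numpy as np

def choose_action(particles, weights, actions, predict, expected_entropy):
    best, best_h = None, np.inf
    for a in actions:
        pred = predict(particles, a)
        h = expected_entropy(pred, weights)
        if h < best_h:
            best, best_h = a, h
    return best
```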


International Conference on Computer Vision | 2015

Single Image Pop-Up from Discriminatively Learned Parts

Menglong Zhu; Xiaowei Zhou; Kostas Daniilidis

We introduce a new approach for estimating the fine-grained 3D shape and continuous pose of an object from a single image. Given a training set of view exemplars, we learn and select appearance-based discriminative parts which are mapped onto the 3D model through a facility-location optimization. The training set of 3D models is summarized into a set of basis shapes from which we can generalize by linear combination. Given a test image, we detect hypotheses for each part. The main challenge is to select from these hypotheses and compute the 3D pose and shape coefficients at the same time. To achieve this, we optimize a function that simultaneously considers the appearance matching of the parts as well as the geometric reprojection error. We apply the alternating direction method of multipliers (ADMM) to minimize the resulting convex function. Our main and novel contribution is the simultaneous solution of part localization and detailed 3D geometry estimation by maximizing both appearance and geometric compatibility with convex relaxation.
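
The facility-location step admits a simple greedy sketch: keep "opening" parts while the marginal coverage gain over model vertices exceeds the opening cost. The `affinity` matrix and scalar `open_cost` below are illustrative assumptions, not the paper's exact formulation.

```python
# Greedy facility-location-style part selection.
# Assumption: affinity[p, v] measures how well part p explains model vertex v.
import numpy as np

def select_parts(affinity, open_cost):
    P, V = affinity.shape
    covered = np.zeros(V)                 # best affinity achieved so far per vertex
    opened = []
    while True:
        gains = np.maximum(affinity - covered, 0).sum(axis=1)   # marginal gain per part
        p = int(np.argmax(gains))
        if gains[p] <= open_cost:
            break                         # no part is worth its opening cost
        opened.append(p)
        covered = np.maximum(covered, affinity[p])
    return opened
```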


International Conference on Computer Vision | 2012

Monocular visual odometry and dense 3D reconstruction for on-road vehicles

Menglong Zhu; Srikumar Ramalingam; Yuichi Taguchi; Tyler W. Garaas

More and more on-road vehicles are equipped with cameras every day. This paper presents a novel method for estimating the relative motion of a vehicle from a sequence of images obtained using a single vehicle-mounted camera. Recently, several researchers in robotics and computer vision have studied the performance of motion estimation algorithms under non-holonomic and planarity constraints. The successful algorithms typically use the smallest number of feature correspondences with respect to the motion model. It has been strongly established that such minimal algorithms are efficient and robust to outliers when used in a hypothesize-and-test framework such as random sample consensus (RANSAC). In this paper, we show that planar 2-point motion estimation can be solved analytically using a single quadratic equation, without the need for iterative techniques such as the Newton-Raphson method used in existing work. Non-iterative methods are more efficient and do not suffer from local-minima problems. Although 2-point motion estimation generates visually accurate on-road vehicle trajectories, the motion is not precise enough for dense 3D reconstruction due to the non-planarity of roads. We therefore use the 2-point relative motion algorithm for the initial images, followed by 3-point 2D-to-3D camera pose estimation for the subsequent images. Using this hybrid approach, we generate motion estimates accurate enough for a plane-sweeping algorithm that produces dense depth maps for obstacle detection applications.
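
The hypothesize-and-test framework the abstract refers to can be sketched as RANSAC built around a minimal solver. `solve_2pt_planar` below is a hypothetical stub standing in for the paper's closed-form quadratic solver, and `reproj_error` is likewise an assumed scoring function.

```python
# Sketch of RANSAC around a 2-point minimal solver.
# solve_2pt_planar(p1, p2): hypothetical stub returning candidate planar
# motions (e.g., yaw plus translation direction) from two correspondences.
# reproj_error(motion, pts1, pts2): assumed per-point error function.
import numpy as np

def ransac_motion(pts1, pts2, solve_2pt_planar, reproj_error,
                  thresh=1.0, iters=500):
    rng = np.random.default_rng(0)
    n = len(pts1)
    best_motion, best_inliers = None, 0
    for _ in range(iters):
        i, j = rng.choice(n, size=2, replace=False)     # minimal sample
        for motion in solve_2pt_planar(pts1[[i, j]], pts2[[i, j]]):
            errs = reproj_error(motion, pts1, pts2)
            inliers = int((errs < thresh).sum())
            if inliers > best_inliers:
                best_motion, best_inliers = motion, inliers
    return best_motion
```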

Collaboration


Dive into Menglong Zhu's collaborations.

Top Co-Authors

Kostas Daniilidis, University of Pennsylvania
Xiaowei Zhou, University of Pennsylvania
George J. Pappas, University of Pennsylvania
Nikolay Atanasov, University of Pennsylvania
Arne Suppé, Carnegie Mellon University
Chip Diberardino, General Dynamics Land Systems
Jean Oh, Carnegie Mellon University