Network


Latest external collaborations at the country level. Dive into the details by clicking on the dots.

Hotspot


Dive into the research topics where Daniel Munoz is active.

Publication


Featured research published by Daniel Munoz.


European Conference on Computer Vision | 2010

Stacked hierarchical labeling

Daniel Munoz; J. Andrew Bagnell; Martial Hebert

In this work we propose a hierarchical approach for labeling semantic objects and regions in scenes. Our approach is reminiscent of early vision literature in that we use a decomposition of the image in order to encode relational and spatial information. In contrast to much existing work on structured prediction for scene understanding, we bypass a global probabilistic model and instead directly train a hierarchical inference procedure inspired by the message passing mechanics of some approximate inference procedures in graphical models. This approach mitigates both the theoretical and empirical difficulties of learning probabilistic models when exact inference is intractable. In particular, we draw from recent work in machine learning and break the complex inference process into a hierarchical series of simple machine learning subproblems. Each subproblem in the hierarchy is designed to capture the image and contextual statistics in the scene. This hierarchy spans coarse-to-fine regions and explicitly models the mixtures of semantic labels that may be present due to imperfect segmentation. To avoid cascading of errors and overfitting, we train the learning problems in sequence to ensure robustness to likely errors earlier in the inference sequence and leverage the stacking approach developed by Cohen et al.
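
As a rough illustration of the stacking idea (training each level on held-out predictions of the level above), here is a minimal sketch assuming scikit-learn; the two-level toy hierarchy, the random data, and every variable name are invented for illustration, not taken from the paper.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict

rng = np.random.default_rng(0)
n_regions, n_feat, n_classes = 200, 16, 4

# Toy data: coarse regions and the fine regions nested inside them.
coarse_feats = rng.normal(size=(n_regions, n_feat))
fine_feats = rng.normal(size=(n_regions, n_feat))
labels = rng.integers(0, n_classes, size=n_regions)
parent_of = np.arange(n_regions)   # fine region i sits in coarse region i

# Stage 1 (coarse level): predict a label distribution per region.
coarse_clf = LogisticRegression(max_iter=1000)
# Stacking: feed the next stage *held-out* (cross-validated) predictions,
# so it is trained on the kinds of errors stage 1 actually makes rather
# than on its optimistic training-set outputs.
coarse_probs = cross_val_predict(coarse_clf, coarse_feats, labels,
                                 cv=5, method="predict_proba")
coarse_clf.fit(coarse_feats, labels)

# Stage 2 (fine level): append the parent's predicted label distribution
# as context features, encoding the coarse-to-fine information flow.
fine_inputs = np.hstack([fine_feats, coarse_probs[parent_of]])
fine_clf = LogisticRegression(max_iter=1000).fit(fine_inputs, labels)

def predict(coarse_x, fine_x, parents):
    # Inference simply runs the stages in sequence on new data.
    ctx = coarse_clf.predict_proba(coarse_x)
    return fine_clf.predict(np.hstack([fine_x, ctx[parents]]))
```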


Computer Vision and Pattern Recognition | 2009

Contextual classification with functional Max-Margin Markov Networks

Daniel Munoz; J. Andrew Bagnell; Nicolas Vandapel; Martial Hebert

We address the problem of label assignment in computer vision: given a novel 3D or 2D scene, we wish to assign a unique label to every site (voxel, pixel, superpixel, etc.). To this end, the Markov Random Field framework has proven to be a model of choice as it uses contextual information to yield improved classification results over locally independent classifiers. In this work we adapt a functional gradient approach for learning high-dimensional parameters of random fields in order to perform discrete, multi-label classification. With this approach we can learn robust models involving high-order interactions better than the previously used learning method. We validate the approach in the context of point cloud classification and improve the state of the art. In addition, we successfully demonstrate the generality of the approach on the challenging vision problem of recovering 3-D geometric surfaces from images.
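
The functional-gradient learning the abstract refers to can be pictured with a deliberately stripped-down sketch: a chain-structured field with a fixed Potts pairwise term, whose unary potentials are grown as an ensemble of regression trees fit to subgradients of a structured hinge loss. Everything here (the chain graph, sizes, step size, tree depth) is a toy assumption, not the paper's actual model.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(1)
T, n_feat, K = 80, 8, 3                      # chain length, features, labels
X = rng.normal(size=(T, n_feat))
y = rng.integers(0, K, size=T)
POTTS, STEP, ROUNDS = 0.5, 0.3, 20           # STEP acts as shrinkage

trees = []                                   # the learned unary "function"

def unary(X):
    s = np.zeros((len(X), K))
    for tree, k in trees:
        s[:, k] += STEP * tree.predict(X)
    return s

def map_chain(scores, pairwise, loss=None):
    # Viterbi-style DP; adding the Hamming loss gives loss-augmented MAP.
    s = scores + (loss if loss is not None else 0)
    dp, bp = s[0].copy(), np.zeros((len(s), K), int)
    for t in range(1, len(s)):
        trans = dp[:, None] + pairwise
        bp[t] = trans.argmax(0)
        dp = s[t] + trans.max(0)
    path = [dp.argmax()]
    for t in range(len(s) - 1, 0, -1):
        path.append(bp[t][path[-1]])
    return np.array(path[::-1])

pairwise = POTTS * np.eye(K)                 # reward agreeing neighbours
for _ in range(ROUNDS):
    loss = 1.0 - np.eye(K)[y]                # Hamming loss augmentation
    y_hat = map_chain(unary(X), pairwise, loss)
    # Functional subgradient: push scores up at the truth, down at the
    # current loss-augmented MAP, by fitting one small tree per label.
    grad = np.eye(K)[y] - np.eye(K)[y_hat]
    for k in range(K):
        tree = DecisionTreeRegressor(max_depth=3).fit(X, grad[:, k])
        trees.append((tree, k))

print("train MAP accuracy:", (map_chain(unary(X), pairwise) == y).mean())
```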


International Conference on Robotics and Automation | 2011

3-D scene analysis via sequenced predictions over points and regions

Xuehan Xiong; Daniel Munoz; J. Andrew Bagnell; Martial Hebert

We address the problem of understanding scenes from 3-D laser scans via per-point assignment of semantic labels. In order to mitigate the difficulties of using a graphical model for modeling the contextual relationships among the 3-D points, we instead propose a multi-stage inference procedure to capture these relationships. More specifically, we train this procedure to use point cloud statistics and learn relational information (e.g., tree trunks are below vegetation) over fine (point-wise) and coarse (region-wise) scales. We evaluate our approach on three different datasets, obtained from different sensors, and demonstrate improved performance.
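
The multi-stage procedure can be sketched as repeated re-classification with pooled label statistics at both scales; the sketch below, assuming scikit-learn, uses invented features and a random point-to-region assignment purely for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(2)
n_pts, n_feat, K, n_regions = 500, 10, 4, 25
X = rng.normal(size=(n_pts, n_feat))
y = rng.integers(0, K, size=n_pts)
region = rng.integers(0, n_regions, size=n_pts)   # point -> region id

def pool_by_region(probs):
    # Region-wise mean of predicted label distributions: the coarse
    # context in which relational cues can be picked up.
    ctx = np.zeros((n_regions, K))
    for r in range(n_regions):
        ctx[r] = probs[region == r].mean(0) if (region == r).any() else 1.0 / K
    return ctx[region]                            # broadcast back to points

stages, probs = [], np.full((n_pts, K), 1.0 / K)  # start from uniform beliefs
for _ in range(3):                                # a few point/region rounds
    feats = np.hstack([X, probs, pool_by_region(probs)])
    clf = RandomForestClassifier(n_estimators=50).fit(feats, y)
    probs = clf.predict_proba(feats)
    stages.append(clf)
```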


European Conference on Computer Vision | 2014

Pose Machines: Articulated Pose Estimation via Inference Machines

Varun Ramakrishna; Daniel Munoz; Martial Hebert; James Andrew Bagnell; Yaser Sheikh

State-of-the-art approaches for articulated human pose estimation are rooted in parts-based graphical models. These models are often restricted to tree-structured representations and simple parametric potentials in order to enable tractable inference. However, these simple dependencies fail to capture all the interactions between body parts. While models with more complex interactions can be defined, learning the parameters of these models remains challenging with intractable or approximate inference. In this paper, instead of performing inference on a learned graphical model, we build upon the inference machine framework and present a method for articulated human pose estimation. Our approach incorporates rich spatial interactions among multiple parts and information across parts of different scales. Additionally, the modular framework of our approach enables both ease of implementation without specialized optimization solvers, and efficient inference. We analyze our approach on two challenging datasets with large pose variation and outperform the state-of-the-art on these benchmarks.
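
To make the sequential-prediction structure concrete, here is a heavily simplified toy: a 1-D "image", binary per-location part maps, and offset-sampled beliefs standing in for the paper's patch-based context features. All names, sizes, and offsets are assumptions.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(3)
n_loc, n_feat, n_parts = 300, 10, 3
X = rng.normal(size=(n_loc, n_feat))                 # appearance features
labels = [rng.integers(0, 2, size=n_loc) for _ in range(n_parts)]

def context(belief):
    # For every location: all parts' beliefs here and at two offsets, a
    # crude stand-in for patch context sampled from the score maps.
    cols = []
    for b in belief:
        cols += [b, np.roll(b, 5), np.roll(b, -5)]
    return np.stack(cols, axis=1)

belief = [np.full(n_loc, 0.5) for _ in range(n_parts)]
stages = []
for _ in range(3):                                   # three stages
    ctx = context(belief)
    # Each stage re-predicts every part, seeing all parts' current maps,
    # which is how inter-part interactions enter without a graphical model.
    stage = [GradientBoostingClassifier(n_estimators=30)
             .fit(np.hstack([X, ctx]), labels[p]) for p in range(n_parts)]
    belief = [c.predict_proba(np.hstack([X, ctx]))[:, 1] for c in stage]
    stages.append(stage)
```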


International Conference on Robotics and Automation | 2009

Onboard contextual classification of 3-D point clouds with learned high-order Markov Random Fields

Daniel Munoz; Nicolas Vandapel; Martial Hebert

Contextual reasoning through graphical models such as Markov Random Fields often shows superior performance against local classifiers in many domains. Unfortunately, this performance increase often comes at the cost of time-consuming, memory-intensive learning and slow inference at testing time. Structured prediction for 3-D point cloud classification is one example of such an application. In this paper we present two contributions. First, we show how efficient learning of a random field with higher-order cliques can be achieved using subgradient optimization. Second, we present a context approximation using random fields with high-order cliques designed to make this model usable online, onboard a mobile vehicle, for environment modeling. We obtained results with the mobile vehicle on a variety of terrains, at 1/3 Hz for a 25 × 50 meter map and a vehicle speed of 1–2 m/s.
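
A toy rendering of max-margin subgradient learning with a high-order term might look like the following, with greedy ICM standing in for the paper's inference and a simple clique-purity count standing in for its high-order potentials; the sizes, step size, and names are all invented.

```python
import numpy as np

rng = np.random.default_rng(4)
n, n_feat, K, n_cliques = 120, 6, 3, 12
X = rng.normal(size=(n, n_feat))
y = rng.integers(0, K, size=n)
clique = rng.integers(0, n_cliques, size=n)       # site -> clique id

w_unary = np.zeros((K, n_feat))
w_clique = 0.0                                    # weight on purity feature

def clique_purity(lab):
    # Fraction of cliques whose member sites all share one label.
    return np.mean([len(set(lab[clique == c])) == 1 for c in range(n_cliques)])

def icm(loss=None):
    # Greedy coordinate-wise (approximate) MAP inference.
    lab = (X @ w_unary.T).argmax(1)
    for _ in range(2):                            # a couple of sweeps
        for i in range(n):
            scores = X[i] @ w_unary.T
            if loss is not None:
                scores += loss[i]
            for k in range(K):
                lab_k = lab.copy(); lab_k[i] = k
                scores[k] += w_clique * clique_purity(lab_k)
            lab[i] = scores.argmax()
    return lab

eta = 0.05
for _ in range(5):
    loss = 1.0 - np.eye(K)[y]                     # Hamming augmentation
    y_hat = icm(loss)
    # Subgradient of the structured hinge: features of the violating
    # labeling minus features of the ground truth.
    for k in range(K):
        w_unary[k] -= eta * (X[y_hat == k].sum(0) - X[y == k].sum(0))
    w_clique -= eta * (clique_purity(y_hat) - clique_purity(y))

print("train accuracy:", (icm() == y).mean())
```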


Computer Vision and Pattern Recognition | 2011

Learning message-passing inference machines for structured prediction

Stéphane Ross; Daniel Munoz; Martial Hebert; J. Andrew Bagnell

Nearly every structured prediction problem in computer vision requires approximate inference due to large and complex dependencies among output labels. While graphical models provide a clean separation between modeling and inference, learning these models with approximate inference is not well understood. Furthermore, even if a good model is learned, predictions are often inaccurate due to approximations. In this work, instead of performing inference over a graphical model, we consider the inference procedure as a composition of predictors. Specifically, we focus on message-passing algorithms, such as Belief Propagation, and show how they can be viewed as procedures that sequentially predict label distributions at each node over a graph. Given labeled graphs, we can then train the sequence of predictors to output the correct labelings. The result no longer corresponds to a graphical model but simply defines an inference procedure, with strong theoretical properties, that can be used to classify new graphs. We demonstrate the scalability and efficacy of our approach on 3D point cloud classification and 3D surface estimation from single images.
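
The "sequence of predictors" view can be sketched directly: unroll a few message-passing rounds, where each round is a supervised classifier consuming local features plus aggregated neighbour beliefs. The random graph and feature names below are illustrative assumptions, and the held-out-prediction training the authors use to avoid overfitting is skipped for brevity.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(5)
n, n_feat, K = 300, 8, 3
X = rng.normal(size=(n, n_feat))
y = rng.integers(0, K, size=n)
# A random sparse neighbourhood structure standing in for the graph.
nbrs = [rng.choice(n, size=4, replace=False) for _ in range(n)]

def aggregate(belief):
    # Mean of neighbours' beliefs plays the role of incoming messages.
    return np.stack([belief[nb].mean(0) for nb in nbrs])

belief = np.full((n, K), 1.0 / K)
predictors = []
for step in range(3):                     # three unrolled "BP iterations"
    inp = np.hstack([X, aggregate(belief)])
    clf = LogisticRegression(max_iter=1000).fit(inp, y)
    belief = clf.predict_proba(inp)       # beliefs passed to the next step
    predictors.append(clf)
# At test time, the same fixed sequence runs on a new graph's features.
```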


European Conference on Computer Vision | 2012

Co-inference for multi-modal scene analysis

Daniel Munoz; James Andrew Bagnell; Martial Hebert

We address the problem of understanding scenes from multiple sources of sensor data (e.g., a camera and a laser scanner) in the case where there is no one-to-one correspondence across modalities (e.g., pixels and 3-D points). This is an important scenario that frequently arises in practice not only when two different types of sensors are used, but also when the sensors are not co-located and have different sampling rates. Previous work has addressed this problem by restricting interpretation to a single representation in one of the domains, with augmented features that attempt to encode the information from the other modalities. Instead, we propose to analyze all modalities simultaneously while propagating information across domains during the inference procedure. In addition to the immediate benefit of generating a complete interpretation in all of the modalities, we demonstrate that this co-inference approach also improves performance over the canonical approach.
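
The co-inference loop can be pictured as two per-domain classifiers repeatedly exchanging beliefs through soft correspondence matrices, with no one-to-one matching required. The link matrices, features, and names below are invented stand-ins, assuming scikit-learn.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(6)
K = 4
n_pix, n_pts = 400, 150                        # pixels vs. 3-D points
Xi = rng.normal(size=(n_pix, 10)); yi = rng.integers(0, K, n_pix)
Xp = rng.normal(size=(n_pts, 6));  yp = rng.integers(0, K, n_pts)

# Soft cross-domain links (rows sum to 1), e.g. from approximately
# projecting points into the image and vice versa.
Li = rng.random((n_pix, n_pts)); Li /= Li.sum(1, keepdims=True)
Lp = rng.random((n_pts, n_pix)); Lp /= Lp.sum(1, keepdims=True)

bi = np.full((n_pix, K), 1.0 / K)              # image-domain beliefs
bp = np.full((n_pts, K), 1.0 / K)              # point-domain beliefs
for _ in range(3):                             # alternate across domains
    ci = LogisticRegression(max_iter=1000).fit(np.hstack([Xi, Li @ bp]), yi)
    bi = ci.predict_proba(np.hstack([Xi, Li @ bp]))
    cp = LogisticRegression(max_iter=1000).fit(np.hstack([Xp, Lp @ bi]), yp)
    bp = cp.predict_proba(np.hstack([Xp, Lp @ bi]))
# Both domains end with a complete labeling, informed by the other.
```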


International Conference on Robotics and Automation | 2013

Efficient 3-D scene analysis from streaming data

Hanzhang Hu; Daniel Munoz; J. Andrew Bagnell; Martial Hebert

Rich scene understanding from 3-D point clouds is a challenging task that requires contextual reasoning, which is typically computationally expensive. The task is further complicated when we expect the scene analysis algorithm to also efficiently handle data that is continuously streamed from a sensor on a mobile robot. Hence, we are typically forced to make a choice between 1) using a precise representation of the scene at the cost of speed, or 2) making fast, though inaccurate, approximations at the cost of increased misclassifications. In this work, we demonstrate that we can achieve the best of both worlds by using an efficient and simple representation of the scene in conjunction with recent developments in structured prediction in order to obtain both efficient and state-of-the-art classifications. Furthermore, this efficient scene representation naturally handles streaming data and provides a 300% to 500% speedup over more precise representations.
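
One way to picture an "efficient and simple representation" for streamed points is a hash-mapped voxel grid with running per-cell statistics, so classification touches a few thousand voxels instead of millions of raw points. The voxel size, feature choice, and names below are assumptions for illustration, not the paper's actual representation.

```python
import numpy as np
from collections import defaultdict

VOXEL = 0.25                                   # metres per voxel side

class StreamingGrid:
    """Hash-mapped voxel grid with running per-cell statistics."""
    def __init__(self):
        # voxel key -> [point count, sum of coords, sum of squared coords]
        self.cells = defaultdict(lambda: [0, np.zeros(3), np.zeros(3)])

    def add_scan(self, pts):                   # pts: (n, 3), one sweep
        for p in pts:
            c = self.cells[tuple((p // VOXEL).astype(int))]
            c[0] += 1; c[1] += p; c[2] += p * p

    def features(self):
        # One compact descriptor per occupied voxel (mean and variance),
        # ready to feed whatever classifier sits downstream.
        keys, feats = list(self.cells), []
        for k in keys:
            cnt, s, sq = self.cells[k]
            mean = s / cnt
            feats.append(np.hstack([mean, sq / cnt - mean ** 2]))
        return keys, np.array(feats)

rng = np.random.default_rng(8)
grid = StreamingGrid()
for _ in range(5):                             # pretend five sensor sweeps
    grid.add_scan(rng.random((1000, 3)) * 20.0)
keys, feats = grid.features()
print(len(keys), "voxels summarise 5000 raw points")
```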


International Conference on Robotics and Automation | 2013

Efficient temporal consistency for streaming video scene analysis

Ondrej Miksik; Daniel Munoz; J. Andrew Bagnell; Martial Hebert

We address the problem of image-based scene analysis from streaming video, as would be seen from a moving platform, in order to efficiently generate spatially and temporally consistent predictions of semantic categories over time. In contrast to previous techniques which typically address this problem in batch and/or through graphical models, we demonstrate that by learning visual similarities between pixels across frames, a simple filtering algorithm is able to achieve high performance predictions in an efficient and online/causal manner. Our technique is a meta-algorithm that can be efficiently wrapped around any scene analysis technique that produces a per-pixel semantic category distribution. We validate our approach over three different scene analysis techniques on three different datasets that contain different semantic object categories. Our experiments demonstrate that our approach is very efficient in practice and substantially improves the consistency of the predictions over time.
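
The filtering meta-algorithm can be sketched as a causal per-pixel blend between the previous frame's filtered distribution and the current frame's raw prediction, weighted by appearance similarity. The hand-set Gaussian weight below stands in for the learned similarity, motion correspondence is omitted, and the frame format is a toy assumption.

```python
import numpy as np

K, n_pix = 4, 1000
rng = np.random.default_rng(7)

def filter_stream(frames, raw_probs, sigma=0.1):
    filtered, prev_f, prev_probs = [], None, None
    for img, probs in zip(frames, raw_probs):
        if prev_probs is None:
            out = probs
        else:
            # Per-pixel mixing weight: similar appearance across frames
            # means the previous belief is trustworthy.
            w = np.exp(-((img - prev_f) ** 2) / (2 * sigma ** 2))[:, None]
            out = w * prev_probs + (1 - w) * probs
        filtered.append(out)
        prev_f, prev_probs = img, out
    return filtered

frames = [rng.random(n_pix) for _ in range(10)]           # toy intensities
raw = [rng.dirichlet(np.ones(K), size=n_pix) for _ in range(10)]
smoothed = filter_stream(frames, raw)
```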


Proceedings of SPIE | 2013

An architecture for online semantic labeling on UGVs

Arne Suppé; Luis E. Navarro-Serment; Daniel Munoz; Drew Bagnell; Martial Hebert

We describe an architecture to provide online semantic labeling capabilities to field robots operating in urban environments. At the core of our system is the stacked hierarchical classifier developed by Munoz et al., which classifies regions in monocular color images using models derived from hand-labeled training data. The classifier is trained to identify buildings, several kinds of hard surfaces, grass, trees, and sky. When taking this algorithm into the real world, practical concerns with difficult and varying lighting conditions require careful control of the imaging process. First, camera exposure is controlled by software, examining all of the image's pixels, to compensate for the poorly performing, simplistic algorithm used on the camera. Second, by merging multiple images taken with different exposure times, we are able to synthesize images with higher dynamic range than the ones produced by the sensor itself. The sensor's limited dynamic range makes it difficult to properly expose areas in shadow at the same time as high-albedo surfaces directly illuminated by the sun. Texture is a key feature used by the classifier, and under- or over-exposed regions lacking texture are a leading cause of misclassifications. The results of the classifier are shared with higher-level elements operating in the UGV in order to perform tasks such as building identification from a distance and finding traversable surfaces.
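
The exposure-merging step lends itself to a short sketch, under the simplifying assumption of a linear camera response: each pixel's relative radiance is a weighted average of value-over-exposure-time across shots, with a hat-shaped weight that discounts near-saturated and near-black measurements. Array names and the simulated shots are illustrative.

```python
import numpy as np

def merge_exposures(images, times):
    """images: list of float arrays in [0, 1]; times: exposure seconds."""
    num = np.zeros_like(images[0])
    den = np.zeros_like(images[0])
    for img, t in zip(images, times):
        w = 1.0 - np.abs(2.0 * img - 1.0)     # peaks at mid-grey
        num += w * img / t
        den += w
    return num / np.maximum(den, 1e-6)        # relative radiance map

rng = np.random.default_rng(9)
scene = rng.random((64, 64)) * 4.0            # toy "true" radiance
times = [0.01, 0.04, 0.16]
shots = [np.clip(scene * t / 0.16, 0, 1) for t in times]  # simulated shots
hdr = merge_exposures(shots, times)
```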

Collaboration


Dive into Daniel Munoz's collaborations.

Top Co-Authors

Martial Hebert, Carnegie Mellon University
J. Andrew Bagnell, Carnegie Mellon University
Nicolas Vandapel, Carnegie Mellon University
Alexander Grubb, Carnegie Mellon University
Arne Suppé, Carnegie Mellon University
Drew Bagnell, Carnegie Mellon University
Hanzhang Hu, Carnegie Mellon University
Stéphane Ross, Carnegie Mellon University