
Publications


Featured research published by Michal Havlena.


European Conference on Computer Vision | 2010

Efficient structure from motion by graph optimization

Michal Havlena; Akihiko Torii; Tomáš Pajdla

We present an efficient structure from motion algorithm that can deal with large image collections in a fraction of the time and effort of previous approaches while providing comparable quality of the scene and camera reconstruction. First, we employ fast image indexing using large image vocabularies to measure the visual overlap of images without running actual image matching. Then, we select a small subset of the input images by computing an approximate minimal connected dominating set of the image-overlap graph with a fast polynomial algorithm. Finally, we use task prioritization to avoid spending too much time on a few difficult matching problems instead of exploring other, easier options. Thus we avoid wasting time on image pairs with a low chance of success and avoid matching highly redundant images of landmarks. We present results for several challenging sets of thousands of perspective as well as omnidirectional images.
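
To make the selection step concrete, here is a minimal sketch of one way to approximate a connected dominating set with a greedy heuristic, assuming a networkx graph whose nodes are images and whose edges connect visually overlapping pairs; the paper's fast polynomial algorithm and its overlap scoring are not reproduced here, and the toy graph is hypothetical.

```python
import networkx as nx

def greedy_connected_dominating_set(G):
    """Greedy approximation of a connected dominating set: grow a
    connected subset, always adding the frontier node that dominates
    the most not-yet-covered nodes. Assumes G is connected."""
    start = max(G.nodes, key=G.degree)          # seed with highest degree
    cds = {start}
    covered = {start} | set(G.neighbors(start))
    while covered != set(G.nodes):
        # Candidates that keep the set connected: neighbors of the CDS.
        frontier = {n for c in cds for n in G.neighbors(c)} - cds
        best = max(frontier, key=lambda n: len(set(G.neighbors(n)) - covered))
        cds.add(best)
        covered |= {best} | set(G.neighbors(best))
    return cds

# Hypothetical usage: nodes are images, edges link overlapping pairs.
G = nx.Graph([(0, 1), (1, 2), (2, 3), (1, 3), (3, 4)])
print(greedy_connected_dominating_set(G))  # e.g. {1, 3}
```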


Computer Vision and Pattern Recognition | 2009

Randomized structure from motion based on atomic 3D models from camera triplets

Michal Havlena; Akihiko Torii; Jan Knopp; Tomáš Pajdla

This paper presents a new efficient technique for large-scale structure from motion from unordered data sets. We avoid the costly computation of all pairwise matches and geometries by sampling pairs of images according to pairwise similarity scores based on the detected occurrences of visual words, leading to a significant speedup. Furthermore, atomic 3D models reconstructed from camera triplets are used as the seeds which form the final large-scale 3D model when merged together. Using three views instead of two allows us to reveal most of the outliers of pairwise geometries at an early stage of the process, preventing them from degrading the quality of the resulting 3D structure at later stages. The accuracy of the proposed technique is shown on a set of 64 images where the result of the exhaustive technique is known. Scalability is demonstrated on a landmark reconstruction from hundreds of images.
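
A minimal sketch of the pair-sampling idea, assuming bag-of-visual-words histograms as input: pairs are drawn with probability proportional to a cosine similarity that stands in for the paper's visual-word co-occurrence score. The bow matrix and sample count are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_pairs(bow, n_samples):
    """Sample image pairs with probability proportional to their
    bag-of-visual-words similarity (cosine of word-count vectors here;
    the paper scores pairs by visual-word co-occurrence)."""
    n = bow.shape[0]
    norm = bow / np.linalg.norm(bow, axis=1, keepdims=True)
    sim = norm @ norm.T
    iu = np.triu_indices(n, k=1)               # all unordered pairs
    probs = sim[iu] / sim[iu].sum()
    picks = rng.choice(len(probs), size=n_samples, replace=False, p=probs)
    return [(int(iu[0][k]), int(iu[1][k])) for k in picks]

# Hypothetical usage: 5 images, 8 visual words, random word counts.
bow = rng.random((5, 8))
print(sample_pairs(bow, 3))
```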


International Conference on Computer Vision | 2009

From Google Street View to 3D city models

Akihiko Torii; Michal Havlena; Tomáš Pajdla

We present a structure-from-motion (SfM) pipeline for visual 3D modeling of a large city area using 360° field-of-view Google Street View images. The core of the pipeline combines state-of-the-art techniques such as SURF feature detection, tentative matching by approximate nearest neighbour search, relative camera motion estimation by solving the 5-point minimal camera pose problem, and sparse bundle adjustment. The robust and stable camera poses estimated by PROSAC with soft voting and by scale selection using a visual cone test provide a high-quality initial structure for bundle adjustment. Furthermore, searching for trajectory loops based on co-occurring visual words and closing them by adding new constraints to the bundle adjustment enforces the global consistency of camera poses and 3D structure in the sequence. We present a large-scale reconstruction computed from 4,799 images of the Google Street View Pittsburgh Research Data Set.
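
The relative-motion step can be sketched with OpenCV's 5-point solver. Note the substitutions relative to the paper: SIFT instead of SURF, plain RANSAC instead of PROSAC with soft voting, and perspective instead of omnidirectional images; K is an assumed 3×3 calibration matrix.

```python
import cv2
import numpy as np

def relative_pose(img1, img2, K):
    """Estimate relative camera motion between two views via the
    5-point essential matrix (OpenCV RANSAC here; the paper uses
    PROSAC with soft voting on omnidirectional Street View images)."""
    sift = cv2.SIFT_create()                       # paper uses SURF
    kp1, des1 = sift.detectAndCompute(img1, None)
    kp2, des2 = sift.detectAndCompute(img2, None)
    # Tentative matches filtered by Lowe's ratio test.
    matcher = cv2.BFMatcher()
    good = [m for m, n in matcher.knnMatch(des1, des2, k=2)
            if m.distance < 0.8 * n.distance]
    pts1 = np.float32([kp1[m.queryIdx].pt for m in good])
    pts2 = np.float32([kp2[m.trainIdx].pt for m in good])
    E, mask = cv2.findEssentialMat(pts1, pts2, K, method=cv2.RANSAC,
                                   prob=0.999, threshold=1.0)
    _, R, t, _ = cv2.recoverPose(E, pts1, pts2, K, mask=mask)
    return R, t    # rotation and unit-norm translation direction
```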


Computer Vision and Pattern Recognition | 2014

Predicting Matchability

Wilfried Hartmann; Michal Havlena; Konrad Schindler

The initial steps of many computer vision algorithms are interest point extraction and matching. In larger image sets, the pairwise matching of interest point descriptors between images is an important bottleneck: for each descriptor in one image, the (approximate) nearest neighbor in the other one has to be found and checked against the second-nearest neighbor to ensure the correspondence is unambiguous. Here, we ask how best to decimate the list of interest points without losing matches, i.e. we aim to speed up matching by filtering out, in advance, those points which would not survive the matching stage. It turns out that the best filtering criterion is not the response of the interest point detector, which in fact is not surprising: the goal of detection is repeatable and well-localized points, whereas the objective of the selection is points whose descriptors can be matched successfully. We show that one can in fact learn to predict which descriptors are matchable, and thus reduce the number of interest points significantly without losing too many matches. We show that this strategy, as simple as it is, greatly improves the matching success with the same number of points per image. Moreover, we embed the prediction in a state-of-the-art structure-from-motion pipeline and demonstrate that it also outperforms other selection methods at the system level.
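
A hedged sketch of the learning step: train a binary classifier on descriptors labeled by whether they survived matching, then filter interest points before pairwise matching. The random forest, synthetic descriptors, and labels below are illustrative stand-ins, not the paper's exact training setup.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Hypothetical training data: rows are 128-D descriptors, labels say
# whether each descriptor survived ratio-test matching in a training set.
rng = np.random.default_rng(0)
descriptors = rng.random((1000, 128)).astype(np.float32)
survived = rng.integers(0, 2, 1000)

# Learn to predict matchability from the descriptor alone.
clf = RandomForestClassifier(n_estimators=50, random_state=0)
clf.fit(descriptors, survived)

# At matching time, keep only points predicted to be matchable,
# shrinking the pairwise-matching workload up front.
keep = clf.predict(descriptors) == 1
print(f"kept {keep.sum()} of {len(keep)} interest points")
```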


Computer Vision and Pattern Recognition | 2011

Structure-from-motion based hand-eye calibration using L∞ minimization

Jan Heller; Michal Havlena; Akihiro Sugimoto; Tomáš Pajdla

This paper presents a novel method for so-called hand-eye calibration. For many applications of hand-eye calibration, using a calibration target is not possible; in such situations, a Structure-from-Motion approach to hand-eye calibration is commonly used to recover the camera poses up to scale. The presented method takes advantage of recent results in L∞-norm optimization using Second-Order Cone Programming (SOCP) to recover the correct scale. Further, the correctly scaled displacement of the hand-eye transformation is recovered solely from the image correspondences and robot measurements, and is guaranteed to be globally optimal with respect to the L∞-norm. The method is experimentally validated using both synthetic and real-world datasets.
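
To illustrate the optimization machinery (not the paper's exact hand-eye formulation), here is a minimal L∞-norm problem cast as an SOCP with cvxpy: minimize the largest residual norm over a set of hypothetical linear residuals A_i x - b_i.

```python
import cvxpy as cp
import numpy as np

# Minimize max_i ||A_i x - b_i||_2, the classic L-infinity formulation
# solved as a second-order cone program. A_i, b_i are hypothetical
# stand-ins for the paper's scale/displacement residual terms.
rng = np.random.default_rng(0)
A = [rng.standard_normal((3, 4)) for _ in range(20)]
b = [rng.standard_normal(3) for _ in range(20)]

x = cp.Variable(4)
t = cp.Variable()
constraints = [cp.norm(Ai @ x - bi, 2) <= t for Ai, bi in zip(A, b)]
prob = cp.Problem(cp.Minimize(t), constraints)
prob.solve()
print(t.value, x.value)   # optimal worst-case residual and solution
```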


Computer Vision and Pattern Recognition | 2016

Large-Scale Location Recognition and the Geometric Burstiness Problem

Torsten Sattler; Michal Havlena; Konrad Schindler; Marc Pollefeys

Visual location recognition is the task of determining the place depicted in a query image from a given database of geo-tagged images. Location recognition is often cast as an image retrieval problem and recent research has almost exclusively focused on improving the chance that a relevant database image is ranked high enough after retrieval. The implicit assumption is that the number of inliers found by spatial verification can be used to distinguish between a related and an unrelated database photo with high precision. In this paper, we show that this assumption does not hold for large datasets due to the appearance of geometric bursts, i.e., sets of visual elements appearing in similar geometric configurations in unrelated database photos. We propose algorithms for detecting and handling geometric bursts. Although conceptually simple, using the proposed weighting schemes dramatically improves the recall that can be achieved when high precision is required compared to the standard re-ranking based on the inlier count. Our approach is easy to implement and can easily be integrated into existing location recognition systems.
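
A minimal sketch of the down-weighting idea, under the simplifying assumption that each query feature's "burstiness" is just the number of database images it is an inlier to; the paper's actual weighting schemes are more elaborate.

```python
def reweighted_scores(inlier_features):
    """Re-rank retrieved database images by down-weighting geometric
    bursts: a query feature that is a spatial-verification inlier to
    many database images carries little evidence for any single one.
    inlier_features[j] is the set of query feature ids that were
    inliers for database image j (hypothetical input format)."""
    # Count how many database images each query feature "votes" for.
    burst = {}
    for feats in inlier_features:
        for f in feats:
            burst[f] = burst.get(f, 0) + 1
    # Score = sum of 1/burstiness instead of the raw inlier count.
    return [sum(1.0 / burst[f] for f in feats) for feats in inlier_features]

# Hypothetical usage: three database images; feature 7 fires everywhere.
print(reweighted_scores([{1, 2, 7}, {7}, {3, 7}]))
# ~[2.33, 0.33, 1.33] vs. raw inlier counts [3, 1, 2]
```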


Computer Vision and Pattern Recognition | 2012

A branch-and-bound algorithm for globally optimal hand-eye calibration

Jan Heller; Michal Havlena; Tomáš Pajdla

This paper introduces a novel solution to the hand-eye calibration problem. It is the first method that uses camera measurements directly and at the same time requires neither prior knowledge of the external camera calibrations nor a known calibration device. Our algorithm uses a branch-and-bound approach to minimize an objective function based on the epipolar constraint, and it employs Linear Programming to decide the bounding step of the algorithm. The presented technique is able to recover both the unknown rotation and translation simultaneously and the solution is guaranteed to be globally optimal with respect to the L∞-norm.
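
A generic best-first branch-and-bound skeleton of the kind the paper builds on; the lower_bound, upper_bound, and split callables are hypothetical placeholders for the paper's LP-based bounds over the rotation/translation search space.

```python
import heapq

def branch_and_bound(root_box, lower_bound, upper_bound, split, eps=1e-3):
    """Best-first branch and bound over parameter-space boxes.
    lower_bound/upper_bound/split are problem-specific callables
    (in the paper, bounds come from Linear Programs built on the
    epipolar constraint; here they are abstract placeholders)."""
    best_val = upper_bound(root_box)
    queue = [(lower_bound(root_box), 0, root_box)]  # (lb, tie-break, box)
    counter = 1
    while queue:
        lb, _, box = heapq.heappop(queue)
        if best_val - lb <= eps:          # global optimum bracketed
            return best_val
        for child in split(box):          # subdivide the box
            child_lb = lower_bound(child)
            best_val = min(best_val, upper_bound(child))
            if child_lb < best_val:       # prune boxes that cannot improve
                heapq.heappush(queue, (child_lb, counter, child))
                counter += 1
    return best_val
```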


Pacific-Rim Symposium on Image and Video Technology | 2009

Omnidirectional Image Stabilization by Computing Camera Trajectory

Akihiko Torii; Michal Havlena; Tomáš Pajdla

In this paper we present a pipeline for camera pose and trajectory estimation, and for image stabilization and rectification, for dense as well as wide-baseline omnidirectional images. The input is a set of images taken by a single hand-held camera. The output is a set of stabilized and rectified images augmented by the computed camera 3D trajectory and a reconstruction of feature points facilitating visual object recognition. The paper generalizes previous work on camera trajectory estimation done on perspective images to omnidirectional images and introduces a new technique for omnidirectional image rectification that is suited for recognizing people and cars in images. The performance of the pipeline is demonstrated on a real image sequence acquired in urban as well as natural environments.
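
As a simplified stand-in for the rectification step, the sketch below cuts a perspective view out of an equirectangular panorama by mapping pinhole-camera rays to longitude/latitude; the field-of-view and yaw parameters are hypothetical, and the paper's actual rectification technique differs.

```python
import cv2
import numpy as np

def perspective_cutout(pano, fov_deg=90, yaw_deg=0, size=512):
    """Render a rectified perspective view from an equirectangular
    panorama: each pixel of a virtual pinhole camera is traced to a
    ray, and the ray's longitude/latitude indexes the panorama."""
    h, w = pano.shape[:2]
    f = (size / 2) / np.tan(np.radians(fov_deg) / 2)  # pinhole focal length
    u, v = np.meshgrid(np.arange(size) - size / 2,
                       np.arange(size) - size / 2)
    # Ray directions (u, v, f) in the camera frame, rotated by the yaw.
    lon = np.arctan2(u, f) + np.radians(yaw_deg)
    lat = np.arctan2(v, np.sqrt(u**2 + f**2))
    # Longitude/latitude -> panorama pixel coordinates.
    map_x = ((lon / (2 * np.pi) + 0.5) * w).astype(np.float32) % w
    map_y = ((lat / np.pi + 0.5) * h).astype(np.float32)
    return cv2.remap(pano, map_x, map_y, cv2.INTER_LINEAR)
```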


Computer Vision and Pattern Recognition | 2009

AWEAR 2.0 system: Omni-directional audio-visual data acquisition and processing

Michal Havlena; Andreas Ess; Wim Moreau; Akihiko Torii; Michal Jancosek; Tomáš Pajdla; Luc Van Gool

We present a wearable audio-visual capturing system, termed AWEAR 2.0, along with its underlying vision components that allow robust self-localization, multi-body pedestrian tracking, and dense scene reconstruction. Designed as a backpack, the system is aimed at supporting the cognitive abilities of the wearer. In this paper, we focus on the design issues for the hardware platform and on the performance of the current state-of-the-art computer vision methods on the acquired sequences. We describe the calibration procedure of the two omni-directional cameras present in the system as well as a structure-from-motion pipeline that allows for stable multi-body tracking even from rather shaky video sequences thanks to ground plane stabilization. Furthermore, we show how a dense scene reconstruction can be obtained from the data acquired with the platform.


International Conference on Intelligent Robotics and Applications | 2012

Nao robot localization and navigation using fusion of odometry and visual sensor data

Šimon Fojtů; Michal Havlena; Tomáš Pajdla

The Nao humanoid robot from Aldebaran Robotics is equipped with an odometry sensor providing rather inaccurate robot pose estimates. We propose using Structure from Motion (SfM) to enable visual odometry from the Nao camera, without the need to add artificial markers to the scene, and show that the robot pose estimates can be significantly improved by fusing the data from the odometry sensor with the visual odometry. The implementation consists of sensor modules streaming robot data, a mapping module creating a 3D model, a visual localization module estimating the camera pose w.r.t. the model, and a navigation module planning robot trajectories and performing the actual movement. All of the modules are connected through the RSB middleware, which makes the solution independent of the given robot type.
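
A minimal sketch of the fusion idea, assuming a linear Kalman filter over the planar robot position: odometry drives the prediction, the SfM-based visual pose drives the correction. The state layout, noise covariances, and the fuse_step helper are hypothetical, not the paper's actual fusion scheme.

```python
import numpy as np

def fuse_step(x, P, odo_delta, Q, vis_pose, R):
    """One predict/update cycle of a linear Kalman filter fusing
    odometry with visual localization.
    x, P      : state estimate (planar position) and its covariance
    odo_delta : relative motion reported by the odometry sensor
    Q, R      : odometry and visual measurement noise covariances
    vis_pose  : absolute pose from SfM-based visual localization."""
    # Predict: dead-reckon with odometry, inflate the uncertainty.
    x = x + odo_delta
    P = P + Q
    # Update: correct with the noisy but drift-free visual pose.
    K = P @ np.linalg.inv(P + R)            # Kalman gain
    x = x + K @ (vis_pose - x)
    P = (np.eye(len(x)) - K) @ P
    return x, P

# Hypothetical usage: 2-D position, noisy odometry, one visual fix.
x, P = np.zeros(2), np.eye(2) * 0.01
x, P = fuse_step(x, P, odo_delta=np.array([0.10, 0.0]),
                 Q=np.eye(2) * 0.02, vis_pose=np.array([0.12, 0.01]),
                 R=np.eye(2) * 0.005)
print(x)   # pulled toward the visual estimate
```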

Collaboration


Dive into Michal Havlena's collaboration.

Top Co-Authors

Tomáš Pajdla
Czech Technical University in Prague

Akihiko Torii
Tokyo Institute of Technology

Jan Heller
Czech Technical University in Prague