
Publication


Featured research published by Stepán Obdrzálek.


British Machine Vision Conference | 2002

Object Recognition using Local Affine Frames on Distinguished Regions

Stepán Obdrzálek; Jiri Matas

A novel approach to appearance-based object recognition is introduced. The proposed method, based on matching of local image features, reliably recognises objects under very different viewing conditions. First, distinguished regions of data-dependent shape are robustly detected. On these regions, local affine frames are established using several affine invariant constructions. Direct comparison of photometrically normalised colour intensities in local, geometrically aligned frames results in a matching scheme that is invariant to piecewise-affine image deformations, but still remains very discriminative. The potential of the approach is experimentally verified on COIL-100 and SOIL-47, two publicly available image databases. On SOIL-47, a 100% recognition rate is achieved with a single training view per object. On COIL-100, a 99.9% recognition rate is obtained with 18 training views per object. Robustness to severe occlusions is demonstrated by only a moderate decrease of recognition performance in an experiment where half of each test image is erased.
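
For illustration only, the sketch below (Python/NumPy) shows the kind of direct comparison of photometrically normalised, geometrically aligned patches that the abstract describes; the patch size, the per-channel normalisation, and the correlation score are assumptions of this sketch, not the authors' implementation.

```python
import numpy as np

def photometric_normalise(patch):
    """Zero-mean, unit-variance normalisation of a colour patch,
    applied per channel to discount affine changes of intensity."""
    p = patch.astype(np.float64)
    mean = p.mean(axis=(0, 1), keepdims=True)
    std = p.std(axis=(0, 1), keepdims=True) + 1e-8
    return (p - mean) / std

def patch_similarity(patch_a, patch_b):
    """Direct comparison of two geometrically aligned patches:
    normalised cross-correlation of the normalised intensities."""
    a = photometric_normalise(patch_a).ravel()
    b = photometric_normalise(patch_b).ravel()
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical usage: two 21x21 RGB patches sampled in corresponding
# local affine frames; a high score indicates a tentative match.
rng = np.random.default_rng(0)
model_patch = rng.random((21, 21, 3))
query_patch = model_patch + 0.05 * rng.standard_normal((21, 21, 3))
print(patch_similarity(model_patch, query_patch))
```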


British Machine Vision Conference | 2005

Sub-linear Indexing for Large Scale Object Recognition

Stepán Obdrzálek; Jiri Matas

Realistic approaches to large scale object recognition, i.e. for detection and localisation of hundreds or more objects, must support sub-linear time indexing. In the paper, we propose a method capable of recognising one of N objects in log(N) time. The "visual memory" is organised as a binary decision tree that is built to minimise average time to decision. Leaves of the tree represent a few local image areas, and each non-terminal node is associated with a "weak classifier". In the recognition phase, a single invariant measurement decides in which subtree a corresponding image area is sought. The method preserves all the strengths of local affine region methods – robustness to background clutter, occlusion, and large changes of viewpoints. Experimentally we show that it supports near real-time recognition of hundreds of objects with state-of-the-art recognition rates. After the test image is processed (in about a second on a current PC), the recognition via indexing into the visual memory requires milliseconds.
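
The following sketch illustrates a log(N) descent through a decision tree of the kind described above; the node layout, the thresholded "weak classifier", and the toy measurement are illustrative assumptions, not the paper's actual visual memory.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional
import numpy as np

@dataclass
class Node:
    """A node of a toy visual-memory tree.  Non-terminal nodes hold a
    'weak classifier' (here: a thresholded invariant measurement);
    leaves hold a short list of candidate local image areas."""
    measurement: Optional[Callable[[np.ndarray], float]] = None
    threshold: float = 0.0
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    candidates: Optional[List[int]] = None  # object/area ids at a leaf

def descend(root: Node, region: np.ndarray) -> List[int]:
    """Route a query region down the tree; one invariant measurement
    per level, so the search touches O(log N) nodes."""
    node = root
    while node.candidates is None:
        value = node.measurement(region)
        node = node.left if value < node.threshold else node.right
    return node.candidates

# Hypothetical toy tree: split on mean intensity, two leaves.
toy = Node(measurement=lambda r: float(r.mean()), threshold=0.5,
           left=Node(candidates=[0, 3]), right=Node(candidates=[1, 2]))
print(descend(toy, np.full((8, 8), 0.7)))   # -> [1, 2]
```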


Joint Pattern Recognition Symposium | 2003

Image Retrieval Using Local Compact DCT-Based Representation

Stepán Obdrzálek; Jiri Matas

An image retrieval system based on local affine frames is introduced. The system provides highly discriminative retrieval of rigid objects under a very wide range of viewing and illumination conditions, and is robust to occlusion and background clutter. Distinguished regions of data-dependent shape are detected, and local affine frames (coordinate systems) are obtained. Photometrically and geometrically normalised image patches are extracted and used for matching.
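
A minimal sketch of a compact DCT-based patch descriptor in the spirit of the abstract, assuming a 32x32 grayscale patch and a 6x6 block of retained low-frequency coefficients; the exact normalisation and coefficient selection of the published system may differ.

```python
import numpy as np
from scipy.fft import dctn

def dct_descriptor(patch, keep=6):
    """Compact representation of a normalised grayscale patch:
    2D DCT, then the top-left `keep` x `keep` low-frequency block
    (minus the DC term), flattened into a short vector."""
    p = patch.astype(np.float64)
    p = (p - p.mean()) / (p.std() + 1e-8)      # photometric normalisation
    coeffs = dctn(p, norm="ortho")             # 2D discrete cosine transform
    block = coeffs[:keep, :keep].copy()
    block[0, 0] = 0.0                          # drop DC (already mean-normalised)
    return block.ravel()

# Hypothetical usage: descriptors of two 32x32 patches compared by
# Euclidean distance for retrieval.
rng = np.random.default_rng(1)
a = rng.random((32, 32))
b = a + 0.02 * rng.standard_normal((32, 32))
print(np.linalg.norm(dct_descriptor(a) - dct_descriptor(b)))
```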


IEEE Intelligent Vehicles Symposium | 2010

A voting strategy for visual ego-motion from stereo

Stepán Obdrzálek; Jiri Matas

We present a procedure for ego-motion estimation from the visual input of a stereo pair of video cameras. The 3D ego-motion problem, which has six degrees of freedom in general, is simplified to four dimensions and further decomposed into two two-dimensional subproblems. The decomposition allows us to use a voting strategy to identify the most probable solution, avoiding random sampling (RANSAC) and other approximation techniques. The input consists of image correspondences between consecutive stereo pairs, i.e. feature points do not need to be tracked over time. The experiments show that even when a trajectory is assembled as a simple concatenation of frame-to-frame increments, it remains reliable and precise.
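
The voting idea can be sketched as follows: each correspondence casts one deterministic vote for a motion hypothesis in a discretised 2D parameter space (here, hypothetically, yaw and forward translation), and the most voted cell is taken as the solution instead of a RANSAC estimate. Parameter ranges, bin sizes, and the toy data are assumptions of this sketch.

```python
import numpy as np

def vote_2d(candidates, yaw_bins, tz_bins):
    """Accumulate per-correspondence motion hypotheses (yaw, forward
    translation) into a 2D histogram and return the most voted cell.
    Unlike RANSAC, every measurement contributes one deterministic vote."""
    hist, yaw_edges, tz_edges = np.histogram2d(
        candidates[:, 0], candidates[:, 1], bins=[yaw_bins, tz_bins])
    i, j = np.unravel_index(np.argmax(hist), hist.shape)
    yaw = 0.5 * (yaw_edges[i] + yaw_edges[i + 1])
    tz = 0.5 * (tz_edges[j] + tz_edges[j + 1])
    return yaw, tz

# Hypothetical usage: 500 correspondences, 80% consistent with the true
# motion (yaw = 0.02 rad, tz = 0.30 m), 20% outliers spread uniformly.
rng = np.random.default_rng(2)
inliers = np.column_stack([rng.normal(0.02, 0.002, 400),
                           rng.normal(0.30, 0.010, 400)])
outliers = np.column_stack([rng.uniform(-0.2, 0.2, 100),
                            rng.uniform(-1.0, 1.0, 100)])
votes = np.vstack([inliers, outliers])
print(vote_2d(votes, yaw_bins=np.linspace(-0.2, 0.2, 81),
              tz_bins=np.linspace(-1.0, 1.0, 81)))
```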


Workshop on Applications of Computer Vision | 2017

T-LESS: An RGB-D Dataset for 6D Pose Estimation of Texture-Less Objects

Tomas Hodan; Pavel Haluza; Stepán Obdrzálek; Jiri Matas; Manolis I. A. Lourakis; Xenophon Zabulis

We introduce T-LESS, a new public dataset for estimating the 6D pose, i.e. translation and rotation, of texture-less rigid objects. The dataset features thirty industry-relevant objects with no significant texture and no discriminative color or reflectance properties. The objects exhibit symmetries and mutual similarities in shape and/or size. Compared to other datasets, a unique property is that some of the objects are parts of others. The dataset includes training and test images that were captured with three synchronized sensors, specifically a structured-light and a time-of-flight RGB-D sensor and a high-resolution RGB camera. There are approximately 39K training and 10K test images from each sensor. Additionally, two types of 3D models are provided for each object, i.e. a manually created CAD model and a semi-automatically reconstructed one. Training images depict individual objects against a black background. Test images originate from twenty test scenes having varying complexity, which increases from simple scenes with several isolated objects to very challenging ones with multiple instances of several objects and with a high amount of clutter and occlusion. The images were captured from a systematically sampled view sphere around the object/scene, and are annotated with accurate ground truth 6D poses of all modeled objects. Initial evaluation results indicate that the state of the art in 6D object pose estimation has ample room for improvement, especially in difficult cases with significant occlusion. The T-LESS dataset is available online at cmp.felk.cvut.cz/t-less.


International Conference on Computer Vision | 2007

Stable Affine Frames on Isophotes

Michal Perdoch; Jiri Matas; Stepán Obdrzálek

We propose a new affine-covariant feature, the stable affine frame (SAF). SAFs lie on the boundary of extremal regions, i.e. on isophotes. Instead of requiring the whole isophote to be stable with respect to intensity perturbation as in maximally stable extremal regions (MSERs), stability is required only locally, for the primitives constituting the three-point frames. The primitives are extracted by an affine invariant process that exploits properties of bitangents and algebraic moments. Thus, instead of using closed stable isophotes, i.e. MSERs, and detecting affine frames on them, SAFs are sought even on some unstable extremal regions. We show experimentally on standard datasets that SAFs have repeatability comparable to the best affine covariant detectors tested in the state-of-the-art report (Mikolajczyk et al., 2005) and consistently produce a significantly higher number of features per image. Moreover, the features cover images more evenly than MSERs, which facilitates robustness to occlusion.
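
As a loose illustration of the moment-based, affine-covariant normalisation mentioned above (not the paper's bitangent construction), the sketch below whitens a region by its second-moment matrix so that affinely deformed copies of a region normalise to the same shape up to rotation.

```python
import numpy as np

def moment_normalisation(points):
    """Affine-covariant normalisation of a region given by its boundary
    or interior points: shift to the centroid and whiten by the inverse
    square root of the second-moment (covariance) matrix."""
    pts = np.asarray(points, dtype=np.float64)
    centroid = pts.mean(axis=0)
    centred = pts - centroid
    cov = centred.T @ centred / len(pts)        # algebraic second moments
    # inverse matrix square root via eigendecomposition
    w, v = np.linalg.eigh(cov)
    whiten = v @ np.diag(1.0 / np.sqrt(w)) @ v.T
    return centred @ whiten.T, centroid, whiten

# Hypothetical check: an affinely deformed ellipse normalises to a circle,
# so all normalised points have (nearly) the same distance to the origin.
t = np.linspace(0, 2 * np.pi, 200, endpoint=False)
circle = np.column_stack([np.cos(t), np.sin(t)])
A = np.array([[2.0, 0.7], [0.1, 0.5]])          # arbitrary affine map
normalised, _, _ = moment_normalisation(circle @ A.T)
print(np.std(np.linalg.norm(normalised, axis=1)))  # ~0 up to sampling
```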


Computer Vision and Image Understanding | 2009

Integrated vision system for the semantic interpretation of activities where a person handles objects

Markus Vincze; Michael Zillich; Wolfgang Ponweiser; Václav Hlaváč; Jiri Matas; Stepán Obdrzálek; Hilary Buxton; A. Jonathan Howell; Kingsley Sage; Antonis A. Argyros; Christof Eberst; Gerald Umgeher

Interpretation of human activity is known primarily from surveillance and video analysis tasks and is typically concerned with the persons alone. In this paper we present an integrated system that gives a natural language interpretation of activities where a person handles objects. The system integrates low-level image components such as hand and object tracking, detection and recognition, with high-level processes such as spatio-temporal object relationship generation, posture and gesture recognition, and activity reasoning. A task-oriented approach focuses processing to achieve near real-time performance and to react according to the situation context.


Intelligent Robots and Systems | 2015

Detection and fine 3D pose estimation of texture-less objects in RGB-D images

Tomas Hodan; Xenophon Zabulis; Manolis I. A. Lourakis; Stepán Obdrzálek; Jiri Matas

Despite their ubiquitous presence, texture-less objects present significant challenges to contemporary visual object detection and localization algorithms. This paper proposes a practical method for the detection and accurate 3D localization of multiple texture-less and rigid objects depicted in RGB-D images. The detection procedure adopts the sliding window paradigm, with an efficient cascade-style evaluation of each window location. A simple pre-filtering is performed first, rapidly rejecting most locations. For each remaining location, a set of candidate templates (i.e. trained object views) is identified with a voting procedure based on hashing, which makes the method's computational complexity largely unaffected by the total number of known objects. The candidate templates are then verified by matching feature points in different modalities. Finally, the approximate object pose associated with each detected template is used as a starting point for a stochastic optimization procedure that estimates accurate 3D pose. Experimental evaluation shows that the proposed method yields a recognition rate comparable to the state of the art, while its complexity is sub-linear in the number of templates.
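
A toy sketch of the hashing-based voting stage described above: trained templates are indexed by quantised measurements at a few probe positions, and a window location passes on only the templates that collect enough votes. The probe count, quantisation step, and vote threshold are arbitrary assumptions of this sketch.

```python
import numpy as np
from collections import defaultdict

def build_hash_table(templates, n_probes, rng):
    """Index trained templates (object views) by quantised measurements
    taken at a few probe positions; each key maps to the templates
    that produced it."""
    table = defaultdict(set)
    probes = rng.integers(0, templates.shape[1], size=n_probes)
    for tid, tmpl in enumerate(templates):
        for p in probes:
            key = (int(p), int(tmpl[p] // 16))      # coarse quantisation
            table[key].add(tid)
    return table, probes

def candidate_templates(window, table, probes, min_votes):
    """Voting step of the cascade: a window location collects votes from
    each matching (probe, quantised value) key; only templates with
    enough votes are passed on to full verification."""
    votes = defaultdict(int)
    for p in probes:
        key = (int(p), int(window[p] // 16))
        for tid in table.get(key, ()):
            votes[tid] += 1
    return [tid for tid, v in votes.items() if v >= min_votes]

# Hypothetical usage: 1000 templates of 256 measurements each; the query
# window equals template 42, so only that template survives the vote.
rng = np.random.default_rng(3)
templates = rng.integers(0, 256, size=(1000, 256))
table, probes = build_hash_table(templates, n_probes=12, rng=rng)
print(candidate_templates(templates[42], table, probes, min_votes=8))
```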


Computer Vision and Pattern Recognition | 2008

Dense linear-time correspondences for tracking

Stepán Obdrzálek; Michal Perdoch; Jiri Matas

A novel method is proposed for the problem of frame-to-frame correspondence search in video sequences. The method, based on hashing of low-dimensional image descriptors, establishes dense correspondences and allows large motions. All image pixels are considered for matching, and the notion of interest points is revisited. In our formulation, points of interest are those that can be reliably matched. Their saliency depends on properties of the chosen matching function and on actual image content. Both computational time and memory requirements of the correspondence search are asymptotically linear in the number of image pixels, irrespective of correspondence density and of image content. All steps of the method are simple and allow for a hardware implementation. Functionality is demonstrated on sequences taken from a vehicle moving in an urban environment.
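
The linear-time hashing idea can be sketched as below: every pixel is given a low-dimensional key, frame A's keys are stored in a hash table, and frame B's pixels are looked up in it; pixels whose key is unambiguous act as the "interest points". The toy five-value descriptor and quantisation step are assumptions of this sketch, not the descriptors used in the paper.

```python
import numpy as np
from collections import defaultdict

def pixel_keys(image, q=32):
    """A toy low-dimensional per-pixel descriptor: quantised intensities
    of the pixel and its four neighbours, one key per interior pixel."""
    im = (image // q).astype(np.int64)
    return np.stack([im[1:-1, 1:-1], im[:-2, 1:-1], im[2:, 1:-1],
                     im[1:-1, :-2], im[1:-1, 2:]], axis=-1)

def dense_correspondences(img_a, img_b, q=32):
    """Hash every pixel of frame A by its key, then look up every pixel
    of frame B; both passes are linear in the number of pixels,
    independent of image content and of how large the motion is."""
    keys_a, keys_b = pixel_keys(img_a, q), pixel_keys(img_b, q)
    table = defaultdict(list)
    for (y, x), _ in np.ndenumerate(keys_a[..., 0]):
        table[tuple(keys_a[y, x])].append((y + 1, x + 1))
    matches = []
    for (y, x), _ in np.ndenumerate(keys_b[..., 0]):
        bucket = table.get(tuple(keys_b[y, x]), [])
        if len(bucket) == 1:     # unambiguous key = reliably matchable pixel
            matches.append((bucket[0], (y + 1, x + 1)))
    return matches

# Hypothetical usage: frame B is frame A cyclically shifted by (3, 5) pixels.
rng = np.random.default_rng(4)
frame_a = rng.integers(0, 256, size=(64, 64))
frame_b = np.roll(frame_a, shift=(3, 5), axis=(0, 1))
print(len(dense_correspondences(frame_a, frame_b)))
```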


Asian Conference on Computer Vision | 2004

Enhancing RANSAC by generalized model optimization

Ondrej Chum; Jiri Matas; Stepán Obdrzálek

Collaboration


Dive into Stepán Obdrzálek's collaborations.

Top Co-Authors

Jiri Matas (Czech Technical University in Prague)
Katsuhiro Sakai (Czech Technical University in Prague)
Michal Perdoch (Czech Technical University in Prague)
Tomas Hodan (Czech Technical University in Prague)
Ondrej Chum (Czech Technical University in Prague)
Václav Hlaváč (Czech Technical University in Prague)
Markus Vincze (Vienna University of Technology)
Michael Zillich (Vienna University of Technology)
Wolfgang Ponweiser (Vienna University of Technology)