Michael D. Breitenstein
ETH Zurich
Publications
Featured research published by Michael D. Breitenstein.
IEEE Transactions on Pattern Analysis and Machine Intelligence | 2011
Michael D. Breitenstein; Fabian Reichlin; Bastian Leibe; Esther Koller-Meier; Luc Van Gool
In this paper, we address the problem of automatically detecting and tracking a variable number of persons in complex scenes using a monocular, potentially moving, uncalibrated camera. We propose a novel approach for multiperson tracking-by-detection in a particle filtering framework. In addition to final high-confidence detections, our algorithm uses the continuous confidence of pedestrian detectors and online-trained, instance-specific classifiers as a graded observation model. Thus, generic object category knowledge is complemented by instance-specific information. The main contribution of this paper is to explore how these unreliable information sources can be used for robust multiperson tracking. The algorithm detects and tracks a large number of dynamically moving people in complex scenes with occlusions, does not rely on background modeling, requires no camera or ground plane calibration, and only makes use of information from the past. Hence, it imposes very few restrictions and is suitable for online applications. Our experiments show that the method yields good tracking performance in a large variety of highly dynamic scenarios, such as typical surveillance videos, webcam footage, or sports sequences. We demonstrate that our algorithm outperforms other methods that rely on additional information. Furthermore, we analyze the influence of different algorithm components on the robustness.
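The graded observation model described above can be sketched as the weight update of a simple 1D particle filter. This is only an illustrative sketch: the Gaussian detector confidence, the instance-classifier callable, and the mixing weight `alpha` are assumptions of this example, not the paper's actual terms.

```python
import numpy as np

rng = np.random.default_rng(0)

def particle_filter_step(particles, weights, motion_std, det_conf, inst_score, alpha=0.5):
    """One tracking step: resample, propagate, and re-weight particles with a
    graded observation model that mixes a continuous detector confidence with
    an instance-specific classifier score (both supplied as callables)."""
    n = len(particles)
    # 1. Resample according to the current weights.
    particles = particles[rng.choice(n, size=n, p=weights)]
    # 2. Propagate with a simple random-walk motion model.
    particles = particles + rng.normal(0.0, motion_std, size=particles.shape)
    # 3. Re-weight: generic category knowledge plus instance-specific score.
    w = alpha * det_conf(particles) + (1 - alpha) * inst_score(particles)
    w = np.clip(w, 1e-12, None)  # keep weights valid even far from detections
    return particles, w / w.sum()
```

With a detector confidence peaked at the true target position, repeated steps concentrate the particle set around it, even when the individual confidence values are noisy or weak.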
international conference on computer vision | 2009
Michael D. Breitenstein; Fabian Reichlin; Bastian Leibe; Esther Koller-Meier; Luc Van Gool
We propose a novel approach for multi-person tracking-by-detection in a particle filtering framework. In addition to final high-confidence detections, our algorithm uses the continuous confidence of pedestrian detectors and online trained, instance-specific classifiers as a graded observation model. Thus, generic object category knowledge is complemented by instance-specific information. A main contribution of this paper is the exploration of how these unreliable information sources can be used for multi-person tracking. The resulting algorithm robustly tracks a large number of dynamically moving persons in complex scenes with occlusions, does not rely on background modeling, and operates entirely in 2D (requiring no camera or ground plane calibration). Our Markovian approach relies only on information from the past and is suitable for online applications. We evaluate the performance on a variety of datasets and show that it improves upon state-of-the-art methods.
computer vision and pattern recognition | 2010
Daniel Kuettel; Michael D. Breitenstein; Luc Van Gool; Vittorio Ferrari
We present two novel methods to automatically learn spatio-temporal dependencies of moving agents in complex dynamic scenes. They allow the discovery of temporal rules, such as the right of way between different lanes or typical traffic light sequences. To extract them, sequences of activities need to be learned. While the first method extracts rules based on a learned topic model, the second, called DDP-HMM, jointly learns co-occurring activities and their time dependencies. To this end, we employ Dependent Dirichlet Processes to learn an arbitrary number of infinite Hidden Markov Models. In contrast to previous work, we build on state-of-the-art topic models that automatically infer all parameters, such as the optimal number of HMMs necessary to explain the rules governing a scene. The models are trained offline by Gibbs sampling using unlabeled training data.
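As a much-simplified stand-in for the DDP-HMM, the topic-model ingredient can be illustrated with a collapsed Gibbs sampler for plain LDA over discretised "activity words". All names, hyperparameters, and the toy data are assumptions of this sketch; the paper's model additionally infers the number of components and their temporal dependencies.

```python
import numpy as np

def lda_gibbs(docs, n_topics, n_vocab, iters=50, alpha=0.1, beta=0.1, seed=0):
    """Minimal collapsed Gibbs sampler for LDA: docs are lists of integer
    'activity words'; returns topic-word counts defining the learned topics."""
    rng = np.random.default_rng(seed)
    z = [rng.integers(n_topics, size=len(d)) for d in docs]  # topic assignments
    ndk = np.zeros((len(docs), n_topics))                    # doc-topic counts
    nkw = np.zeros((n_topics, n_vocab))                      # topic-word counts
    nk = np.zeros(n_topics)
    for d, (doc, zd) in enumerate(zip(docs, z)):
        for w, k in zip(doc, zd):
            ndk[d, k] += 1; nkw[k, w] += 1; nk[k] += 1
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]
                # Remove the token, sample its topic from the conditional, re-add.
                ndk[d, k] -= 1; nkw[k, w] -= 1; nk[k] -= 1
                p = (ndk[d] + alpha) * (nkw[:, w] + beta) / (nk + n_vocab * beta)
                k = rng.choice(n_topics, p=p / p.sum())
                z[d][i] = k
                ndk[d, k] += 1; nkw[k, w] += 1; nk[k] += 1
    return nkw
```

On documents built from two disjoint word groups, the sampler separates the groups into distinct topics, which is the same unsupervised discovery principle the paper applies to scene activities.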
computer vision and pattern recognition | 2008
Michael D. Breitenstein; Daniel Kuettel; Thibaut Weise; Luc Van Gool; Hanspeter Pfister
We present a real-time algorithm to estimate the 3D pose of a previously unseen face from a single range image. Based on a novel shape signature to identify noses in range images, we generate candidates for their positions, and then generate and evaluate many pose hypotheses in parallel using modern graphics processing units (GPUs). We developed a novel error function that compares the input range image to precomputed pose images of an average face model. The algorithm is robust to large pose variations of ±90° yaw, ±45° pitch, and ±30° roll rotation, facial expression, and partial occlusion, and works for multiple faces in the field of view. It correctly estimates 97.8% of the poses within a yaw and pitch error of 15° at 55.8 fps. To evaluate the algorithm, we built a database of range images with large pose variations and developed a method for automatic ground truth annotation.
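The hypothesise-and-test core can be sketched as a brute-force comparison of the input range image against precomputed pose templates. The error function below (mean absolute depth difference over mutually valid pixels) and the synthetic plane templates are simplified assumptions of this sketch, not the paper's actual error function or face model.

```python
import numpy as np

def depth_error(input_range, template):
    """Image-based error between two range images: mean absolute depth
    difference over pixels that are valid (positive depth) in both."""
    valid = (input_range > 0) & (template > 0)
    if not valid.any():
        return np.inf
    return float(np.abs(input_range[valid] - template[valid]).mean())

def best_pose(input_range, templates):
    """templates: dict mapping (yaw, pitch, roll) -> precomputed range image.
    Each hypothesis is scored independently, so on a GPU all evaluations
    can run in parallel; here they run in a plain loop."""
    errs = {pose: depth_error(input_range, t) for pose, t in templates.items()}
    return min(errs, key=errs.get)
```

Given a noisy observation of one of the templates, the pose with the lowest error is the one that generated it.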
international conference on computer vision | 2009
Michael D. Breitenstein; Helmut Grabner; Luc Van Gool
We present a data-driven, unsupervised method for unusual scene detection from static webcams. Such time-lapse data is usually captured with very low or varying framerate. This precludes the use of tools typically used in surveillance (e.g., object tracking). Hence, our algorithm is based on simple image features. We define usual scenes based on the concept of meaningful nearest neighbours instead of building explicit models. To effectively compare the observations, our algorithm adapts the data representation. Furthermore, we use incremental learning techniques to adapt to changes in the data-stream. Experiments on several months of webcam data show that our approach detects plausible unusual scenes, which have not been observed in the data-stream before.
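The nearest-neighbour idea can be sketched in a few lines: a frame is flagged as unusual when its feature vector is far from every previously observed frame. The Euclidean distance and fixed threshold below are illustrative assumptions; the paper additionally adapts the representation and threshold online.

```python
import numpy as np

def unusual_scenes(features, thresh=2.0):
    """Flag frames whose distance to their nearest previously seen frame
    exceeds `thresh` -- i.e. frames with no 'meaningful nearest neighbour'
    in the data stream so far. features: (n_frames, n_dims) array."""
    flags = []
    for t, f in enumerate(features):
        if t == 0:
            flags.append(False)  # nothing to compare the first frame to
            continue
        past = features[:t]
        nn_dist = np.sqrt(((past - f) ** 2).sum(axis=1)).min()
        flags.append(bool(nn_dist > thresh))
    return flags
```

Note that once an unusual scene has been observed, later repetitions of it are no longer flagged, matching the "not observed in the data-stream before" criterion.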
machine vision applications | 2010
In Kyu Park; Marcel Germann; Michael D. Breitenstein; Hanspeter Pfister
We present a pose estimation method for rigid objects from single range images. Using 3D models of the objects, many pose hypotheses are compared in a data-parallel version of the downhill simplex algorithm with an image-based error function. The pose hypothesis with the lowest error value yields the pose estimation (location and orientation), which is refined using ICP. The algorithm is designed especially for implementation on the GPU. It is completely automatic, fast, robust to occlusion and cluttered scenes, and scales with the number of different object types. We apply the system to bin picking, and evaluate it on cluttered scenes. Comprehensive experiments on challenging synthetic and real-world data demonstrate the effectiveness of our method.
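The data-parallel hypothesise-and-test scheme can be sketched on the CPU. Pose hypotheses are reduced here to 2D image translations and the error is a plain sum of squared differences; both are assumptions of this sketch, whereas the paper optimises full poses with a downhill-simplex search and refines the result with ICP.

```python
import numpy as np

def batch_pose_error(scene, template, offsets):
    """Score many pose hypotheses with an image-based error (SSD). Each
    hypothesis is independent of the others, which is what makes the search
    data-parallel on a GPU; here they are evaluated in a simple loop."""
    h, w = template.shape
    errs = np.empty(len(offsets))
    for i, (dy, dx) in enumerate(offsets):
        patch = scene[dy:dy + h, dx:dx + w]
        errs[i] = ((patch - template) ** 2).mean()
    return errs

def estimate_pose(scene, template, offsets):
    """Return the hypothesis with the lowest error (before any refinement)."""
    return offsets[int(np.argmin(batch_pose_error(scene, template, offsets)))]
```

Exhaustively enumerating the hypothesis grid and keeping the minimum-error candidate recovers the location at which the template was placed in the scene.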
digital identity management | 2007
Marcel Germann; Michael D. Breitenstein; Hanspeter Pfister; In Kyu Park
Object pose (location and orientation) estimation is a common task in many computer vision applications. Although many methods exist, most algorithms need manual initialization and lack robustness to illumination variation, appearance change, and partial occlusions. We propose a fast method for automatic pose estimation without manual initialization based on shape matching of a 3D model to a range image of the scene. We developed a new error function to compare the input range image to pre-computed range maps of the 3D model. We use the tremendous data-parallel processing performance of modern graphics hardware to evaluate and minimize the error function on many range images in parallel. Our algorithm is simple and accurately estimates the pose of partially occluded objects in cluttered scenes in about one second.
conference on image and video retrieval | 2009
Luc Van Gool; Michael D. Breitenstein; Stephan Gammeter; Helmut Grabner; Till Quack
So far, most image mining has been based on interactive querying. Although such querying will remain important in the future, several applications need image mining at such wide scales that it has to run automatically. This adds an additional level to the problem, namely to apply appropriate further processing to different types of images, and to decide on such processing automatically as well. This paper addresses these issues by discussing the processing of landmark images and of images coming from webcams. The first part deals with the automated collection of images of landmarks, which are then also automatically annotated and enriched with Wikipedia information. The target application is that users photograph landmarks with their mobile phones or PDAs, and automatically get information about them. Similarly, users can get images in their photo albums annotated automatically. The object of interest can also be automatically delineated in the images. The pipeline we propose actually retrieves more images than manual keyword input would produce. The second part of the paper deals with an entirely different source of image data, but one that also produces massive amounts (although typically not archived): webcams. They produce images at a single location, but rather continuously and over extended periods of time. We propose an approach to summarize data coming from webcams. This data handling is quite different from that applied to the landmark images.
british machine vision conference | 2008
Michael D. Breitenstein; Eric Sommerlade; Bastian Leibe; Luc Van Gool; Ian D. Reid
We present an online learning approach for robustly combining unreliable observations from a pedestrian detector to estimate the rough 3D scene geometry from video sequences of a static camera. Our approach is based on an entropy modelling framework, which allows us to simultaneously adapt the detector parameters such that the expected information gain about the scene structure is maximised. As a result, our approach automatically restricts the detector scale range for each image region as the estimation results become more confident, thus improving detector run-time and limiting false positives.
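The scale-restriction step can be sketched with a per-region histogram over observed detection scales: while the histogram's entropy is high the full scale range is searched, and once it drops below a threshold the detector is restricted to the smallest interval holding most of the probability mass. The threshold values, bin layout, and mass fraction below are illustrative assumptions, not the paper's entropy formulation.

```python
import numpy as np

def entropy(p):
    """Shannon entropy (bits) of a discrete distribution."""
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def restrict_scales(scale_counts, bin_edges, ent_thresh=1.0, mass=0.95):
    """Return the (min_scale, max_scale) search range for one image region.
    scale_counts: histogram of detection scales; bin_edges: len(counts)+1."""
    p = scale_counts / scale_counts.sum()
    if entropy(p) > ent_thresh:
        return bin_edges[0], bin_edges[-1]       # still uncertain: full range
    order = np.argsort(p)[::-1]                  # bins by descending mass
    keep = order[np.cumsum(p[order]) <= mass]
    keep = order[:len(keep) + 1]                 # include the bin crossing `mass`
    lo, hi = keep.min(), keep.max()
    return bin_edges[lo], bin_edges[hi + 1]
```

A sharply peaked histogram yields a narrow scale interval, cutting detector run-time, while a flat (high-entropy) histogram leaves the full range in place.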
scandinavian conference on image analysis | 2009
Michael D. Breitenstein; Jeppe Jensen; Carsten Høilund; Thomas B. Moeslund; Luc Van Gool
We present an algorithm to estimate the 3D pose (location and orientation) of a previously unseen face from low-quality range images. The algorithm generates many pose candidates from a signature to find the nose tip based on local shape, and then evaluates each candidate by computing an error function. Our algorithm incorporates 2D and 3D cues to make the system robust to low-quality range images acquired by passive stereo systems. It handles large pose variations (of ±90° yaw and ±45° pitch rotation) and facial variations due to expressions or accessories. For a maximally allowed error of 30°, the system achieves an accuracy of 83.6%.