Publications


Featured research published by Martin Hirzer.


Computer Vision and Pattern Recognition | 2012

Large scale metric learning from equivalence constraints

Martin Köstinger; Martin Hirzer; Paul Wohlhart; Peter M. Roth; Horst Bischof

In this paper, we raise important issues concerning the scalability and the required degree of supervision of existing Mahalanobis metric learning methods. Often, rather tedious optimization procedures are applied that become computationally intractable on a large scale. Further, considering the constantly growing amount of data, it is often infeasible to specify fully supervised labels for all data points. Instead, it is easier to specify labels in the form of equivalence constraints. We introduce a simple yet effective strategy to learn a distance metric from equivalence constraints, based on a statistical inference perspective. In contrast to existing methods, we do not rely on complex optimization problems requiring computationally expensive iterations. Hence, our method is orders of magnitude faster than comparable methods. Results on a variety of challenging benchmarks of rather diverse nature demonstrate the power of our method. These include faces in unconstrained environments, matching of previously unseen object instances, and person re-identification across spatially disjoint cameras. On the latter two benchmarks we clearly outperform the state of the art.
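The statistical-inference idea behind this paper (widely known as KISSME) reduces metric learning to two covariance estimates: model the difference vectors of similar and dissimilar pairs as zero-mean Gaussians, and the log-likelihood ratio yields a Mahalanobis matrix in closed form. A minimal sketch of that closed form, with the PSD projection the method uses to keep the metric valid (function names are illustrative):

```python
import numpy as np

def learn_metric(diffs_similar, diffs_dissimilar):
    """Learn a Mahalanobis matrix M from pairwise difference vectors.

    diffs_similar:    (n, d) differences x_i - x_j of similar pairs
    diffs_dissimilar: (m, d) differences of dissimilar pairs
    """
    # Covariance of the differences under each hypothesis (zero mean by symmetry)
    cov_s = diffs_similar.T @ diffs_similar / len(diffs_similar)
    cov_d = diffs_dissimilar.T @ diffs_dissimilar / len(diffs_dissimilar)
    # Log-likelihood ratio of the two Gaussians gives this closed form
    m = np.linalg.inv(cov_s) - np.linalg.inv(cov_d)
    # Clip negative eigenvalues so M defines a valid (pseudo-)metric
    w, v = np.linalg.eigh(m)
    return v @ np.diag(np.clip(w, 0, None)) @ v.T

def mahalanobis(x, y, m):
    d = x - y
    return float(d @ m @ d)
```

No iterative optimization is involved, which is exactly why the method is orders of magnitude faster than solvers that iterate over constraints.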


Scandinavian Conference on Image Analysis | 2011

Person re-identification by descriptive and discriminative classification

Martin Hirzer; Csaba Beleznai; Peter M. Roth; Horst Bischof

Person re-identification, i.e., recognizing a single person across spatially disjoint cameras, is an important task in visual surveillance. Existing approaches either try to find a suitable description of the appearance or learn a discriminative model. Since these different representational strategies capture largely complementary information, we propose to combine both approaches. First, given a specific query, we rank all samples according to a feature-based similarity, where appearance is modeled by a set of region covariance descriptors. Next, a discriminative model is learned using boosting for feature selection, which provides a more specific classifier. The proposed approach is demonstrated on two datasets, where we show that the combination of a generic descriptive statistical model and a discriminatively learned feature-based model attains considerably better results than either individual model alone. In addition, we give a comparison to the state of the art on a publicly available benchmark dataset.
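The region covariance descriptors used for the appearance model summarize the per-pixel features of a region (e.g., position, color, gradients) by their covariance matrix; two descriptors are typically compared via the generalized-eigenvalue distance on the SPD manifold. A hedged sketch of both steps, with an illustrative feature choice (the paper's exact features and distance may differ):

```python
import numpy as np

def region_covariance(features):
    """features: (n_pixels, d) per-pixel feature vectors of one region."""
    f = features - features.mean(axis=0)
    # Small ridge keeps the matrix positive definite for the metric below
    return f.T @ f / (len(f) - 1) + 1e-6 * np.eye(features.shape[1])

def covariance_distance(c1, c2):
    """sqrt(sum_i ln^2 lambda_i) over generalized eigenvalues of (c1, c2)."""
    # Whiten c2 via Cholesky: generalized eigenvalues of (c1, c2) are the
    # ordinary eigenvalues of L^{-1} c1 L^{-T} where c2 = L L^T
    l = np.linalg.cholesky(c2)
    linv = np.linalg.inv(l)
    lam = np.linalg.eigvalsh(linv @ c1 @ linv.T)
    return float(np.sqrt(np.sum(np.log(lam) ** 2)))
```

The distance is zero for identical regions and grows with how differently the two feature distributions are shaped, independent of the features' absolute scale ordering.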


European Conference on Computer Vision | 2012

Relaxed pairwise learned metric for person re-identification

Martin Hirzer; Peter M. Roth; Martin Köstinger; Horst Bischof

Matching persons across non-overlapping cameras is a rather challenging task. Thus, successful methods often build on complex feature representations or sophisticated learners. A recent trend to tackle this problem is to use metric learning to find a suitable space for matching samples from different cameras. However, most of these approaches ignore the transition from one camera to the other. In this paper, we propose to learn a metric from pairs of samples from different cameras. In this way, even less sophisticated features describing color and texture information suffice to obtain state-of-the-art classification results. Moreover, once the metric has been learned, only linear projections are necessary at search time, where a simple nearest neighbor classification is performed. The approach is demonstrated on three publicly available datasets of different complexity, showing that state-of-the-art results can be obtained at much lower computational cost.
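The search stage the abstract describes follows from factoring the learned PSD matrix as M = L^T L: the Mahalanobis distance then equals the Euclidean distance between linearly projected samples, so the gallery is projected once and matching is plain nearest neighbor. A minimal sketch of that reduction (function names are illustrative):

```python
import numpy as np

def projection_from_metric(m):
    """Factor a PSD matrix M into L with M = L.T @ L via its eigendecomposition."""
    w, v = np.linalg.eigh(m)
    return np.diag(np.sqrt(np.clip(w, 0, None))) @ v.T

def rank_gallery(query, gallery, l):
    """Return gallery indices sorted by learned distance to the query."""
    q = l @ query                      # project the query once
    g = gallery @ l.T                  # project the whole gallery once
    d = np.linalg.norm(g - q, axis=1)  # plain Euclidean in the projected space
    return np.argsort(d)
```

This is why search time stays cheap: the learning cost is paid once, and each query needs only one matrix-vector product plus a nearest-neighbor scan.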


International Conference on Computer Vision | 2009

Saliency driven total variation segmentation

Michael Donoser; Martin Urschler; Martin Hirzer; Horst Bischof

This paper introduces an unsupervised color segmentation method. The underlying idea is to segment the input image several times, each time focusing on a different salient part of the image, and to subsequently merge all obtained results into one composite segmentation. We identify salient parts of the image by applying affinity propagation clustering to efficiently calculated local color and texture models. Each salient region then serves as an independent initialization for a figure/ground segmentation. Segmentation is done by minimizing a convex energy functional based on weighted total variation, leading to a globally optimal solution. Each salient region provides an accurate figure/ground segmentation highlighting different parts of the image. These highly redundant results are combined into one composite segmentation by analyzing local segmentation certainty. Our formulation is quite general, and other salient region detection algorithms in combination with any semi-supervised figure/ground segmentation approach can be used. We demonstrate the high quality of our method on the well-known Berkeley segmentation database. Furthermore, we show that our method can be used to provide good spatial support for recognition frameworks.
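The convex energy minimized for each figure/ground segmentation has, in the weighted total variation form common to this line of work, roughly the following shape (a sketch; the paper's exact data term may differ):

```latex
\min_{u \in [0,1]} \; \int_\Omega g(x)\,\lvert \nabla u(x) \rvert \, dx
  \;+\; \lambda \int_\Omega u(x)\, f(x) \, dx
```

Here u is a relaxed binary labeling, g an edge-weighting function that lowers the smoothness penalty at image boundaries, and f a pointwise figure/ground affinity derived from the salient-region model; convexity guarantees that thresholding the globally optimal u yields the stated globally optimal segmentation.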


Person Re-Identification | 2014

Mahalanobis Distance Learning for Person Re-identification

Peter M. Roth; Martin Hirzer; Martin Köstinger; Csaba Beleznai; Horst Bischof

Recently, Mahalanobis metric learning has gained considerable interest for single-shot person re-identification. The main idea is to build on an existing image representation and to learn a metric that reflects the visual camera-to-camera transitions, allowing for a more powerful classification. The goal of this chapter is twofold. We first review the main ideas of Mahalanobis metric learning in general and then give a detailed study on different approaches for the task of single-shot person re-identification, also comparing to the state of the art. In particular, for our experiments, we used Linear Discriminant Metric Learning (LDML), Information Theoretic Metric Learning (ITML), Large Margin Nearest Neighbor (LMNN), Large Margin Nearest Neighbor with Rejection (LMNN-R), Efficient Impostor-based Metric Learning (EIML), and KISSME. For our evaluations we used four different publicly available datasets (i.e., VIPeR, ETHZ, PRID 2011, and CAVIAR4REID). Additionally, we generated the new, more realistic PRID 450S dataset, where we also provide detailed segmentations. For the latter, we also evaluated the influence of using well-segmented foreground and background regions. Finally, the corresponding results are presented and discussed.
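Comparisons on re-identification datasets like these are conventionally reported as Cumulative Matching Characteristic (CMC) curves: the fraction of queries whose true match appears within the top k ranked gallery entries. A minimal sketch of that evaluation, assuming each query has exactly one correct match in the gallery (the function name and protocol details are illustrative):

```python
import numpy as np

def cmc_curve(dist, labels_query, labels_gallery):
    """CMC: fraction of queries whose true match appears within rank k.

    dist: (n_query, n_gallery) distance matrix from some learned metric.
    Assumes every query label occurs in the gallery.
    """
    order = np.argsort(dist, axis=1)          # gallery sorted per query
    ranked = labels_gallery[order]            # gallery labels in ranked order
    hits = ranked == labels_query[:, None]    # where the true identity sits
    first_hit = hits.argmax(axis=1)           # rank of the first correct match
    counts = np.bincount(first_hit, minlength=dist.shape[1])
    return np.cumsum(counts) / len(labels_query)
```

The rank-1 value (the curve's first entry) is the single number most often quoted when comparing metric learners such as those listed above.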


Advanced Video and Signal Based Surveillance | 2012

Person Re-identification by Efficient Impostor-Based Metric Learning

Martin Hirzer; Peter M. Roth; Horst Bischof

Recognizing persons across a system of disjoint cameras is a hard task for human operators and even harder for automated systems. In particular, realistic setups exhibit difficulties such as different camera angles or different camera properties. Additionally, the appearance of exactly the same person can change dramatically due to different views (e.g., frontal/back) or carried objects. In this paper, we mainly address the first problem by learning the transition from one camera to the other. This is realized by learning a Mahalanobis metric using pairs of labeled samples from different cameras. Building on the ideas of Large Margin Nearest Neighbor classification, we obtain a more efficient solution which additionally provides much better generalization properties. To demonstrate these benefits, we run experiments on three different publicly available datasets, showing state-of-the-art or even better results at much lower computational effort. This is particularly interesting since we use quite simple color and texture features, whereas other approaches build on rather complex image descriptions.


International Conference on Image Processing | 2012

Dense appearance modeling and efficient learning of camera transitions for person re-identification

Martin Hirzer; Csaba Beleznai; Martin Köstinger; Peter M. Roth; Horst Bischof

One central task in many visual surveillance scenarios is person re-identification, i.e., recognizing an individual person across a network of spatially disjoint cameras. Most successful recognition approaches are based either on direct modeling of the human appearance or on machine learning. In this work, we aim at taking advantage of both directions of research. On the one hand, we compute a descriptive appearance representation encoding the vertical color structure of pedestrians. On the other hand, to improve the classification results, we estimate the transition between two cameras using a pairwise learned metric. In particular, we introduce 4D spatial color histograms and adopt Large Margin Nearest Neighbor (LMNN) metric learning. The approach is demonstrated on two publicly available datasets, showing competitive results at lower computational cost.


International Conference on Computer Vision | 2011

Multi-cue learning and visualization of unusual events

René Schuster; Samuel Schulter; Georg Poier; Martin Hirzer; Josef Alois Birchbauer; Peter M. Roth; Horst Bischof; Martin Winter; Peter Schallauer

Unusual event detection, i.e., identifying unspecified rare or critical events, has become one of the major challenges in visual surveillance. The main solution to this problem is to describe local or global normality and to report events that do not fit the estimated models. The majority of existing approaches, however, are limited to a single description (e.g., either appearance or motion) and/or build on inflexible (unsupervised) learning techniques, both of which clearly degrade the practical applicability. To overcome these limitations, we demonstrate a system that is capable of extracting and modeling several representations in parallel, while additionally allowing for user interaction within a continuous learning setup. Novel yet intuitive concepts for result visualization and user interaction are presented that allow for exploiting the underlying data.


Urban Remote Sensing Joint Event | 2017

Semantic segmentation for 3D localization in urban environments

Anil Armagan; Martin Hirzer; Vincent Lepetit

We show how to use simple 2.5D maps of buildings and recent advances in image segmentation and machine learning to geo-localize an input image of an urban scene: We first extract the façades of the buildings and their edges from the image, and then look for the orientation and location that align a 3D rendering of the map with these segments. We discuss how a 3D tracking system can be used to acquire the data required for training the segmentation method, describe the segmentation itself, and show how we use the segmentations to evaluate the quality of the alignment.


Computer Vision and Pattern Recognition | 2017

Learning to Align Semantic Segmentation and 2.5D Maps for Geolocalization

Anil Armagan; Martin Hirzer; Peter M. Roth; Vincent Lepetit

We present an efficient method for geolocalization in urban environments starting from a coarse estimate of the location provided by a GPS and using a simple untextured 2.5D model of the surrounding buildings. Our key contribution is a novel efficient and robust method to optimize the pose: We train a Deep Network to predict the best direction to improve a pose estimate, given a semantic segmentation of the input image and a rendering of the buildings from this estimate. We then iteratively apply this CNN until converging to a good pose. This approach avoids the use of reference images of the surroundings, which are difficult to acquire and match, while 2.5D models are broadly available. We can therefore apply it to places unseen during training.
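The iterative refinement loop the abstract describes can be sketched as follows. Here `predict_direction` stands in for the trained network (which in the paper consumes a semantic segmentation and a rendering of the 2.5D model); the discrete move set, step sizes, and pose parameterization are illustrative assumptions:

```python
def refine_pose(pose, predict_direction, max_steps=50):
    """Iteratively apply a direction predictor until it reports convergence.

    pose: (x, y, theta) 2D location plus heading.
    predict_direction: maps a pose to one of the moves below or to "stop";
    in the paper this role is played by the trained CNN.
    """
    moves = {
        "forward":  ( 1.0,  0.0,  0.0),
        "back":     (-1.0,  0.0,  0.0),
        "left":     ( 0.0, -1.0,  0.0),
        "right":    ( 0.0,  1.0,  0.0),
        "turn_ccw": ( 0.0,  0.0,  0.1),
        "turn_cw":  ( 0.0,  0.0, -0.1),
    }
    for _ in range(max_steps):
        move = predict_direction(pose)
        if move == "stop":
            break
        dx, dy, dt = moves[move]
        pose = (pose[0] + dx, pose[1] + dy, pose[2] + dt)
    return pose
```

Because the predictor only ever sees a segmentation and a rendering, no reference imagery of the surroundings is needed, which is what lets the method run in places unseen during training.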

Collaboration


Dive into Martin Hirzer's collaborations.

Top Co-Authors

Peter M. Roth (Graz University of Technology)
Horst Bischof (Graz University of Technology)
Vincent Lepetit (Graz University of Technology)
Csaba Beleznai (Austrian Institute of Technology)
Martin Köstinger (Graz University of Technology)
Michael Donoser (Graz University of Technology)
Clemens Arth (Graz University of Technology)
Dieter Schmalstieg (Graz University of Technology)
Georg Poier (Graz University of Technology)