
Publications


Featured research published by Martin Köstinger.


computer vision and pattern recognition | 2012

Large scale metric learning from equivalence constraints

Martin Köstinger; Martin Hirzer; Paul Wohlhart; Peter M. Roth; Horst Bischof

In this paper, we raise important issues on scalability and the required degree of supervision of existing Mahalanobis metric learning methods. Often rather tedious optimization procedures are applied that become computationally intractable on a large scale. Further, considering the constantly growing amount of data, it is often infeasible to specify fully supervised labels for all data points. Instead, it is easier to specify labels in the form of equivalence constraints. We introduce a simple though effective strategy to learn a distance metric from equivalence constraints, based on a statistical inference perspective. In contrast to existing methods, we do not rely on complex optimization problems requiring computationally expensive iterations. Hence, our method is orders of magnitude faster than comparable methods. Results on a variety of challenging benchmarks of rather diverse nature demonstrate the power of our method. These include faces in unconstrained environments, matching previously unseen object instances, and person re-identification across spatially disjoint cameras. On the latter two benchmarks we clearly outperform the state of the art.
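
The statistical inference idea sketched above (the method widely known as KISSME: derive the metric from the covariances of similar-pair and dissimilar-pair differences via a likelihood-ratio test) fits in a few lines. The sketch below is illustrative, not the authors' reference implementation; the function names and the small ridge term are assumptions.

import numpy as np

def learn_kissme_metric(X, pairs_similar, pairs_dissimilar, eps=1e-6):
    """Learn a Mahalanobis matrix M from equivalence constraints.

    The metric follows from a likelihood-ratio test between "pair is
    similar" and "pair is dissimilar", each modeled as a zero-mean
    Gaussian over pairwise differences.
    X: (n, d) array of feature vectors; pairs_*: lists of (i, j) pairs.
    """
    def scatter(pairs):
        diffs = np.array([X[i] - X[j] for i, j in pairs])
        return diffs.T @ diffs / len(pairs)

    cov_s = scatter(pairs_similar)      # covariance of similar-pair differences
    cov_d = scatter(pairs_dissimilar)   # covariance of dissimilar-pair differences
    d = X.shape[1]
    # Difference of inverse covariances; the ridge keeps the inverses stable.
    M = np.linalg.inv(cov_s + eps * np.eye(d)) - np.linalg.inv(cov_d + eps * np.eye(d))
    # Project back onto the cone of positive semi-definite matrices.
    w, V = np.linalg.eigh(M)
    return V @ np.diag(np.clip(w, 0.0, None)) @ V.T

def mahalanobis_dist(M, x, y):
    diff = x - y
    return float(diff @ M @ diff)

No iterative optimization is involved, which is exactly where the claimed speed advantage over iterative metric learners comes from.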


european conference on computer vision | 2012

Relaxed pairwise learned metric for person re-identification

Martin Hirzer; Peter M. Roth; Martin Köstinger; Horst Bischof

Matching persons across non-overlapping cameras is a rather challenging task. Thus, successful methods often build on complex feature representations or sophisticated learners. A recent trend to tackle this problem is to use metric learning to find a suitable space for matching samples from different cameras. However, most of these approaches ignore the transition from one camera to the other. In this paper, we propose to learn a metric from pairs of samples from different cameras. In this way, even less sophisticated features describing color and texture information suffice to obtain state-of-the-art classification results. Moreover, once the metric has been learned, only linear projections are necessary at search time, where a simple nearest neighbor classification is performed. The approach is demonstrated on three publicly available datasets of different complexity, showing that state-of-the-art results can be obtained at much lower computational cost.
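
The abstract does not spell out the training objective, but the search-time procedure it describes, a linear projection followed by simple nearest-neighbor matching, can be sketched as follows. The factorization M = L^T L and all names are illustrative assumptions, not the paper's code.

import numpy as np

def projection_from_metric(M):
    """Factor a PSD Mahalanobis matrix M as L^T L, so that
    (x - y)^T M (x - y) == ||L x - L y||^2."""
    w, V = np.linalg.eigh(M)
    w = np.clip(w, 0.0, None)
    return np.diag(np.sqrt(w)) @ V.T

def reidentify(M, gallery, probe):
    """Rank gallery samples for one probe by learned-metric distance.
    gallery: (n, d) array, probe: (d,) array."""
    L = projection_from_metric(M)
    g = gallery @ L.T            # project the gallery once
    p = L @ probe                # project the probe
    dists = np.linalg.norm(g - p, axis=1)
    return np.argsort(dists)     # best match first

Because the gallery can be projected once and cached, each query reduces to a plain Euclidean nearest-neighbor search, which explains the low search-time cost.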


international conference on computer vision | 2011

Annotated Facial Landmarks in the Wild: A large-scale, real-world database for facial landmark localization

Martin Köstinger; Paul Wohlhart; Peter M. Roth; Horst Bischof

Face alignment is a crucial step in face recognition tasks. In particular, using landmark localization for geometric face normalization has been shown to be very effective, clearly improving the recognition results. However, no adequate databases exist that provide a sufficient number of annotated facial landmarks: existing databases are either limited to frontal views, provide only a small number of annotated images, or have been acquired under controlled conditions. Hence, we introduce a novel database overcoming these limitations: Annotated Facial Landmarks in the Wild (AFLW). AFLW provides a large-scale collection of images gathered from Flickr, exhibiting a large variety in face appearance (e.g., pose, expression, ethnicity, age, gender) as well as in general imaging and environmental conditions. In total, 25,993 faces in 21,997 real-world images are annotated with up to 21 landmarks per face. Due to this comprehensive set of annotations, AFLW is well suited to train and test algorithms for multi-view face detection, facial landmark localization, and face pose estimation. Further, we offer a rich set of tools that ease the integration of other face databases and associated annotations into our joint framework.
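
As an illustration of the geometric face normalization the abstract refers to, the sketch below estimates a least-squares similarity transform mapping detected landmarks onto canonical template positions. The template coordinates and the three-landmark choice are placeholders, not AFLW's annotation scheme.

import numpy as np

def similarity_transform(src, dst):
    """Least-squares similarity transform (scale, rotation, translation)
    mapping src landmarks onto dst landmarks. src, dst: (k, 2) arrays."""
    A, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        A.append([x, -y, 1.0, 0.0]); b.append(u)
        A.append([y,  x, 0.0, 1.0]); b.append(v)
    (a, c, tx, ty), *_ = np.linalg.lstsq(np.array(A), np.array(b), rcond=None)
    return np.array([[a, -c, tx],
                     [c,  a, ty]])   # 2x3 matrix, restricted to a similarity

# Hypothetical canonical positions for eyes and nose tip in a 100x100 crop.
template = np.array([[30.0, 35.0], [70.0, 35.0], [50.0, 60.0]])
detected = np.array([[120.0, 80.0], [180.0, 90.0], [150.0, 130.0]])
T = similarity_transform(detected, template)  # e.g. feed T to cv2.warpAffine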


Person Re-Identification | 2014

Mahalanobis Distance Learning for Person Re-identification

Peter M. Roth; Martin Hirzer; Martin Köstinger; Csaba Beleznai; Horst Bischof

Recently, Mahalanobis metric learning has gained considerable interest for single-shot person re-identification. The main idea is to build on an existing image representation and to learn a metric that reflects the visual camera-to-camera transitions, allowing for a more powerful classification. The goal of this chapter is twofold. We first review the main ideas of Mahalanobis metric learning in general and then give a detailed study of different approaches for the task of single-shot person re-identification, also comparing to the state of the art. In particular, for our experiments we used Linear Discriminant Metric Learning (LDML), Information Theoretic Metric Learning (ITML), Large Margin Nearest Neighbor (LMNN), Large Margin Nearest Neighbor with Rejection (LMNN-R), Efficient Impostor-based Metric Learning (EIML), and KISSME. For our evaluations we used four different publicly available datasets (i.e., VIPeR, ETHZ, PRID 2011, and CAVIAR4REID). Additionally, we generated the new, more realistic PRID 450S dataset, for which we also provide detailed segmentations. For the latter, we also evaluated the influence of using well-segmented foreground and background regions. Finally, the corresponding results are presented and discussed.
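
Single-shot re-identification comparisons such as those in this chapter are commonly reported as cumulative match characteristic (CMC) curves. A minimal sketch of that standard evaluation protocol, assuming precomputed distances and numpy arrays of identity labels; this is not the chapter's actual evaluation code.

import numpy as np

def cmc_curve(dist_matrix, gallery_ids, probe_ids):
    """Cumulative match characteristic for single-shot re-identification.
    dist_matrix[i, j]: learned-metric distance from probe i to gallery j.
    Returns cmc[r] = fraction of probes whose true match ranks <= r + 1."""
    n_probes, n_gallery = dist_matrix.shape
    ranks = np.zeros(n_gallery)
    for i in range(n_probes):
        order = np.argsort(dist_matrix[i])
        # Rank position of the correct identity for this probe.
        hit = np.where(gallery_ids[order] == probe_ids[i])[0][0]
        ranks[hit:] += 1
    return ranks / n_probes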


international conference on image processing | 2012

Dense appearance modeling and efficient learning of camera transitions for person re-identification

Martin Hirzer; Csaba Beleznai; Martin Köstinger; Peter M. Roth; Horst Bischof

One central task in many visual surveillance scenarios is person re-identification, i.e., recognizing an individual person across a network of spatially disjoint cameras. Most successful recognition approaches are based either on direct modeling of the human appearance or on machine learning. In this work, we aim at taking advantage of both directions of research. On the one hand, we compute a descriptive appearance representation encoding the vertical color structure of pedestrians. On the other hand, to improve the classification results, we estimate the transition between two cameras using a pairwise learned metric. In particular, we introduce 4D spatial color histograms and adopt Large Margin Nearest Neighbor (LMNN) metric learning. The approach is demonstrated on two publicly available datasets, showing competitive results at lower computational costs.
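
Reading "4D spatial color histograms" as a joint binning over vertical position and the three color channels, a simplified descriptor could look like the following. The bin counts, color space, and normalization are assumptions, not the paper's exact design.

import numpy as np

def spatial_color_histogram(img, n_stripes=8, n_bins=8):
    """Histogram over (vertical position, R, G, B): a 4D descriptor that
    encodes the vertical color structure of a pedestrian image.
    img: (h, w, 3) uint8 array. Returns a flattened, L1-normalized vector."""
    h = img.shape[0]
    hist = np.zeros((n_stripes, n_bins, n_bins, n_bins))
    bins = (img.astype(int) * n_bins) // 256          # per-pixel color bin indices
    for s in range(n_stripes):
        rows = bins[s * h // n_stripes:(s + 1) * h // n_stripes].reshape(-1, 3)
        np.add.at(hist[s], (rows[:, 0], rows[:, 1], rows[:, 2]), 1)
    return hist.ravel() / max(hist.sum(), 1)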


british machine vision conference | 2012

Discriminative Hough Forests for Object Detection

Paul Wohlhart; Samuel Schulter; Martin Köstinger; Peter M. Roth; Horst Bischof

Object detection models based on the Implicit Shape Model (ISM) [3] use small, local parts that vote for object centers in images. Since these parts vote completely independently of each other, this often leads to false-positive detections due to random constellations of parts. Thus, we introduce a verification step, which considers the activations of all voting elements that contribute to a detection. The levels of activation of each voting element of the ISM form a new description vector for an object hypothesis, which can be examined in order to discriminate between correct and incorrect detections. In particular, we observe the levels of activation of the voting elements in Hough Forests [2], which can be seen as a variant of the ISM. In Hough Forests, the voting elements are all the positive training patches used to train the forest. Each patch of the input image is classified by all decision trees in the Hough Forest. Whenever an input patch falls into the same leaf node as a training patch, a certain amount of weight is added to the detection hypothesis at the relative position of the object center that was recorded when cropping out the training patch. The total amount of weight one voting element (offset vector) adds to a detection hypothesis (the total activation) can be calculated by summing over all input patches and trees in the forest. Stacking the activations of all elements gives an activation vector for a hypothesis. We learn classifiers to discriminate correct from wrong part constellations based on these activation vectors and thus assign a better confidence to each detection. We use linear models as well as a histogram intersection kernel SVM. In the linear classifier, one weight is learned for each voting element. We additionally show how to use these weights not only as a post-processing step, but directly in the voting process. This has two advantages: first, it circumvents the explicit calculation of the activation vector for later reclassification, which is computationally more demanding; second, the non-maximum suppression is performed on cleaner Hough maps, which allows for reducing the size of the suppression neighborhood and thus increases the recall at high levels of precision.
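
A toy sketch of the reweighted voting step described at the end: each voting element casts its vote into the Hough map scaled by its learned discriminative weight, so non-maximum suppression runs on a cleaner map. The data layout and names are illustrative.

import numpy as np

def weighted_hough_votes(votes, weights, map_shape):
    """Accumulate center votes into a Hough map, scaling each voting
    element's contribution by its learned discriminative weight.
    votes: iterable of (element_id, (row, col), strength) tuples.
    weights: (n_elements,) learned per-element weights."""
    hough = np.zeros(map_shape)
    for elem, (r, c), strength in votes:
        if 0 <= r < map_shape[0] and 0 <= c < map_shape[1]:
            hough[r, c] += weights[elem] * strength
    return hough  # run non-maximum suppression on this cleaner map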


advanced video and signal based surveillance | 2011

Learning to recognize faces from videos and weakly related information cues

Martin Köstinger; Paul Wohlhart; Peter M. Roth; Horst Bischof

Videos are often associated with additional information that could be valuable for interpreting their content. This especially applies to the recognition of faces within video streams, where cues such as transcripts and subtitles are often available. However, this data is not completely reliable and might be ambiguously labeled. To overcome these limitations, we take advantage of semi-supervised learning (SSL) and multiple instance learning (MIL) and propose a new semi-supervised multiple instance learning (SSMIL) algorithm. Thus, during training we can weaken the prerequisite of knowing the label of each instance and can integrate unlabeled data, given only probabilistic information in the form of priors. The benefits of the approach are demonstrated for face recognition in videos on a publicly available benchmark dataset. In fact, we show that exploring new information sources can considerably improve the classification results.
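
The paper's exact objective is not given in the abstract. Purely as a hedged illustration of the idea, a supervised bag-level loss can be combined with a prior-agreement term over unlabeled instances, as in the toy objective below; all names and the weighting lam are hypothetical, not the SSMIL formulation.

import numpy as np

def ssmil_style_loss(bag_probs, bag_labels, unlabeled_probs, priors, lam=0.1):
    """Toy SSMIL-style objective: cross-entropy over labeled bags plus a
    prior-agreement cross-entropy over unlabeled instances.
    bag_probs, unlabeled_probs: model outputs in (0, 1);
    priors: per-instance label priors from weak cues (e.g. subtitles)."""
    eps = 1e-12
    sup = -np.mean(bag_labels * np.log(bag_probs + eps)
                   + (1 - bag_labels) * np.log(1 - bag_probs + eps))
    semi = -np.mean(priors * np.log(unlabeled_probs + eps)
                    + (1 - priors) * np.log(1 - unlabeled_probs + eps))
    return sup + lam * semi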


international conference on pattern recognition | 2011

Multiple instance boosting for face recognition in videos

Paul Wohlhart; Martin Köstinger; Peter M. Roth; Horst Bischof

For face recognition from video streams, cues such as transcripts, subtitles, or on-screen text are often available. This information could be very valuable for improving the recognition performance. However, frequently this data cannot be associated directly with just one of the visible faces. To overcome this limitation and to exploit the valuable information, we define the task as a multiple instance learning (MIL) problem. We formulate a robust loss function that describes our problem, incorporates ambiguous and unreliable information sources, and is optimized using Gradient Boosting. A new definition of the posterior probability of a bag, based on the Lp-norm, improves the ability to deal with varying bag sizes over existing formulations. The benefits of the approach are demonstrated for face recognition in videos on a publicly available benchmark dataset. In fact, we show that exploring new information sources can drastically improve the classification results. Additionally, we show competitive performance on standard machine learning datasets.
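
The Lp-norm bag posterior can be read as a generalized mean of the instance posteriors: a soft approximation of the max that stays comparable across bag sizes. A minimal sketch under that reading; the paper's exact normalization may differ.

import numpy as np

def bag_posterior(instance_probs, p=8.0):
    """Lp-norm (generalized-mean) bag posterior. As p grows it approaches
    max(instance_probs); the mean inside the root keeps a 50-instance bag
    and a 3-instance bag on the same scale, unlike e.g. Noisy-OR.
    instance_probs: (n,) posteriors of the instances in one bag."""
    q = np.asarray(instance_probs, dtype=float)
    return float(np.mean(q ** p) ** (1.0 / p))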


international conference on computer vision | 2013

Joint Learning of Discriminative Prototypes and Large Margin Nearest Neighbor Classifiers

Martin Köstinger; Paul Wohlhart; Peter M. Roth; Horst Bischof

In this paper, we raise important issues concerning the evaluation complexity of existing Mahalanobis metric learning methods: the complexity scales linearly with the size of the dataset. This is especially cumbersome at large scale or for real-time applications with a limited time budget. To alleviate this problem, we propose to represent the dataset by a fixed number of discriminative prototypes. In particular, we introduce a new method that jointly chooses the positioning of the prototypes and optimizes the Mahalanobis distance metric with respect to them. We show that choosing the positioning of the prototypes and learning the metric in parallel drastically reduces the evaluation effort while maintaining the discriminative essence of the original dataset. Moreover, for most problems, performing k-nearest prototype (k-NP) classification on the condensed dataset generalizes even better than k-NN classification using all data. Results on a variety of challenging benchmarks demonstrate the power of our method. These include standard machine learning datasets as well as the challenging Public Figures Face Database. On the competitive machine learning benchmarks we are comparable to the state of the art while being more efficient. On the face benchmark we clearly outperform the state of the art in Mahalanobis metric learning with drastically reduced evaluation effort.
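
Once prototypes and metric are learned, evaluation touches only the prototype set. A minimal sketch of k-nearest-prototype classification under a learned Mahalanobis matrix M; the names and majority-vote tie-breaking are assumptions.

import numpy as np

def knp_classify(x, prototypes, proto_labels, M, k=3):
    """k-nearest-prototype classification under a learned Mahalanobis
    metric: distances are computed only to the fixed prototype set, so
    evaluation cost is independent of the training-set size.
    prototypes: (m, d), proto_labels: (m,), M: (d, d) PSD matrix."""
    diffs = prototypes - x
    dists = np.einsum('id,de,ie->i', diffs, M, diffs)
    nearest = np.argsort(dists)[:k]
    labels, counts = np.unique(proto_labels[nearest], return_counts=True)
    return labels[np.argmax(counts)]   # majority vote among the k prototypes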


computer vision and pattern recognition | 2013

Optimizing 1-Nearest Prototype Classifiers

Paul Wohlhart; Martin Köstinger; Michael Donoser; Peter M. Roth; Horst Bischof

The development of complex, powerful classifiers and their constant improvement have contributed much to the progress in many fields of computer vision. However, the trend towards large-scale datasets has revived interest in simpler classifiers to reduce runtime. Simple nearest neighbor classifiers have several beneficial properties, such as low complexity and inherent multi-class handling; however, their runtime is linear in the size of the database. Recent related work represents data samples by assigning them to a set of prototypes that partition the input feature space and then applies linear classifiers on top of this representation to approximate decision boundaries in a locally linear manner. In this paper, we go a step beyond these approaches and focus purely on 1-nearest-prototype classification, proposing a novel algorithm for deriving optimal prototypes in a discriminative manner from the training samples. Our method is implicitly multi-class capable, parameter-free, avoids overfitting to noise and, since during testing only comparisons to the derived prototypes are required, highly efficient. Experiments demonstrate that we are able to outperform related locally linear methods while even getting close to the results of more complex classifiers.
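
The paper derives its prototype updates from a concrete discriminative objective not reproduced here. As a loose, LVQ-flavored caricature of optimizing prototype positions with soft assignments, one gradient step might look like this; the learning rate, softness gamma, and the update rule itself are assumptions.

import numpy as np

def update_prototypes(X, y, protos, proto_labels, lr=0.1, gamma=4.0):
    """One gradient-style pass that pulls prototypes toward samples of
    their own class and pushes them away from others, with softmax
    responsibilities so the objective stays differentiable."""
    for x, label in zip(X, y):
        d2 = np.sum((protos - x) ** 2, axis=1)
        resp = np.exp(-gamma * d2)
        resp /= resp.sum()                      # soft assignment to prototypes
        sign = np.where(proto_labels == label, 1.0, -1.0)
        protos += lr * (sign * resp)[:, None] * (x - protos)
    return protos

At test time, only the single nearest prototype is consulted, which is what makes the classifier's runtime independent of the training-set size.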

Collaboration


Dive into Martin Köstinger's collaborations.

Top Co-Authors

Horst Bischof (Graz University of Technology)
Peter M. Roth (Graz University of Technology)
Paul Wohlhart (Graz University of Technology)
Martin Hirzer (Graz University of Technology)
Csaba Beleznai (Austrian Institute of Technology)
Michael Donoser (Graz University of Technology)
Samuel Schulter (Graz University of Technology)