Publications


Featured research published by Arne Schumann.


Computer Vision and Pattern Recognition | 2017

Person Re-identification by Deep Learning Attribute-Complementary Information

Arne Schumann; Rainer Stiefelhagen

Automatic person re-identification (re-id) across camera boundaries is a challenging problem. Approaches have to be robust against many factors which influence the visual appearance of a person but are not relevant to the person's identity. Examples of such factors are pose, camera angle, and lighting conditions. Person attributes are semantic, high-level information which is invariant across many such influences and often highly relevant to a person's identity. In this work we develop a re-id approach which leverages the information contained in automatically detected attributes. We train an attribute classifier on separate data and include its responses in the training process of our person re-id model, which is based on convolutional neural networks (CNNs). This allows us to learn a person representation which contains information complementary to that contained within the attributes. Our approach is able to identify the attributes which perform most reliably for re-id and focus on them accordingly. We demonstrate the performance improvement gained through use of the attribute information on multiple large-scale datasets and report insights into which attributes are most relevant for person re-id.
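
A minimal PyTorch sketch of the fusion idea, not the authors' implementation: responses of a separately trained attribute classifier are concatenated with a CNN appearance feature before the identity loss, which pushes the network to learn information complementary to the attributes. The class name, ResNet-18 backbone, and layer sizes are assumptions for illustration.

    import torch
    import torch.nn as nn
    import torchvision.models as models

    class AttrComplementaryReID(nn.Module):
        # Illustrative fusion of attribute responses with a CNN embedding.
        def __init__(self, num_attributes, num_identities, embed_dim=256):
            super().__init__()
            backbone = models.resnet18(weights=None)
            backbone.fc = nn.Identity()          # 512-d appearance feature
            self.backbone = backbone
            self.embed = nn.Linear(512 + num_attributes, embed_dim)
            self.id_head = nn.Linear(embed_dim, num_identities)

        def forward(self, image, attr_scores):
            # attr_scores: responses of the attribute classifier trained
            # on separate data (kept fixed during re-id training).
            feat = self.backbone(image)
            fused = torch.cat([feat, attr_scores], dim=1)
            emb = self.embed(fused)
            return emb, self.id_head(emb)        # embedding + identity logits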


Advanced Video and Signal Based Surveillance | 2013

Person tracking-by-detection with efficient selection of part-detectors

Arne Schumann; Martin Bäuml; Rainer Stiefelhagen

In this paper we introduce a new person tracking-by-detection approach based on a particle filter. We leverage detection and appearance cues and apply explicit occlusion reasoning. The approach samples efficiently from a large set of available person part-detectors in order to increase runtime performance while retaining accuracy. The tracking approach is evaluated and compared to the state of the art on the CAVIAR surveillance dataset as well as on a multimedia dataset consisting of six episodes of the TV series The Big Bang Theory. The results demonstrate the versatility of the approach on very different types of data and its robustness to camera movement and non-pedestrian body poses.
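
The detector-selection step can be sketched as follows, under assumed interfaces: each detector is a callable scoring a particle state on the current frame, and a per-detector reliability score is maintained elsewhere. This illustrates the selection idea only, not the paper's exact filter.

    import numpy as np

    def track_step(particles, weights, frame, detectors, reliability, k=3, rng=None):
        rng = rng or np.random.default_rng()
        # Motion model: diffuse particle states (e.g. x, y, scale).
        particles = particles + rng.normal(0.0, 2.0, particles.shape)
        # Efficient selection: sample only k of the available
        # part-detectors, preferring those that have been reliable.
        probs = reliability / reliability.sum()
        chosen = rng.choice(len(detectors), size=k, replace=False, p=probs)
        # Observation model: average response of the sampled detectors.
        scores = np.zeros(len(particles))
        for i in chosen:
            scores += np.array([detectors[i](frame, p) for p in particles])
        weights = weights * (scores / k + 1e-6)
        return particles, weights / weights.sum()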


Advanced Video and Signal Based Surveillance | 2015

Transferring attributes for person re-identification

Arne Schumann; Rainer Stiefelhagen

Person re-identification is an important computer vision task with many applications in areas such as surveillance or multimedia. Approaches relying on handcrafted image features struggle with many factors (e.g. lighting, camera angle) which lead to large variation in visual appearance for the same individual. Features based on semantic attributes of a person's appearance can help with some of these challenges. In this work we describe an approach that integrates such attributes with existing re-identification methods based on low-level features. We start by training a set of attribute classifiers and present a metric learning approach that uses these attributes for person re-identification. The method is then applied to a second dataset without attribute labels by transferring the attribute classifiers. Performance on the target dataset can be increased by applying a whitening transformation prior to transfer. We present experiments on publicly available datasets and demonstrate the performance improvement gained by this added re-identification cue.
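
One plausible form of such a whitening transformation is ZCA whitening of the target features before applying the source-trained attribute classifiers; the paper's exact formulation may differ. A NumPy sketch:

    import numpy as np

    def zca_whiten(features, eps=1e-5):
        # Estimate mean and covariance of the target dataset's features.
        mu = features.mean(axis=0)
        centered = features - mu
        cov = centered.T @ centered / (len(features) - 1)
        # ZCA whitening: decorrelate features and normalize variances.
        evals, evecs = np.linalg.eigh(cov)
        transform = evecs @ np.diag(1.0 / np.sqrt(evals + eps)) @ evecs.T
        return centered @ transform

    # Usage sketch: whiten target features, then apply the attribute
    # classifiers trained on the source dataset.
    # attr_scores = source_attr_classifier.predict(zca_whiten(target_feats))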


Automatic Target Recognition XXVIII | 2018

Systematic evaluation of deep learning based detection frameworks for aerial imagery

Lars Wilko Sommer; Arne Schumann; Lucas Steinmann; Jürgen Beyerer

Object detection in aerial imagery is crucial for many applications in the civil and military domain. In recent years, deep learning based object detection frameworks have significantly outperformed conventional approaches based on hand-crafted features on several datasets. However, these detection frameworks are generally designed and optimized for common benchmark datasets, which differ considerably from aerial imagery, especially in object sizes. As already demonstrated for Faster R-CNN, several adaptations are necessary to account for these differences. In this work, we adapt several state-of-the-art detection frameworks, including Faster R-CNN, R-FCN, and Single Shot MultiBox Detector (SSD), to aerial imagery. We discuss in detail the adaptations that most improve the detection accuracy of all frameworks. As the output of deeper convolutional layers comprises more semantic information, these layers are generally used in detection frameworks as feature maps to locate and classify objects. However, the resolution of these feature maps is insufficient for handling small object instances, which results in inaccurate localization or incorrect classification of small objects. Furthermore, state-of-the-art detection frameworks perform bounding box regression to predict the exact object location, using so-called anchor or default boxes as references. We demonstrate how an appropriate choice of anchor box sizes can considerably improve detection performance. Furthermore, we evaluate the impact of the performed adaptations on two publicly available datasets to account for various ground sampling distances and differing backgrounds. The presented adaptations can be used as a guideline for further datasets or detection frameworks.
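
To make the anchor-size point concrete: anchors tuned for benchmark datasets are far larger than typical aerial objects. A common heuristic for choosing dataset-appropriate sizes, not necessarily the procedure used in the paper, is k-means over the ground-truth box dimensions:

    import numpy as np

    def anchor_sizes(box_wh, k=5, iters=50, seed=0):
        # box_wh: (N, 2) array of ground-truth box widths and heights.
        rng = np.random.default_rng(seed)
        centers = box_wh[rng.choice(len(box_wh), size=k, replace=False)].astype(float)
        for _ in range(iters):
            # Assign each box to its nearest anchor-size center.
            dists = np.linalg.norm(box_wh[:, None, :] - centers[None, :, :], axis=2)
            assign = dists.argmin(axis=1)
            # Move each center to the mean size of its assigned boxes.
            for j in range(k):
                if np.any(assign == j):
                    centers[j] = box_wh[assign == j].mean(axis=0)
        return centers[np.argsort(centers[:, 0])]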


Advanced Video and Signal Based Surveillance | 2017

Flying object detection for automatic UAV recognition

Lars Wilko Sommer; Arne Schumann; Thomas Müller; Tobias Schuchert; Jürgen Beyerer

With the increasing use of unmanned aerial vehicles (UAVs) by consumers, automatic UAV detection systems have become increasingly important for security services. In such a system, video imagery is a core modality for the detection task, because it can cover large areas and is very cost-effective to acquire. Many detection systems consist of two parts: flying object detection and subsequent object classification. In this work, we investigate the suitability of a number of flying object detection approaches for the task of UAV detection based on video data from static and moving cameras. We compare approaches based on image differencing with object proposal detectors which are learned from data. Finally, we classify each detection into the classes UAV and clutter using a convolutional neural network (CNN). Our approach is evaluated on six sequences of challenging real-world data which contain multiple UAVs, birds, and background motion.
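
A minimal OpenCV sketch of the image-differencing branch; the threshold, morphology, and area values are illustrative choices, not the paper's settings:

    import cv2
    import numpy as np

    def difference_candidates(prev_gray, gray, thresh=25, min_area=9):
        # Pixel-wise difference between consecutive grayscale frames.
        diff = cv2.absdiff(gray, prev_gray)
        _, mask = cv2.threshold(diff, thresh, 255, cv2.THRESH_BINARY)
        # Close small gaps so nearby motion pixels form one region.
        mask = cv2.dilate(mask, np.ones((3, 3), np.uint8))
        contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL,
                                       cv2.CHAIN_APPROX_SIMPLE)
        # Return bounding boxes of sufficiently large motion regions;
        # each region is then classified as UAV or clutter by the CNN.
        return [cv2.boundingRect(c) for c in contours
                if cv2.contourArea(c) >= min_area]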


Advanced Video and Signal Based Surveillance | 2017

Deep cross-domain flying object classification for robust UAV detection

Arne Schumann; Lars Wilko Sommer; Johannes Klatte; Tobias Schuchert; Jürgen Beyerer

Recent progress in the development of unmanned aerial vehicles (UAVs) raises serious safety issues for mass events and safety-sensitive locations like prisons or airports. To address these concerns, robust UAV detection systems are required. In this work, we propose a UAV detection framework based on video images. Depending on whether the video images are recorded by static or moving cameras, we initially detect regions that are likely to contain an object by median background subtraction or a deep learning based object proposal method, respectively. Then, the detected regions are classified as UAV or distractor, such as a bird, by applying a convolutional neural network (CNN) classifier. To train this classifier, we use our own dataset comprising crawled and self-acquired drone images, as well as bird images from a publicly available dataset. We show that, even across a significant domain gap, the resulting classifier can successfully identify UAVs in our target dataset. We evaluate our UAV detection framework on six challenging video sequences that contain UAVs at different distances as well as birds and background motion.
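
For the static-camera case, median background subtraction can be sketched in a few lines of NumPy; the threshold and buffer length are illustrative, not the paper's settings:

    import numpy as np

    def median_background_mask(frame_buffer, frame, thresh=20):
        # Per-pixel median over a buffer of recent frames approximates
        # the static background.
        background = np.median(frame_buffer, axis=0)
        # Pixels deviating strongly from the background are candidate
        # foreground; connected regions go to the UAV/distractor CNN.
        deviation = np.abs(frame.astype(np.int16) - background.astype(np.int16))
        return (deviation > thresh).astype(np.uint8) * 255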


Advanced Video and Signal Based Surveillance | 2012

Contextual Constraints for Person Retrieval in Camera Networks

Martin Bäuml; Makarand Tapaswi; Arne Schumann; Rainer Stiefelhagen

We use contextual constraints for person retrieval in camera networks. We start by formulating a set of general positive and negative constraints on the identities of person tracks in camera networks, such as the constraint that a person cannot appear twice in the same frame. We then show how these constraints can be used to improve person retrieval. First, we use the constraints to obtain training data in an unsupervised way and learn a general metric that is better suited to discriminating between different people than the Euclidean distance. Second, starting from an initial query track, we enhance the query set using the constraints to obtain additional positive and negative samples for the query. Third, we formulate the person retrieval task as an energy minimization problem, integrate track scores and constraints in a common framework, and jointly optimize the retrieval over all interconnected tracks. We evaluate our approach on the CAVIAR dataset and achieve a 22% relative improvement in mean average precision over standard retrieval where each track is treated independently.
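
The same-frame constraint can be turned into training data mechanically: any two tracks that overlap in time within one camera view must show different people. A sketch, assuming tracks are dicts with camera, start, end, and id fields:

    def cannot_link_pairs(tracks):
        # Temporally overlapping tracks in the same camera are
        # unsupervised negative pairs for metric learning.
        pairs = []
        for i, a in enumerate(tracks):
            for b in tracks[i + 1:]:
                same_camera = a["camera"] == b["camera"]
                overlap = a["start"] <= b["end"] and b["start"] <= a["end"]
                if same_camera and overlap:
                    pairs.append((a["id"], b["id"]))
        return pairs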


Workshop on Applications of Computer Vision | 2018

Multi Feature Deconvolutional Faster R-CNN for Precise Vehicle Detection in Aerial Imagery

Lars Wilko Sommer; Arne Schumann; Tobias Schuchert; Jürgen Beyerer

Accurate detection of objects in aerial images is an important task for many applications such as traffic monitoring, surveillance, reconnaissance, and rescue tasks. Recently, deep learning based detection frameworks have clearly improved detection performance on aerial images compared to conventional methods comprised of hand-crafted features and a classifier within a sliding window approach. These deep learning based detection frameworks use the output of the last convolutional layer as the feature map for localization and classification. Due to the small size of objects in aerial images, only shallow layers of standard models like VGG-16, or small networks, are applicable in order to provide a sufficiently high feature map resolution. However, high-resolution feature maps offer less semantic and contextual information, which makes such approaches more prone to false alarms caused by objects with similar shapes, especially in the case of tiny objects. In this paper, we extend the Faster R-CNN detection framework to cope with this issue. To this end, we apply a deconvolutional module that up-samples low-resolution feature maps of deep layers and combines the up-sampled features with the features of shallow layers, while the feature map resolution is kept sufficiently high to localize tiny objects. Our proposed deconvolutional framework clearly outperforms state-of-the-art methods on two publicly available datasets.
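
The deconvolutional module can be pictured with a small PyTorch sketch: a deep, low-resolution feature map is up-sampled and merged with a shallow, high-resolution one. Channel counts and the fusion convolution are placeholders, not the paper's exact configuration:

    import torch
    import torch.nn as nn

    class DeconvFusion(nn.Module):
        # Up-sample a deep, semantically rich feature map and merge it
        # with a shallow, high-resolution one for tiny-object detection.
        def __init__(self, deep_ch=512, shallow_ch=256, out_ch=256):
            super().__init__()
            self.up = nn.ConvTranspose2d(deep_ch, out_ch, kernel_size=2, stride=2)
            self.lateral = nn.Conv2d(shallow_ch, out_ch, kernel_size=1)
            self.fuse = nn.Conv2d(2 * out_ch, out_ch, kernel_size=3, padding=1)

        def forward(self, deep, shallow):
            upsampled = self.up(deep)                 # 2x spatial resolution
            merged = torch.cat([upsampled, self.lateral(shallow)], dim=1)
            return self.fuse(merged)                  # fed to RPN and heads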


arXiv: Computer Vision and Pattern Recognition | 2018

A systematic evaluation of recent deep learning architectures for fine-grained vehicle classification

Krassimir Valev; Arne Schumann; Lars Wilko Sommer; Jürgen Beyerer

Fine-grained vehicle classification is the task of classifying the make, model, and year of a vehicle. This is a very challenging task, because vehicles of different types but similar color and viewpoint can often look much more similar than vehicles of the same type but differing color and viewpoint. Vehicle make, model, and year, in combination with vehicle color, are of importance in several applications such as vehicle search, re-identification, tracking, and traffic analysis. In this work we investigate the suitability of several recent landmark convolutional neural network (CNN) architectures, which have shown top results on large-scale image classification tasks, for the task of fine-grained classification of vehicles. We compare the performance of VGG16, several ResNets, Inception architectures, the recent DenseNets, and MobileNet. For classification we use the Stanford Cars-196 dataset, which features 196 different types of vehicles. We investigate several aspects of CNN training, such as data augmentation and training from scratch vs. fine-tuning. Importantly, we introduce no aspects of the architectures or training process which are specific to vehicle classification. Our final model achieves a state-of-the-art classification accuracy of 94.6%, outperforming all related works, even approaches which are specifically tailored for the task, e.g. by including vehicle part detections.
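
A typical fine-tuning setup of the kind compared in the paper, shown here with torchvision; the choice of ResNet-50 and the augmentation ops are illustrative, not the paper's best configuration:

    import torch.nn as nn
    from torchvision import models, transforms

    def build_cars196_model(num_classes=196):
        # Start from ImageNet weights and replace only the classifier,
        # i.e. fine-tuning rather than training from scratch.
        model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1)
        model.fc = nn.Linear(model.fc.in_features, num_classes)
        return model

    # Generic data augmentation of the kind evaluated in the paper.
    train_tf = transforms.Compose([
        transforms.RandomResizedCrop(224),
        transforms.RandomHorizontalFlip(),
        transforms.ToTensor(),
    ])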


Emerging Imaging and Sensing Technologies for Security and Defence III; and Unmanned Sensors, Systems, and Countermeasures | 2018

An image processing pipeline for long range UAV detection

Arne Schumann; Lars Wilko Sommer; Thomas Müller; Sascha Voth

The number of affordable consumer unmanned aerial vehicles (UAVs) available on the market has been growing quickly in recent years. Uncontrolled use of such UAVs in the context of public events like sports events or demonstrations, as well as their use near sensitive areas such as airports or correctional facilities, poses a potential security threat. Automatic early detection of UAVs is thus an important task which can be addressed through multiple modalities, such as visual imagery, radar, audio signals, or UAV control signals. In this work we present an image processing pipeline which is capable of tracking very small point targets in an overview camera, adjusting a tilting unit with a mounted zoom camera (PTZ system) to locations of interest, and classifying the spotted object in this more detailed camera view. The overview camera is a high-resolution camera with a wide field of view. Its main purpose is to monitor a wide area and to allow early detection of candidates whose motion or appearance warrants closer investigation. In a subsequent step these candidates are prioritized and successively examined by adapting the orientation of the tilting unit and the zoom level of the attached camera lens, in order to observe the target in detail and provide appropriate data for the classification stage. The image of the PTZ camera is then used to classify the object as either UAV or distractor. For this task we apply the popular SSD detector, with several of its parameters adapted for the task of UAV detection and classification. We demonstrate the performance of the full pipeline on imagery collected by the system. The data contains actual UAVs as well as distractors, such as birds.
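
The control loop of the described pipeline can be sketched as follows; the point_tracker, ptz, and classifier interfaces are assumed for illustration and are not the system's actual API:

    def pipeline_step(overview_frame, point_tracker, ptz, classifier):
        # 1. Detect and track small point targets in the wide-angle
        #    overview camera.
        candidates = point_tracker.update(overview_frame)
        if not candidates:
            return None
        # 2. Prioritize candidates and steer the pan/tilt/zoom unit
        #    to the most promising one.
        target = max(candidates, key=lambda c: c["priority"])
        ptz.point_at(target["position"])
        zoomed = ptz.capture()
        # 3. Classify the detailed view as UAV or distractor (the
        #    paper uses an adapted SSD detector for this stage).
        return classifier.predict(zoomed)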

Collaboration


Dive into Arne Schumann's collaborations.

Top Co-Authors

Lars Wilko Sommer
Karlsruhe Institute of Technology

Rainer Stiefelhagen
Karlsruhe Institute of Technology

Martin Bäuml
Karlsruhe Institute of Technology

Andreas Eberle
Karlsruhe Institute of Technology

M. Saquib Sarfraz
COMSATS Institute of Information Technology

Makarand Tapaswi
Karlsruhe Institute of Technology

Shaogang Gong
Queen Mary University of London

Tomas Piatrik
Queen Mary University of London