Rene Grzeszick
Technical University of Dortmund
Publications
Featured research published by Rene Grzeszick.
International Conference on Acoustics, Speech, and Signal Processing | 2014
Axel Plinge; Rene Grzeszick; Gernot A. Fink
The classification of acoustic events in indoor environments is an important task for many practical applications in smart environments. In this paper, a novel approach for classifying acoustic events based on the Bag-of-Features principle is proposed. Mel and gammatone frequency cepstral coefficients, which originate from psychoacoustic models, are used as input features for the Bag-of-Features representation. Rather than using a prior classification or segmentation step to eliminate silence and background noise, Bag-of-Features representations are learned for a background class. Supervised learning of codebooks and temporal coding are shown to improve the recognition rates. Three different databases are used for the experiments: the CLEAR sound event dataset, the D-CASE event dataset, and a new set of smart room recordings.
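As a minimal illustration of the pipeline described above, the sketch below builds a Bag-of-Features histogram from frame-level MFCCs (a stand-in for the Mel and gammatone features used in the paper); the codebook size, feature dimensionality, and the random-noise placeholder signal are illustrative choices, not the paper's settings.

```python
# Minimal Bag-of-Features sketch for acoustic frames (illustrative parameters).
import numpy as np
import librosa
from sklearn.cluster import KMeans

sr = 16000
y = np.random.randn(sr * 2).astype(np.float32)   # placeholder: 2 s of noise

# Frame-level features, shape (n_frames, n_mfcc)
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13).T

# Learn a codebook over training frames (here the same frames, for brevity)
codebook = KMeans(n_clusters=8, n_init=10, random_state=0).fit(mfcc)

# Hard-quantize every frame and build the normalized histogram representation
words = codebook.predict(mfcc)
hist = np.bincount(words, minlength=8).astype(float)
hist /= hist.sum()
print(hist)   # fixed-length vector, usable as input to any classifier
```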
Pattern Recognition | 2014
Jan Richarz; Szilárd Vajda; Rene Grzeszick; Gernot A. Fink
Training recognizers for handwritten characters is still a very time-consuming task involving tremendous amounts of manual annotation by experts. In this paper we present semi-supervised labeling strategies that are able to considerably reduce the human effort. We propose two different methods to label and later recognize characters in collections of historical archive documents. The first one is based on clustering of different feature representations and the second one incorporates a simultaneous retrieval on different representations. Hence, both approaches are based on multi-view learning and later apply a voting procedure for reliably propagating annotations to unlabeled data. We evaluate our methods on the MNIST database of handwritten digits and introduce a realistic application in the form of a database of handwritten historical weather reports. The experiments show that our method is able to significantly reduce the human effort required to build a character recognizer for the data collection considered, while still achieving recognition rates that are close to a supervised classification experiment.
Highlights:
- We present semi-supervised labeling strategies that considerably reduce the human effort.
- Two different methods to label and later recognize characters in collections of historical archive documents are proposed.
- A realistic application dealing with handwritten historical weather reports is introduced.
- Both methods are evaluated on the MNIST database of handwritten digits and the historical weather reports.
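To make the cluster-and-vote idea concrete, here is a toy sketch in which scikit-learn's small digits dataset stands in for MNIST and two hypothetical views (raw pixels and a PCA projection) vote on propagated labels; the paper's actual feature representations and its retrieval-based variant are considerably more elaborate.

```python
# Toy multi-view cluster-and-vote label propagation (illustrative settings).
import numpy as np
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

X, y = load_digits(return_X_y=True)
labeled = np.zeros(len(y), dtype=bool)
labeled[::30] = True                 # pretend only a few samples are annotated

def cluster_votes(view, n_clusters=30):
    """Assign each cluster the majority label of its labeled members."""
    ids = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit_predict(view)
    votes = np.full(len(view), -1)
    for c in range(n_clusters):
        seeds = y[(ids == c) & labeled]
        if len(seeds):
            votes[ids == c] = np.bincount(seeds).argmax()
    return votes

votes1 = cluster_votes(X)                                      # view 1: raw pixels
votes2 = cluster_votes(PCA(n_components=16).fit_transform(X))  # view 2: PCA

# Propagate a label only where both views agree -> high-precision pseudo-labels
agree = (votes1 == votes2) & (votes1 >= 0) & ~labeled
print(f"propagated {agree.sum()} labels, "
      f"accuracy {(votes1[agree] == y[agree]).mean():.3f}")
```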
International Conference on Image Processing | 2013
Rene Grzeszick; Leonard Rothacker; Gernot A. Fink
This paper presents a novel method for combining local image features and spatial information for object classification tasks using the Bag-of-Features principle. The feature descriptor is extended by additional spatial information, so that similar feature descriptors do not only describe similar image patches, but similar patches in roughly the same region. Different spatial measures are evaluated on the Caltech 101 dataset, showing the improvement gained by incorporating spatial information into the feature descriptor. Furthermore, the method achieves better classification rates than the comparable Spatial Pyramids while using a lower-dimensional representation.
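The core idea can be sketched in a few lines: append (optionally weighted) normalized patch coordinates to each local descriptor before quantization, so that similar visual words also imply similar image locations. The weight alpha below is a hypothetical tuning parameter, not a value from the paper.

```python
# Spatially augmented descriptors: append weighted, normalized patch
# coordinates to each local descriptor before codebook quantization,
# so that visual words become location-sensitive. Values are illustrative.
import numpy as np

n, d = 500, 128                      # e.g. 500 SIFT-like descriptors
desc = np.random.rand(n, d)
xy = np.random.rand(n, 2)            # patch centers, normalized to [0, 1]

alpha = 0.5                          # hypothetical weight of the spatial term
augmented = np.hstack([desc, alpha * xy])   # shape (n, d + 2)
# 'augmented' is then quantized exactly like a plain descriptor would be.
print(augmented.shape)
```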
German Conference on Pattern Recognition | 2015
Rene Grzeszick; Axel Plinge; Gernot A. Fink
The Bag-of-Features principle has proved successful in many pattern recognition tasks ranging from document analysis and image classification to gesture recognition and even forensic applications. Lately, these methods have emerged in the field of acoustic event detection and shown very promising results. The detection and classification of acoustic events is an important task for many practical applications like video understanding, surveillance, or speech enhancement. In this paper a novel approach for online acoustic event detection is presented that builds on top of the Bag-of-Features principle. Features are calculated for all frames in a given window. Applying the concept of feature augmentation, additional temporal information is encoded in each feature vector. These feature vectors are then softly quantized so that a Bag-of-Features representation is computed. These representations are evaluated by a classifier in a sliding window approach. Experiments on a challenging indoor dataset of acoustic events show that the proposed method yields state-of-the-art results compared to other online event detection methods. Furthermore, it is shown that the temporal feature augmentation significantly improves the recognition rates.
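A rough sketch of the temporal feature augmentation step, assuming frame-level features are already available: each frame in a sliding analysis window gets its relative position within the window appended before quantization. Window length, hop size, and feature dimensionality are illustrative.

```python
# Temporal feature augmentation in a sliding analysis window: each frame's
# relative position within the window is appended to its feature vector.
import numpy as np

feats = np.random.rand(200, 13)      # frame-level features of a stream
win, hop = 30, 10                    # illustrative window and hop sizes

def augmented_windows(feats, win, hop):
    for start in range(0, len(feats) - win + 1, hop):
        chunk = feats[start:start + win]
        t = np.linspace(0.0, 1.0, win)[:, None]   # relative time in window
        yield np.hstack([chunk, t])               # shape (win, d + 1)

for w in augmented_windows(feats, win, hop):
    pass   # quantize w against a codebook, then classify the histogram
print(w.shape)   # (30, 14)
```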
IEEE Transactions on Audio, Speech, and Language Processing | 2017
Rene Grzeszick; Axel Plinge; Gernot A. Fink
The detection and classification of acoustic events in various environments is an important task. Its applications range from multimedia analysis to the surveillance of humans or even animal life. Several of these tasks require the capability of online processing. Besides many approaches that tackle the task of acoustic event detection, methods based on the well-known bag-of-features principle have also emerged in this field. Acoustic features are calculated for all frames in a given time window. Then, applying the bag-of-features concept, these features are quantized with respect to a learned codebook and a histogram representation is computed. Bag-of-features approaches are particularly interesting for online processing as they have a low computational cost. In this paper, the bag-of-features principle and various extensions are reviewed, including soft quantization, supervised codebook learning, and temporal modeling. Furthermore, Mel and gammatone frequency cepstral coefficients that originate from psychoacoustic models are used as the underlying feature set for the bag-of-features. The possibility of fusing the results of multiple channels in order to improve the robustness is shown. Two databases are used for the experiments: the DCASE 2013 office live dataset and the ITC-IRST multichannel dataset.
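The soft quantization and multichannel fusion steps mentioned above might look roughly as follows; the Gaussian kernel width and the simple histogram averaging across channels are illustrative assumptions, not the paper's exact formulation.

```python
# Soft quantization with a simple multichannel fusion: every frame
# contributes to all codewords, weighted by a Gaussian kernel on its
# distance to them; per-channel histograms are then averaged.
import numpy as np

def soft_bof(frames, codebook, sigma=1.0):
    # pairwise squared distances, shape (n_frames, n_codewords)
    d2 = ((frames[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    w = np.exp(-d2 / (2 * sigma ** 2))
    w /= w.sum(axis=1, keepdims=True)    # each frame's weights sum to 1
    hist = w.sum(axis=0)
    return hist / hist.sum()

codebook = np.random.rand(8, 13)
channels = [np.random.rand(100, 13) for _ in range(4)]   # e.g. 4 microphones
fused = np.mean([soft_bof(c, codebook) for c in channels], axis=0)
print(fused.round(3))
```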
International Journal of Pattern Recognition and Artificial Intelligence | 2016
Rene Grzeszick; Gernot A. Fink
Labeling images is a tedious and costly task that is required for many applications, for example, tagging, grouping and exploring of image collections. It is also necessary for training visual classifiers that recognize scenes or objects. It is therefore desirable to either reduce the human effort or infer additional knowledge by addressing this task with algorithms that allow for learning image annotations in a semi-supervised manner. In this paper, a semi-supervised annotation learning algorithm is introduced that is based on partitioning the data in a multi-view approach. The method is applied to large, diverse collections of natural scene images. Experiments are performed on the 15 Scenes and SUN databases. It is shown that for sparsely labeled datasets the proposed annotation learning algorithm is able to infer additional knowledge from the unlabeled samples and therefore improve the performance of visual classifiers in comparison to supervised learning. Furthermore, the proposed algorithm outperforms other related semi-supervised learning approaches.
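As a loose illustration of semi-supervised learning from sparsely labeled data, the sketch below uses scikit-learn's SelfTrainingClassifier as a simple stand-in; note that this is a different algorithm from the paper's multi-view partitioning approach and is shown only to make the setting concrete.

```python
# Self-training on sparsely labeled data; unlabeled samples carry label -1.
# This is a stand-in illustration, NOT the paper's multi-view algorithm.
import numpy as np
from sklearn.datasets import load_digits
from sklearn.semi_supervised import SelfTrainingClassifier
from sklearn.svm import SVC

X, y = load_digits(return_X_y=True)
y_sparse = np.full_like(y, -1)
y_sparse[::20] = y[::20]              # keep only ~5% of the labels

clf = SelfTrainingClassifier(SVC(probability=True), threshold=0.9)
clf.fit(X, y_sparse)
print(f"accuracy trained on sparse labels: {clf.score(X, y):.3f}")
```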
International Conference on Computer Vision Theory and Applications | 2015
Johann Strassburg; Rene Grzeszick; Leonard Rothacker; Gernot A. Fink
Image parsing describes a very fine-grained analysis of natural scene images, where each pixel is assigned a label describing the object or part of the scene it belongs to. This analysis is a keystone for a wide range of applications that could benefit from detailed scene understanding, such as keyword-based image search, sentence-based image or video descriptions, and even autonomous cars or robots. State-of-the-art approaches in image parsing are data-driven and allow for recognizing arbitrary categories based on a knowledge transfer from similar images. As transferring labels on the pixel level is tedious and noisy, more recent approaches build on the idea of segmenting a scene and transferring the information based on regions. For creating these regions, the most popular approaches rely on over-segmenting the scene into superpixels. In this paper, the influence of different superpixel methods is evaluated within the well-known Superparsing framework. Furthermore, a new method is presented that computes a superpixel-like over-segmentation of an image based on edge-avoiding wavelets. The evaluation on the SIFT Flow and Barcelona datasets shows that the choice of the superpixel method is crucial for the performance of image parsing.
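For readers who want to experiment with the kind of comparison the paper performs, the following sketch runs three common superpixel methods from scikit-image on a sample image; the paper's edge-avoiding wavelet method is not part of scikit-image and is therefore not included.

```python
# Three common superpixel/over-segmentation methods from scikit-image.
import numpy as np
from skimage import data
from skimage.segmentation import slic, felzenszwalb, quickshift

img = data.astronaut()               # sample RGB image

segments = {
    "SLIC": slic(img, n_segments=300, compactness=10),
    "Felzenszwalb": felzenszwalb(img, scale=100),
    "Quickshift": quickshift(img, kernel_size=3, max_dist=6),
}
for name, seg in segments.items():
    print(f"{name}: {len(np.unique(seg))} regions")
```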
International Conference on Frontiers in Handwriting Recognition | 2014
Gernot A. Fink; Leonard Rothacker; Rene Grzeszick
Handwritten historical documents pose extremely challenging problems for automatic analysis. This is due to the high variability observed in handwritten script, the use of writing styles and script types unknown today, the frequently lacking orthographic standardization, and the degradation of the respective documents. Therefore, it is currently out of the question to develop general-purpose handwriting recognition systems for historical document collections. It is, however, possible to search relatively homogeneous document collections using word spotting techniques. In this paper we consider the analysis of a challenging collection of postcards from the period of World War I delivered by the German military postal service. More specifically, we consider the automatic grouping of mail pieces by spotting potentially identical addressees. As the annotation of such documents is extremely challenging even for trained experts, a manually developed ground truth annotation will, in general, not be available. Furthermore, a reliable segmentation on the word level will hardly be possible. With our segmentation-free query-by-example word spotting method, we investigate modifications addressing better generalization to a multi-writer scenario and its application to degraded documents. Promising results could be achieved in this highly challenging scenario.
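A deliberately simplified toy version of the query-by-example idea is sketched below using normalized cross-correlation from scikit-image; real segmentation-free word spotting, including the method in this paper, relies on far more robust representations than raw pixel correlation.

```python
# Toy query-by-example matching: slide a query word image over a page and
# keep the best-scoring location. Real word spotting uses robust features.
import numpy as np
from skimage.feature import match_template

page = np.random.rand(600, 400)      # placeholder for a page image
query = page[100:130, 50:150]        # a cropped query "word"

scores = match_template(page, query)             # correlation map
r, c = np.unravel_index(scores.argmax(), scores.shape)
print(f"best match at row {r}, column {c}, score {scores.max():.3f}")
```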
Proceedings of the 4th International Workshop on Sensor-based Activity Recognition and Interaction | 2017
Rene Grzeszick; Jan Marius Lenk; Fernando Moya Rueda; Gernot A. Fink; Sascha Feldhorst; Michael ten Hompel
Although the fourth industrial revolution is already in progress and advances have been made in automating factories, completely automated facilities are still far in the future. Human work is still an important factor in many factories and warehouses, especially in the field of logistics. Manual processes are, therefore, often subject to optimization efforts. In order to aid these optimization efforts, methods like human activity recognition (HAR) have become of increasing interest in industrial settings. In this work a novel deep neural network architecture for HAR is introduced. A convolutional neural network (CNN), which employs temporal convolutions, is applied to the sequential data of multiple inertial measurement units (IMUs). The network is designed to handle different sensor values and IMUs separately, joining the information step by step within the architecture. An evaluation is performed using data from the order picking process recorded in two different warehouses. The influence of different design choices in the network architecture, as well as pre- and post-processing, is evaluated, and crucial steps for learning a good classification network for the task of HAR in a complex industrial setting are shown. Ultimately, it is shown that both traditional approaches based on statistical features and recent CNN architectures are outperformed.
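A hedged PyTorch sketch of the architectural idea, one temporal-convolution branch per IMU merged late, is given below; the number of IMUs, channels, layer sizes, and kernel widths are illustrative and not the paper's configuration.

```python
# Per-IMU temporal-convolution branches, merged late (illustrative sizes).
import torch
import torch.nn as nn

class PerIMUNet(nn.Module):
    def __init__(self, n_imus=3, channels_per_imu=6, n_classes=8):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Sequential(
                nn.Conv1d(channels_per_imu, 32, kernel_size=5),  # temporal conv
                nn.ReLU(),
                nn.Conv1d(32, 32, kernel_size=5),
                nn.ReLU(),
                nn.AdaptiveAvgPool1d(1),
            )
            for _ in range(n_imus)
        ])
        self.head = nn.Linear(32 * n_imus, n_classes)

    def forward(self, xs):               # xs: list of (B, C, T) tensors
        feats = [b(x).flatten(1) for b, x in zip(self.branches, xs)]
        return self.head(torch.cat(feats, dim=1))   # join branch features

net = PerIMUNet()
xs = [torch.randn(4, 6, 100) for _ in range(3)]  # batch of 4, 100 time steps
print(net(xs).shape)                             # torch.Size([4, 8])
```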
Informatics | 2018
Fernando Moya Rueda; Rene Grzeszick; Gernot A. Fink; Sascha Feldhorst; Michael ten Hompel
Human activity recognition (HAR) is a classification task for recognizing human movements. Methods of HAR are of great interest as they have become tools for measuring occurrences and durations of human actions, which are the basis of smart assistive technologies and manual process analysis. Recently, deep neural networks have been deployed for HAR in the context of activities of daily living using multichannel time-series. These time-series are acquired from body-worn devices, which are composed of different types of sensors. The deep architectures process these measurements for finding basic and complex features in human corporal movements, and for classifying them into a set of human actions. As the devices are worn at different parts of the human body, we propose a novel deep neural network for HAR that handles sequence measurements from different body-worn devices separately. An evaluation of the architecture is performed on three datasets, the OPPORTUNITY, PAMAP2, and an industrial dataset, outperforming the state of the art. In addition, different network configurations are also evaluated. We find that applying convolutions per sensor channel and per body-worn device improves the capabilities of convolutional neural networks (CNNs).
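The per-channel convolution idea can be sketched as follows: shaping the input as (batch, 1, time, channels) and using (k, 1) kernels keeps every sensor channel independent until the fully connected layer. All sizes below are illustrative.

```python
# Per-channel temporal convolutions via 2D convs with (k, 1) kernels.
import torch
import torch.nn as nn

net = nn.Sequential(
    nn.Conv2d(1, 16, kernel_size=(5, 1)),   # temporal conv, per channel
    nn.ReLU(),
    nn.Conv2d(16, 16, kernel_size=(5, 1)),
    nn.ReLU(),
    nn.Flatten(),
    nn.Linear(16 * 92 * 9, 8),              # 100 - 4 - 4 = 92 time steps remain
)
x = torch.randn(4, 1, 100, 9)               # 9 sensor channels, 100 time steps
print(net(x).shape)                         # torch.Size([4, 8])
```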